Kubernetes Horizontal Pod Autoscaling (HPA) provides a way to scale the number of Pod replicas in a Deployment based on per-pod metric thresholds set by the operator. Most often, per-pod resource metrics such as CPU and memory are used to scale the Pods. The utilization value is computed over all the containers in each Pod.
Starting with v1.20, Kubernetes offers HPA scaling based on container resource metrics as an alpha feature (as of the time of this writing). This provides another way to scale the target resource: because Pods can have multiple containers, the user can choose to scale on the utilization of one specific container across all the Pods.
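To make this concrete, here is a minimal sketch of what an HPA object using a container resource metric might look like, built as a Python dictionary and printed as JSON. The object, Deployment, and container names (`web-app-hpa`, `web-app`, `application`) and the 60% target are hypothetical. The `autoscaling/v2` API version is an assumption as well: the API version and any feature gate required depend on your cluster version, so check the documentation for your release.

```python
import json


def container_resource_hpa(name, deployment, container, cpu_target_percent,
                           min_replicas=1, max_replicas=10):
    """Build an HPA manifest that scales on the CPU utilization of one container."""
    return {
        "apiVersion": "autoscaling/v2",  # assumption: adjust to your cluster version
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    # ContainerResource looks at a single named container,
                    # whereas the Resource metric type covers the whole Pod.
                    "type": "ContainerResource",
                    "containerResource": {
                        "name": "cpu",
                        "container": container,
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_target_percent,
                        },
                    },
                }
            ],
        },
    }


# Hypothetical names: a "web-app" Deployment whose main container is "application".
manifest = container_resource_hpa("web-app-hpa", "web-app", "application", 60)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.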
This has some implications. Most notably, as the example in the official documentation points out, if you have a web application with a logging sidecar, you can scale on the resource utilization of the web application and ignore the sidecar container. The same idea extends to other applications with multiple sidecars. As I describe next, however, this method of scaling should be used only under certain constraints.
Let’s say we have a Pod with two containers, A and B, and we want to scale on the resource utilization of container A.
Let \(U_{a}\) and \(U_{b}\) be the resource usage of containers A and B. Let \(Q_{a}\) and \(Q_{b}\) be the resource requests of the two containers as defined in the Pod spec.
In the per-pod resource metric based scaling, the HPA will scale based on the utilization using the following formula:
Per-pod based usage:
\begin{equation} P_{u} = \frac{U_{a} + U_{b}}{Q_{a} + Q_{b}} \end{equation}
In the container resource metric based scaling, the HPA will scale based on the utilization of container A, using the following formula:
Container based usage:
\begin{equation} C_{u} = \frac{U_{a}}{Q_{a}} \end{equation}
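The two formulas above can be compared with a small numeric sketch. The CPU figures (in millicores) and the 60% target below are made-up values chosen only for illustration, and the function names are hypothetical helpers, not part of any Kubernetes API.

```python
def pod_utilization(usage, requests):
    """Per-pod utilization P_u: total usage over total requests."""
    return sum(usage.values()) / sum(requests.values())


def container_utilization(usage, requests, container):
    """Container utilization C_u: one container's usage over its own request."""
    return usage[container] / requests[container]


# Hypothetical CPU figures in millicores for containers A and B.
usage = {"A": 400, "B": 50}
requests = {"A": 500, "B": 500}
target = 0.60  # hypothetical HPA utilization threshold

p_u = pod_utilization(usage, requests)
c_u = container_utilization(usage, requests, "A")

print(f"P_u = {p_u:.2f}, exceeds target: {p_u > target}")
print(f"C_u = {c_u:.2f}, exceeds target: {c_u > target}")
```

With these numbers, \(C_{u}\) is 0.80 while \(P_{u}\) is only 0.45, so the container-based metric crosses the 60% target and the per-pod metric does not.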
For the same resource usage and request settings in both scenarios, the HPA will scale Pods earlier with container resource metrics than with per-pod resource metrics only if the utilization of container B is lower than that of container A, i.e., if
\begin{equation} \frac{U_{b}}{Q_{b}} < \frac{U_{a}}{Q_{a}} \end{equation}
By substituting \( U_{b} < \frac{U_{a}Q_{b}}{Q_{a}} \) into \( (1) \) and simplifying, we get:
\begin{equation} \frac{U_{a} + U_{b}}{Q_{a} + Q_{b}} < \frac{U_{a}}{Q_{a}} \end{equation}
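The simplification can be spelled out as follows, assuming both requests are positive (\( Q_{a} > 0 \) and \( Q_{b} > 0 \)), which is needed for utilization to be defined at all. Replacing \( U_{b} \) in the numerator of \( (1) \) with its upper bound \( \frac{U_{a}Q_{b}}{Q_{a}} \) gives:

\begin{equation*} \frac{U_{a} + U_{b}}{Q_{a} + Q_{b}} < \frac{U_{a} + \frac{U_{a}Q_{b}}{Q_{a}}}{Q_{a} + Q_{b}} = \frac{U_{a}(Q_{a} + Q_{b})}{Q_{a}(Q_{a} + Q_{b})} = \frac{U_{a}}{Q_{a}} \end{equation*}

In other words, \( P_{u} < C_{u} \) whenever container B is less utilized than container A.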
With container-based usage, the resource utilization of container A meets the threshold earlier, so the HPA scales earlier, but only if the utilization of the other containers is lower than that of the container (A) we are interested in. So make sure that the utilization of the other containers is well below that of the container you are scaling on; otherwise scaling may be delayed, affecting performance.