Kubernetes Autoscaling using HPA, VPA & Cluster Autoscaler - a simple guide

Last Updated on February 18, 2023 by cscontents

Introduction

Kubernetes is one of the most powerful container orchestration tools. It offers many benefits, and one of the most important is scaling. For a high-level understanding of Kubernetes, please head over to the article below.

Kubernetes Series: Part 1 – Introduction to Kubernetes | Background of Kubernetes

In this article we will discuss scaling in a Kubernetes cluster: the various types of scaling and how to achieve them.

What is Scaling?

Basically, scaling means increasing or decreasing resources. In the context of a compute instance, scaling means increasing or decreasing CPU & RAM.

  • When we increase resources, it is called up-scaling.
  • When we decrease resources, it is called down-scaling.

Vertical Scaling vs Horizontal Scaling – in hardware

To explain the difference between these two types of scaling I will take an example.

For example, suppose you have a computer with 8 GB of RAM, and you want to increase the total RAM to 16 GB. There are two ways to do so –

  • Vertical scaling – adding resources to the same computer. Here we do not increase the number of computers; instead, we increase the RAM in the same computer to 16 GB.
    • Vertical scaling can scale up.
    • Vertical scaling can scale down.
  • Horizontal scaling – adding a new computer. Here you do not increase the RAM in the same computer; instead, you bring up a second computer with 8 GB of RAM, for a total of 16 GB.
    • Horizontal scaling can scale up.
    • Horizontal scaling can scale down.

Depending on the requirement, either vertical or horizontal scaling is used.

Two ways of achieving scaling in Kubernetes cluster

In Kubernetes cluster we can achieve scaling in two ways –

  • Manual way – useful for learning purposes or in a test environment.
  • Automatic way – also called autoscaling; resources are scaled up/down dynamically. Automatic scaling is where the real power of Kubernetes lies.

Various types of Automatic Scaling or Autoscaling in Kubernetes cluster

In Kubernetes, we have two types of autoscaling –

  1. Pod autoscaling – automatic scaling of pods. This comes in two types.
    1. HPA (Horizontal Pod Autoscaling) – scaling is done by increasing the number of pods, which in turn increases the total resources.
    2. VPA (Vertical Pod Autoscaling) – scaling is done by increasing the resource allocation of the containers in the same pod; the number of pods does not change.
  2. Cluster autoscaling – scaling the Kubernetes cluster by changing the number of nodes in the cluster; it is also called node autoscaling.

Note:

All the above types of autoscaling are driven by resource usage metrics, so Kubernetes must be able to collect those metrics. Many managed Kubernetes offerings install metrics-server for us in the kube-system namespace, which means we don't need to install it ourselves on those platforms; on others (for example, a default EKS cluster on AWS), you may still need to install it.

If you are running a self-managed Kubernetes cluster and want to install metrics-server, you can follow the GitHub repo below.

https://github.com/kubernetes-sigs/metrics-server

Once metrics-server is present in the cluster – whether pre-installed or installed by you – you can execute the commands below to check resource utilization by the cluster's nodes, pods, etc.

kubectl top nodes
kubectl top pods

HPA or Horizontal Pod Autoscaling

In this case, the number of pods increases or decreases automatically based on resource utilization.

In the context of HPA, we need to understand the two concepts below, which appear in the pod spec section of the manifest file (YAML).

Resource request – specifies the minimum amount of resources guaranteed to the container defined in the pod manifest file (YAML). Based on this amount, the Kubernetes scheduler places the pod on a node that can provide those resources.

Resource limit – specifies the maximum amount of resources that can be allocated to that container. It restricts the container's resource utilization to that limit.

Basically, resource requests and resource limits are the means by which Kubernetes controls the allocation of resources (CPU and memory) to the containers inside pods.

Example Deployment manifest file (YAML)

As an example, we will take one sample Deployment object and look at the resource request & limit in the pod spec section.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-deploy
  labels:
    app: sample-app
spec:
  selector:
    matchLabels:
      app: sample-app
  replicas: 2
  template:
    metadata:
      name: sample-app
      labels:
        app: sample-app
    spec:
      containers:
        - name: sample-app-container
          image: sample-app:v1
          resources:
            requests:
              cpu: 500m  ## 500 millicores
              memory: 256Mi ## 256 mebibytes
            limits:
              cpu: 1000m
              memory: 512Mi

By looking at the example Deployment manifest above, it is clear how to specify resource requests and limits.

How does HPA work?

To implement HPA, we create/deploy an HPA object in the Kubernetes cluster. It monitors the target workload and, based on the condition specified in the HPA manifest file (YAML), triggers horizontal pod autoscaling.

Example HPA manifest file (YAML)

We will write an HPA manifest file for the Deployment object above (see the previous YAML file).

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: sample-app-scaler
spec:
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-deploy ## this name should match with the deployment name

The example HPA definition file above is for the Deployment mentioned previously. Here we can see three important parameters –

  • minReplicas – the minimum number of pods that will be kept running.
  • maxReplicas – the maximum number of pods that can run after scaling.
  • targetCPUUtilizationPercentage – the target CPU utilization, as a percentage of the requested CPU. This is basically a condition: if the deployed pods' average CPU usage exceeds 50% of the CPU request, HPA will create new pods. The controller keeps adding pods until there are 5 in total, which is maxReplicas; once the pod count reaches maxReplicas, no further pods are deployed. (Roughly, the controller computes desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization).)

So with HPA, Kubernetes always tries to keep each container's resource utilization at around 50% of the requested amount.

When the load decreases, Kubernetes will gradually scale down the number of pods, but it will always maintain at least 2 replicas, as specified by minReplicas in the HPA manifest file (YAML) above.
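The same HPA can also be written against the newer autoscaling/v2 API (the stable version in recent Kubernetes releases), where the CPU target is expressed as a resource metric. A minimal sketch, equivalent to the v1 manifest above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-app-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-deploy   ## must match the Deployment name
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   ## same 50% CPU target as above
```

The v2 API is the one to use if you later want to scale on memory or custom metrics in addition to CPU.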

VPA or Vertical Pod Autoscaling

Vertical Pod Autoscaling means automatically increasing/decreasing the resource allocation of a pod instead of deploying new pods. Kubernetes does this through a VPA object.

To install VPA in your Kubernetes cluster, you can follow the GitHub repos below.

https://github.com/kubernetes/autoscaler.git

https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler

To explain this, we will reuse the sample Deployment manifest from the HPA section above (the sample-deploy object with its resource requests & limits).

Example Vertical Pod Autoscaling (VPA) manifest file

For this Deployment, we will create a VPA object. Below is the manifest file for the VPA (YAML) –

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sample-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       sample-deploy ## this name should match with the deployment name
  updatePolicy:
    updateMode: "Auto"

Note: we need to maintain the compatibility matrix mentioned in the link below.

https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#compatibility

In the VPA manifest file above, we can see a parameter “updatePolicy” under spec, and under “updatePolicy” we have “updateMode”. Two commonly used values of “updateMode” are –

  • updateMode: “Auto”
  • updateMode: “Off”
When we use “Auto”

VPA checks the real-time resource utilization of the pods, and if it sees that utilization for any pod is higher than the resource request value mentioned in the Deployment manifest file, it automatically re-creates that pod with updated resource request & limit values.

Note that VPA only updates the resource request & limit values of the running pod; it does not update those values in the Deployment object itself.

VPA takes this action only if the Deployment has more than one replica. With a single replica, VPA will not act, since its update involves deleting the existing pod and re-creating it, and it cannot delete the only pod.
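In “Auto” mode it is often useful to put bounds on what VPA may set. A hedged sketch using VPA's resourcePolicy field (the bounds below are illustrative values, not recommendations):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sample-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       sample-deploy
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"      ## apply to all containers in the pod
        minAllowed:             ## VPA will never request less than this
          cpu: 250m
          memory: 128Mi
        maxAllowed:             ## VPA will never request more than this
          cpu: 2000m
          memory: 1Gi
```

This keeps VPA's automatic updates within limits you control, so a misbehaving workload cannot drive requests arbitrarily high.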

When we use “Off”

VPA checks the real-time resource utilization of the pods, and if it sees that utilization for any pod is higher than the resource request value mentioned in the Deployment manifest file, it only recommends new resource request & limit values; applying them is up to us (since we have ‘updateMode: Off’).

By looking at these recommendations from VPA, we can redeploy the pod with the new resource request & limit values.
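The recommendations are published in the VPA object's status, which you can view with kubectl describe vpa sample-app-vpa. The status has roughly the following shape – the numbers here are placeholders, not real output:

```yaml
status:
  recommendation:
    containerRecommendations:
      - containerName: sample-app-container
        lowerBound:        ## minimum the workload is likely to need
          cpu: 300m
          memory: 200Mi
        target:            ## VPA's recommended request values
          cpu: 600m
          memory: 300Mi
        upperBound:        ## maximum the workload is likely to need
          cpu: 900m
          memory: 400Mi
```

The target values are the ones you would typically copy into the Deployment's resource requests when applying the recommendation manually.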

Cluster Autoscaler

Cluster Autoscaler scales the Kubernetes cluster itself as per requirement. If any pod is pending because no node has enough free resources, Cluster Autoscaler up-scales the cluster by adding the required number of nodes.

Also, if demand goes down and the number of pods drops, Cluster Autoscaler will remove underutilized nodes from the cluster.

Here too, the concepts of resource request & resource limit apply: if, based on its requests, a pod cannot be scheduled (stays in Pending state), Cluster Autoscaler adds new node(s).

To get to this stage, we first need to deploy Cluster Autoscaler in our Kubernetes cluster. Cluster Autoscaler is maintained by the Kubernetes community and can be found in the GitHub repo below.

https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

Keep in mind that Cluster Autoscaler integration differs per cloud platform; for example, EKS on AWS and AKS on Azure each require the matching Cluster Autoscaler setup, so we need to deploy the right one. The functionality is the same across all of them.

The combination of Cluster Autoscaler and Horizontal Pod Autoscaler (HPA) works very well: HPA scales pods based on their resource requests & limits, and if the cluster runs out of capacity while HPA is scaling, Cluster Autoscaler is triggered and adds extra node(s) to the cluster.
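As an illustration, on AWS the Cluster Autoscaler container is typically configured with command-line flags like the ones below; the node-group name and bounds are hypothetical values that would come from your own Auto Scaling group:

```yaml
## Fragment of the cluster-autoscaler container spec (AWS example)
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=2:10:my-node-group     ## min:max:<ASG name> – hypothetical values
  - --balance-similar-node-groups  ## spread scale-ups across similar groups
```

The --nodes flag is what gives Cluster Autoscaler its lower and upper bounds, much like minReplicas/maxReplicas do for HPA.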

 

Thank You.

If you are interested in learning DevOps, please have a look at the articles below; they will help you greatly.