CPU-Based Pod Auto-Scaling in Kubernetes
This article explains how the different types of Kubernetes autoscaling work and takes an in-depth look at horizontal autoscaling, with visualizations.
Autoscaling allows us to allocate resources to an application optimally, based on its current resource consumption.
Kubernetes offers three main types of autoscaling:
- Horizontal Pod Autoscaler (HPA): HPA controls the number of pods
- Vertical Pod Autoscaler (VPA): VPA controls the resources in individual pods
- Cluster Autoscaler (CA): CA controls the number of nodes in a cluster
Horizontal Pod Autoscaler (HPA)
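The HPA controller periodically compares the observed metric against the target (the interval is set by the --horizontal-pod-autoscaler-sync-period flag of the controller manager), and if the current metric value is higher than the specified threshold, HPA attempts to increase the number of pods. The HPA controller assumes a linear relationship between the metric and the number of pods. It operates on the ratio between the current metric value and the desired metric value. The formula used to compute the desired replicas is as follows (refer to the K8s documentation for more details).
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]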
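To make the replica calculation concrete, here is a minimal Python sketch of the ratio-based rule ceil(currentReplicas * currentMetricValue / desiredMetricValue). The function name and parameters are hypothetical. The 10% tolerance mirrors the controller's default, and the replica bounds default to the minReplicas and maxReplicas used later in this article. The real controller also handles missing metrics, unready pods, and stabilization windows, which this sketch leaves out.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     desired_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Simplified model of the HPA replica calculation (illustrative only)."""
    if desired_metric <= 0:
        raise ValueError("desired_metric must be positive")
    ratio = current_metric / desired_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(1, 90, 50))    # 90% against a 50% target: scale up
    print(desired_replicas(4, 52, 50))    # within tolerance: no change
    print(desired_replicas(4, 20, 50))    # well below target: scale down
    print(desired_replicas(15, 100, 50))  # capped at max_replicas
```

Because the rule is a plain ratio, doubling the observed utilization relative to the target doubles the desired pod count, which is the linear relationship the controller assumes.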
In this article, we focus on horizontal pod autoscaling and scale based on CPU utilization, which is one of the most commonly used metrics. Higher CPU utilization typically comes with higher latency, so keeping CPU utilization low helps keep the latency of the application low as well. The following figure shows the variation of the CPU utilization of an (I/O bound) microservice.
Deploying the App
Let us now deploy a microservice in Kubernetes and study its performance behavior with autoscaling enabled. We will deploy a Spring Boot microservice (see here for the GitHub repo) in K8s. The following is the Kubernetes YAML file for the deployment.
#deployment/app/app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: springboot-app
  labels:
    app: springboot-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: springboot-app
  template:
    metadata:
      labels:
        app: springboot-app
    spec:
      containers:
      - name: springboot-app
        image: anushiya/app:latest
        resources:
          limits:
            cpu: "100m"
          requests:
            cpu: "100m"
        ports:
        - containerPort: 9000
Configuring the Horizontal Pod Autoscaler
Let us now enable Horizontal Pod Autoscaling for the deployment created above. We configure the HPA to scale based on CPU utilization. The YAML file is shown below.
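#springboot-app-hpa.yaml
# targetAverageUtilization is the autoscaling/v2beta1 form of this field;
# autoscaling/v2beta2 and autoscaling/v2 use target.averageUtilization instead.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: springboot-app-hpa
spec:
  scaleTargetRef:
    # Must match the apiVersion of the Deployment created above
    apiVersion: apps/v1
    kind: Deployment
    name: springboot-app
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50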
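HPA is also an API resource in Kubernetes, with apiVersion, kind, metadata, and spec fields (refer to the K8s documentation for more details). The scaling condition is defined by the resource: targetAverageUtilization field (the newer autoscaling/v2 API expresses the same setting as target.averageUtilization). Here we specify a value of 50. The utilization is measured as a percentage of the CPU that each pod requests, averaged over all pods of the deployment, so with a request of 100m the target corresponds to an average of 50m per pod. If the average CPU utilization exceeds the target, the scaling process starts. Because the CPU limit equals the request in this deployment, the utilization cannot exceed 100, so the target should be between 1 and 100.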
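As a rough illustration of what a utilization target of 50 means for pods that request 100m of CPU, the following Python sketch computes the average utilization as total usage divided by total requests. The helper names and the sample usage figures are hypothetical and not measurements from this test. It only handles the two CPU quantity forms shown ("100m" millicores and plain core counts).

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "100m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def average_utilization(usages: list[str], requests: list[str]) -> float:
    """CPU utilization in percent: total usage relative to total requests."""
    used = sum(parse_cpu_millicores(u) for u in usages)
    requested = sum(parse_cpu_millicores(r) for r in requests)
    if requested == 0:
        raise ValueError("pods must declare a CPU request")
    return 100 * used / requested


if __name__ == "__main__":
    # Two pods, each requesting 100m as in the deployment above.
    # The usage values are made up for illustration.
    utilization = average_utilization(["80m", "60m"], ["100m", "100m"])
    target = 50
    print(f"average utilization: {utilization:.0f}%")
    print("scale up" if utilization > target else "no scale-up needed")
```

This also shows why the CPU request matters: without a request there is no baseline to compute a percentage against.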
Deploying JMeter
To test the performance of the application, we use JMeter as the load-testing client. To deploy JMeter, we created a Docker image. The following is the Dockerfile for JMeter. The files used can be found in this repo.
FROM anushiya/jmeter-plugins:v1
# Copy the test assets (bash, jar, jmx, and python directories) into the image
COPY bash /home/kubernetes-performance/bash
COPY jar /home/kubernetes-performance/jar
COPY jmx /home/kubernetes-performance/jmx
COPY python /home/kubernetes-performance/python
WORKDIR /home/kubernetes-performance/bash
# Make the test entry point executable
RUN chmod +x start_performance_test.sh
# Install Python, pip, and the required Python packages
RUN apt-get update && apt-get install -y python3.5 python-pip
RUN pip install numpy requests schedule
Since we want to keep the performance test results after the test pod exits, we store them on a host volume. To create the host volume, SSH into one of the nodes and create a directory to mount.
#Get the list of nodes
kubectl get node
#Select a node and ssh into it
sudo gcloud beta compute --project "[name of the project]" ssh --zone "[zone]" "[name of the node]"
#example
sudo gcloud beta compute --project "performance-testing" ssh --zone "us-central1-a" "gke-performance-testing-default-pool-b6e4d476-78zn"
#Create a directory to mount (-p also creates missing parent directories)
sudo mkdir -p /mnt/data/results
Create a persistent volume.
#pv-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/data/results"
Create a persistent volume claim.
#deployment/volume/pv-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 6Gi
Apply the YAML files to create the persistent volume and the persistent volume claim.
#create persistent volume
kubectl apply -f deployment/volume/pv-volume.yaml
#create persistent volume claim
kubectl apply -f deployment/volume/pv-claim.yaml
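The 6Gi claim can bind to the 10Gi volume because the storage class matches, the requested access mode is offered by the volume, and the volume is at least as large as the request. The Python sketch below is a simplified model of that check, not the actual Kubernetes binder: the function names and dictionary layout are hypothetical, and selectors, volume modes, and already-bound volumes are ignored.

```python
# Binary suffixes used by Kubernetes storage quantities (subset).
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_storage(quantity: str) -> int:
    """Convert a storage quantity such as "10Gi" to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # plain byte count


def can_bind(pv: dict, pvc: dict) -> bool:
    """Simplified check of whether a claim could bind to a volume."""
    same_class = pv["storageClassName"] == pvc["storageClassName"]
    modes_ok = set(pvc["accessModes"]) <= set(pv["accessModes"])
    big_enough = parse_storage(pv["capacity"]) >= parse_storage(pvc["request"])
    return same_class and modes_ok and big_enough


if __name__ == "__main__":
    # Values taken from pv-volume.yaml and pv-claim.yaml above.
    pv = {"storageClassName": "manual",
          "accessModes": ["ReadWriteOnce"],
          "capacity": "10Gi"}
    pvc = {"storageClassName": "manual",
           "accessModes": ["ReadWriteOnce"],
           "request": "6Gi"}
    print(can_bind(pv, pvc))
```

Binding is one-to-one, so once the claim is bound, the whole volume is reserved for it even though the claim asks for less than the volume's capacity.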
For more details about PersistentVolume and PersistentVolumeClaim, see the Kubernetes documentation. Now that we have created the volume to store the test results, we'll move on to creating a Job to perform the tests. The test results can be found in the directory specified above. The Job definition is as follows.
#perf-test.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: perf-test
spec:
  template:
    spec:
      containers:
      - name: perf-test
        image: anushiya/perf-test:v1
        imagePullPolicy: Always
        command: ["bash", "start_performance_test.sh"]
        volumeMounts:
        - mountPath: "/home/kubernetes-performance/results"
          name: pv-storage
      restartPolicy: Never
      volumes:
      - name: pv-storage
        persistentVolumeClaim:
          claimName: pv-claim
  backoffLimit: 4
Analyzing the Behavior of CPU Utilization, Latency and Pod Count
Let us now take a look at how the CPU utilization, pod count, and latency vary with time. The following figures show the variation in CPU utilization, pod count, and latency when we test the performance with a single concurrent user. We used the Stackdriver Monitoring API to get the performance statistics (see this link for more details).