Autoscaling works by specifying a desired target CPU percentage along with a minimum and maximum number of allowed replicas. The CPU percentage is expressed as a percentage of the pod's CPU resource request. Recall that pods can set resource requests for CPU to ensure they are scheduled on a node with at least that much CPU available. If no CPU request is set, the autoscaler won't take any action. Kubernetes increases or decreases the number of replicas according to the average CPU usage across all replicas: when actual usage sufficiently exceeds the target, the autoscaler adds replicas, and when usage falls sufficiently below the target, it removes them. It will never create more replicas than the maximum you set, nor decrease the number of replicas below your configured minimum. You can configure some of the autoscaler's parameters, but the defaults will work fine for us. With the defaults, the autoscaler compares actual CPU usage to the target and either increases the replicas, decreases them, or keeps the status quo accordingly.
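The scale-up decision follows the documented HPA formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal sketch of that arithmetic in plain shell (illustrative variable names, not a kubectl command):

```shell
# Sketch of the HPA scaling rule:
#   desiredReplicas = ceil(currentReplicas * currentCPU% / targetCPU%)
current_replicas=2
current_cpu=120   # average CPU usage across replicas, as % of the request
target_cpu=80     # target from the autoscaler spec
# Integer ceiling division: (a + b - 1) / b
echo $(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))  # prints 3
```

Here usage (120%) is above the target (80%), so the autoscaler would grow from 2 to 3 replicas.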
Autoscaling depends on metrics being collected in the cluster so that the average pod CPU usage can be computed. Kubernetes integrates with several solutions for collecting metrics. We will use Metrics Server, a solution maintained by the Kubernetes project. The metrics-server GitHub repository provides several manifest files that declare all the required resources. We will need to get Metrics Server up and running before we can use autoscaling. Once it is running, autoscalers can retrieve the collected metrics using the Kubernetes metrics API.
The lab instance includes the Metrics Server manifests in the metrics-server subdirectory. It's outside the scope of this course to discuss all of the resources that comprise Metrics Server. All we need to do is create them, and we can count on metrics being collected in the cluster. To do that, we can use our trusty kubectl create command and specify the directory as the file target.
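Assuming the metrics-server subdirectory described above, the commands might look like:

```shell
# Create all the Metrics Server resources from the manifest directory
kubectl create -f metrics-server/
# After a minute or so, confirm metrics are being collected
kubectl top pods
```

kubectl top is a quick sanity check: once it reports per-pod CPU and memory, autoscalers can consume the same data through the metrics API.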
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: app-tier
  labels:
    app: microservices
    tier: app
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-tier
  targetCPUUtilizationPercentage: 80
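Assuming the manifest above is saved as app-tier-hpa.yaml (a hypothetical filename), it is created like any other resource:

```shell
kubectl create -f app-tier-hpa.yaml
```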
$ kubectl get deployments app-tier
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
app-tier   5/5     5            5           17m
$ kubectl autoscale deployment app-tier --max=5 --min=1 --cpu-percent=80
$ kubectl get deployments app-tier
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
app-tier   1/1     1            1           17m
$ kubectl get hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
app-tier   Deployment/app-tier   30%/80%   1         5         1          17m
Creating the HPA manifest shown above is equivalent to running:
kubectl autoscale deployment app-tier --max=5 --min=1 --cpu-percent=80
NOTE: In order for the autoscaler to work, your containers must set CPU resource requests (resources.requests).
spec:
  containers:
  - name: mycontainer
    resources:
      requests:
        cpu: 20m # 20 milliCPU / 0.02 CPU
If the containers in the pod don't set resource requests, metrics can't be translated into a utilization percentage for the pod, and the target shows <unknown>.
$ kubectl get hpa
NAME        REFERENCE              TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
app-tier    Deployment/app-tier    20%/80%         1         5         1          154m
data-tier   Deployment/data-tier   <unknown>/80%   1         5         1          146m
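When a target shows <unknown>, describing the HPA surfaces the autoscaler's events, which typically point at the missing CPU request:

```shell
# data-tier is the HPA from the output above
kubectl describe hpa data-tier
```

Look through the Conditions and Events sections of the output for messages about failing to get CPU utilization for the pods.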
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        #type: Utilization
        #averageUtilization: 50
        type: AverageValue
        averageValue: 100m # target 100 milliCPU (0.1 CPU) per pod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
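Assuming the Deployment and Service above are saved together as php-apache.yaml (a hypothetical filename), both can be created with a single command:

```shell
kubectl apply -f php-apache.yaml
```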
Create the HPA with a target average CPU usage of 100m per pod.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        averageValue: 100m # 100 milliCPU (0.1 CPU)
        type: AverageValue
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
Run a load against the service:
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
You can watch the number of replicas increase:
kubectl get hpa --watch
NAME         REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   96m/100m    1         10        1          2m2s
php-apache   Deployment/php-apache   502m/100m   1         10        1          2m7s
php-apache   Deployment/php-apache   502m/100m   1         10        4          2m22s
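When you stop the load generator (Ctrl-C in its terminal), the autoscaler will scale back down toward minReplicas; by default it waits several minutes before downscaling to avoid thrashing, so keep the watch running:

```shell
kubectl get hpa php-apache --watch
```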