Best practices for scaling Kubernetes Clusters

Best Practices for Running Cluster Autoscaling

To ensure optimal performance when using Cluster Autoscaling, consider the following best practices:

  • Specify resource requests for your pods so that Cluster Autoscaling decisions are based on accurate resource requirements (see the example after this list)
  • Use Pod Disruption Budgets (PDBs) to prevent pods from being terminated too abruptly during disruptions, especially if high availability is critical
  • Define Node Limits Appropriately: configure your node pools carefully, ensuring the maximum number of nodes does not exceed 10. If you require more than 10 nodes, please contact us
  • Avoid running additional autoscalers on your clusters, as they may conflict with Cluster Autoscaling operations
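
As an illustration of the first point, a minimal deployment sketch (all names and values here are placeholders, not a prescribed configuration) showing how resource requests can be declared so that the scheduler and Cluster Autoscaling have accurate sizing information:

    # deployment.yaml (illustrative snippet)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 4
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: web
              image: nginx:1.25
              resources:
                requests:
                  cpu: "250m"       # used for scheduling and autoscaling decisions
                  memory: "256Mi"
                limits:
                  cpu: "500m"
                  memory: "512Mi"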

Pod Disruption Budgets (PDB) configuration

A well-configured Pod Disruption Budget (PDB) is crucial for maintaining application availability and ensuring smooth operations during scaling, upgrades, or maintenance activities. Improperly set PDBs can lead to several issues that impact both application performance and autoscaling efficiency. 

Common PDB Issues and Their Impact 

  1. Overly Strict PDBs 

Configuring a PDB with excessive restrictions can prevent Kubernetes from disrupting pods during critical operations. For example, if a deployment has 4 replicas and the PDB is set with minAvailable: 4, Kubernetes cannot disrupt any pod during activities like upgrades or resizes. This strict configuration blocks essential processes such as autoscaling, node maintenance, or cluster upgrades, leading to delays and inefficient resource usage. 
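
For illustration, an overly strict PDB of this kind might look as follows (names are placeholders). With minAvailable equal to the replica count, no voluntary disruption is ever allowed:

    # pdb-too-strict.yaml (illustrative)
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: web-pdb
    spec:
      minAvailable: 4          # equals the number of replicas: no pod may be evicted
      selector:
        matchLabels:
          app: web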

  2. Overly Lenient PDBs 

A PDB with insufficient restrictions, such as minAvailable: 1 for a deployment with 4 replicas, allows up to 3 pods to be disrupted simultaneously. While this configuration enables flexibility for scaling and maintenance, it risks application instability or degraded performance if the remaining pod cannot handle the workload. 

  3. PDB Conflicts with Autoscaling 

Incorrect PDB settings can interfere with cluster autoscaling policies, preventing nodes from being added or removed effectively. For instance, if a StatefulSet is configured with replicas: 1 and minAvailable: 1, Kubernetes cannot schedule the pod elsewhere during node draining, effectively blocking the process and preventing cluster scaling.
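
As a sketch of this conflict (hypothetical names and images), the following combination leaves the autoscaler no pod it is allowed to evict, so the node hosting the single replica cannot be drained:

    # statefulset-pdb-conflict.yaml (illustrative)
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: db
    spec:
      serviceName: db
      replicas: 1              # a single replica...
      selector:
        matchLabels:
          app: db
      template:
        metadata:
          labels:
            app: db
        spec:
          containers:
            - name: db
              image: postgres:16
    ---
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: db-pdb
    spec:
      minAvailable: 1          # ...that must always be available: draining is blocked
      selector:
        matchLabels:
          app: db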

Best Practices for configuring PDBs

  1. Balance Availability and Flexibility 

Choose a balanced configuration that aligns with your application’s tolerance for disruption. For example: 

  • For a deployment with 4 replicas, setting minAvailable: 3 ensures at least 3 pods remain running during disruptive actions, as sketched below. This strikes a balance between maintaining application stability and allowing autoscaling and operational flexibility. 
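
A sketch of such a balanced configuration (placeholder names):

    # pdb-balanced.yaml (illustrative)
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: web-pdb
    spec:
      minAvailable: 3          # one of the 4 replicas may be disrupted at a time
      selector:
        matchLabels:
          app: web
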
  2. Test PDB Configurations 

Simulate disruptions and observe how the PDB affects pod behaviour and application performance. Testing helps identify configurations that might block critical operations or degrade performance. 

  3. Consider StatefulSets 

Stateful workloads, which often require pod uniqueness and ordered deployment, are particularly sensitive to PDB misconfigurations. Ensure PDBs are carefully tailored to the operational needs of StatefulSets to avoid blocking node maintenance or autoscaling. 

  4. Align PDBs with Autoscaling Needs 

When using Cluster Autoscaling, ensure that PDBs do not restrict pod disruptions to the point where the autoscaler cannot function efficiently. For example, avoid setting minAvailable equal to the total number of replicas unless absolutely necessary for high availability. 

Cluster Autoscaling and Horizontal Pod Autoscaling Working Together

Cluster Autoscaling and Horizontal Pod Autoscaling (HPA) are complementary features that work together to optimize resource usage and application performance.

Cluster Autoscaling dynamically adjusts the number of nodes in your Kubernetes cluster based on the resource demands of your workloads, scaling the cluster up or down as necessary.

Horizontal Pod Autoscaling automatically adjusts the number of pod replicas in a deployment based on observed CPU or memory usage, ensuring that applications can handle varying loads efficiently.

When used together, Cluster Autoscaling can add nodes to the cluster when HPA scales up pods beyond the capacity of existing nodes, while HPA ensures that the workloads running on those nodes are efficiently scaled based on demand. This combination allows your cluster to flexibly respond to both changes in pod resource requirements and node capacity, maintaining optimal performance while minimizing resource wastage.
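
As an illustration of how the two interact, a minimal HorizontalPodAutoscaler might look like the following (the name, target, and thresholds are placeholders). When the HPA raises the replica count beyond what the current nodes can schedule, Cluster Autoscaling adds nodes to fit the pending pods:

    # hpa.yaml (illustrative)
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      minReplicas: 2
      maxReplicas: 20          # if these pods no longer fit, Cluster Autoscaling adds nodes
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70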

Pods that can prevent Cluster Autoscaling from removing a node

Certain types of pods can prevent Cluster Autoscaling from removing a node. Typically these include pods with restrictive PodDisruptionBudgets, pods not backed by a controller, pods using local storage, and pods annotated with cluster-autoscaler.kubernetes.io/safe-to-evict: "false". For more detailed information on these pods and their behaviour, please refer here.

What consequences can arise from scaling down a Kubernetes cluster?

Scaling down a Kubernetes cluster reduces the cluster’s capacity in terms of memory and CPU, and might stop pods in the cluster, ultimately causing problems for the applications it serves.
The scaling process does not take the cluster’s current capacity or load into account, so extra attention is needed.

To scale down a cluster, Kubernetes will choose a node to remove, drain the pods from it, and delete the node.

If an application lives only on the node being drained, and this application makes use of:

  • Local volume (using hostPath or equivalent)
  • nodeSelector
  • node affinity
  • PodDisruptionBudget

Kubernetes might have difficulty draining the node while keeping the pods running, and the draining process may hang.
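
For example, a pod spec along the following lines (hypothetical names and paths) is tied to one specific node and cannot simply be rescheduled elsewhere during a drain:

    # pinned-pod.yaml (illustrative)
    apiVersion: v1
    kind: Pod
    metadata:
      name: reports
    spec:
      nodeSelector:
        kubernetes.io/hostname: node-1      # pod can only run on this node
      containers:
        - name: reports
          image: busybox:1.36
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          hostPath:
            path: /mnt/reports              # local data that only exists on node-1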

To prevent issues on pods during the scaling down of a cluster, the following strategies can be used:

  • Make sure a pod has multiple replicas spread across multiple nodes using topology spread constraints or affinity rules (see the sketch after this list)
  • Use an explicit PodDisruptionBudget on the deployments/pods.
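
A combined sketch of both strategies, with placeholder names, spreading replicas across nodes and protecting them with a PDB:

    # spread-and-pdb.yaml (illustrative)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 4
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          topologySpreadConstraints:
            - maxSkew: 1
              topologyKey: kubernetes.io/hostname   # spread replicas across nodes
              whenUnsatisfiable: ScheduleAnyway
              labelSelector:
                matchLabels:
                  app: web
          containers:
            - name: web
              image: nginx:1.25
    ---
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: web-pdb
    spec:
      minAvailable: 3          # at most one replica may be disrupted at a time
      selector:
        matchLabels:
          app: web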