Backup and Disaster Recovery for Managed Kubernetes

Whether the cause is a failed Kubernetes upgrade, human error, or a system failure, you want to be prepared before your workloads are affected.

During the design phase of your Kubernetes cluster, it is important to implement a good backup and restore strategy and test it in a realistic scenario.

Below are some tips on what to cover when putting your disaster recovery strategy in place.

Leaseweb recommendations for Managed Kubernetes backup

At a minimum, your backup strategy must ensure that you have:

  • a copy of the state of your Kubernetes cluster (API definitions for all objects and namespaces in your cluster)
  • a copy of the data stored on PersistentVolumes
  • a copy of the secrets and certificates stored on the cluster
  • a copy of any external state that your cluster depends on (see Backup best practices below)
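
As a minimal sketch of the first item above, assuming `kubectl` is installed and already configured for your cluster, the Python helper below exports object definitions to YAML files. The resource selection, the function names, and the one-file-per-resource layout are illustrative assumptions, not a Leaseweb-provided procedure. A dedicated backup tool will usually capture more, such as volume data.

```python
import subprocess
from pathlib import Path

# Illustrative selection only; extend it with the resource types your cluster uses.
RESOURCES = ["namespaces", "deployments", "statefulsets", "services", "configmaps", "secrets"]


def export_command(resource: str) -> list[str]:
    """Return the kubectl command that dumps one resource type as YAML."""
    return ["kubectl", "get", resource, "--all-namespaces", "-o", "yaml"]


def export_cluster_state(out_dir: str) -> None:
    """Write one YAML file per resource type into out_dir (hypothetical layout)."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    for resource in RESOURCES:
        # check=True raises CalledProcessError if kubectl exits with an error.
        result = subprocess.run(
            export_command(resource), check=True, capture_output=True, text=True
        )
        # Secret values in this output are only base64-encoded, not encrypted:
        # encrypt the resulting files before storing them.
        (target / f"{resource}.yaml").write_text(result.stdout)
```

Calling `export_cluster_state("cluster-backup")` would write the files into a local `cluster-backup` directory. Data on PersistentVolumes is not covered by this export and needs its own backup.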

Tools to back up your cluster

The Kubernetes ecosystem is vast, and many tools exist to reach your goals.

The following tools should be compatible with Leaseweb Managed Kubernetes and contribute to putting in place a successful backup strategy for your Kubernetes cluster:

  • Velero: backs up and migrates Kubernetes applications and their persistent volumes
  • Kasten K10: simplifies the operational management of stateful cloud-native applications on Kubernetes
  • CloudCasa: backup as a service for Kubernetes and cloud-native applications

The Cloud Native Computing Foundation (CNCF) lists backup and disaster recovery solutions in its CNCF Landscape.

Backup best practices

A successful backup and restore strategy should meet the following criteria:

  1. Keep an up-to-date copy of every stateful part of the applications running on your cluster, stored and available in case of disaster:
     • databases (e.g. MariaDB, PostgreSQL)
     • secrets and certificates, stored encrypted
     • the configuration of your applications
     • key-value stores and messaging services used in your cluster (e.g. etcd, Redis, RabbitMQ)
     • any external storage that your cluster relies on (e.g. object storage, cloud storage, S3 buckets)
  2. Follow best practices when managing your cluster, such as:
     • using Infrastructure as Code (IaC) to define your applications
     • keeping live replicas of your cluster for disaster recovery purposes
     • testing your restore procedure frequently, and fixing any issues that arise during those tests
     • practicing chaos engineering
  3. Set targets for each of the following, and consider their impact as part of the strategy:
     • Recovery Point Objective (RPO): the maximum amount of data loss that is acceptable during a disaster or system outage
     • Recovery Time Objective (RTO): the targeted duration within which a system or service must be restored after a disruption or disaster
     • backup frequency
     • geographic redundancy
     • retention period
     • Total Cost of Ownership (TCO)
     • security of your backups

Taken together, these choices should guide you toward the backup strategy that best fits your Kubernetes cluster.
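
The RPO, RTO, backup frequency, and retention period listed above constrain each other, and a short calculation can show whether a given schedule meets its targets. The Python sketch below is one way to check this. The `BackupPlan` class and its field names are hypothetical, and it assumes that each backup captures the state at the moment it starts and that the restore duration was measured during a restore test.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class BackupPlan:
    interval: timedelta          # time between the start of two consecutive backups
    backup_duration: timedelta   # how long one backup takes to complete
    restore_duration: timedelta  # measured during a restore test
    retention: timedelta         # how long each backup is kept


def worst_case_data_loss(plan: BackupPlan) -> timedelta:
    # If a failure happens just before a backup completes, the newest usable
    # copy reflects the state from one interval plus one backup duration ago.
    return plan.interval + plan.backup_duration


def meets_rpo(plan: BackupPlan, rpo: timedelta) -> bool:
    return worst_case_data_loss(plan) <= rpo


def meets_rto(plan: BackupPlan, rto: timedelta) -> bool:
    return plan.restore_duration <= rto


def retained_copies(plan: BackupPlan) -> int:
    # Number of backups held at any time, which drives storage cost.
    return plan.retention // plan.interval


plan = BackupPlan(
    interval=timedelta(hours=6),
    backup_duration=timedelta(minutes=30),
    restore_duration=timedelta(hours=2),
    retention=timedelta(days=30),
)
print(meets_rpo(plan, rpo=timedelta(hours=4)))  # False: up to 6h30m of data could be lost
print(meets_rto(plan, rto=timedelta(hours=4)))  # True: a restore takes 2h
print(retained_copies(plan))                    # 120
```

In this example the 6-hour schedule misses a 4-hour RPO, so the backup frequency would need to increase, which in turn raises the number of retained copies and the storage cost.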

We also recommend the following articles for further reading:

  • How to create a 3-2-1 backup system
  • How to back up an etcd key-value store
  • Chaos engineering: definition and list of tools
  • CNCF list of backup tools