Backup and Disaster Recovery for Managed Kubernetes

This product is now available in the Netherlands, Canada, Germany, the United Kingdom, the United States and Singapore.

Whether a problem is caused by a Kubernetes upgrade, human error, or a system failure, you want to be prepared before your workloads are affected.

During the design phase of your Kubernetes cluster, it is important to implement a good backup and restore strategy and test it in a realistic scenario.

Below are some tips on what to cover when putting your disaster recovery strategy in place.

Leaseweb recommendations for Managed Kubernetes backup

At a minimum, your backup strategy must ensure that you have:

a copy of the state of your Kubernetes cluster (API definitions for all objects and namespaces in your cluster)
a copy of the data stored on PersistentVolumes
a copy of the secrets and certificates stored on the cluster
a copy of any external state that your cluster depends on (see "Backup best practices" below)
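
As an illustration of the first item, the following minimal sketch builds and runs a kubectl command that dumps the API definitions of one namespace to a YAML file. The helper names, the list of resource kinds, and the output layout are assumptions made for this example and are not part of any Leaseweb tooling. The sketch also assumes that kubectl is installed and configured for your cluster. It does not copy data stored on PersistentVolumes, and exported Secrets are only base64-encoded, so the resulting file should be encrypted before it is stored.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Hypothetical selection of resource kinds; extend it to match your workloads.
DEFAULT_KINDS = ("deployments", "statefulsets", "services", "configmaps", "secrets")


def export_command(namespace: str, kinds: Sequence[str] = DEFAULT_KINDS) -> List[str]:
    """Build the kubectl arguments that dump the given kinds of one namespace as YAML."""
    if not namespace or not kinds:
        raise ValueError("namespace and kinds must not be empty")
    return ["kubectl", "get", ",".join(kinds), "--namespace", namespace, "--output", "yaml"]


def export_namespace(namespace: str, out_dir: Path) -> Path:
    """Run kubectl and write the YAML dump to <out_dir>/<namespace>.yaml."""
    # check=True raises CalledProcessError if kubectl exits with an error.
    result = subprocess.run(
        export_command(namespace), check=True, capture_output=True, text=True
    )
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{namespace}.yaml"
    target.write_text(result.stdout)
    return target
```

For example, calling export_namespace("shop", Path("cluster-backup")) would write cluster-backup/shop.yaml, where "shop" is a placeholder namespace. Dedicated backup tools cover more cases (cluster-scoped objects, volume snapshots, scheduling), so treat this as a way to understand what "a copy of the state" means rather than as a complete solution.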

Information

In the event of a disaster, Leaseweb will not restore applications or the data stored on your cluster.

Tools to back up your cluster

The Kubernetes ecosystem is vast, and many tools exist to reach your goals.

The following tools should be compatible with Leaseweb Managed Kubernetes and contribute to putting in place a successful backup strategy for your Kubernetes cluster:

Velero: backs up and migrates Kubernetes applications and their persistent volumes
- See the article "Using Velero for backup and restoration of your Managed Kubernetes"
Kasten K10: simplifies the operational management of stateful cloud-native applications on Kubernetes
CloudCasa: Backup as a Service for Kubernetes and cloud-native applications

The Cloud Native Computing Foundation (CNCF) lists backup and disaster recovery solutions in its CNCF Landscape.

Backup best practices

A successful backup and restore strategy should meet the following criteria:

Make sure an up-to-date copy of all the stateful parts of the applications running on your cluster is stored and available in case of disaster:

databases (e.g. MariaDB, PostgreSQL)
secrets and certificates, stored in encrypted form
configuration of your applications
key-value stores and messaging services used in your cluster (e.g. etcd, Redis, RabbitMQ)
any external storage that your cluster relies on (e.g. object storage, cloud storage, S3 buckets)

Use best practices when managing your cluster, such as:

using Infrastructure as Code (IaC) to define your applications
keeping live replicas of your cluster for disaster recovery purposes
testing your restore procedure frequently, and fixing any issues that arise during those tests
practicing chaos engineering
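
One way to make restore tests repeatable is to compare checksums of the original data with checksums of the restored copy. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the data is assumed to be reachable as files in a directory, and each file is read fully into memory, which is only reasonable for small files.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def checksum_manifest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def restore_differences(expected: Dict[str, str], restored: Dict[str, str]) -> List[str]:
    """Return a sorted list of problems; an empty list means the restore matches."""
    problems = []
    for name, digest in expected.items():
        if name not in restored:
            problems.append(f"missing: {name}")
        elif restored[name] != digest:
            problems.append(f"changed: {name}")
    for name in restored:
        if name not in expected:
            problems.append(f"unexpected: {name}")
    return sorted(problems)
```

A restore test could record checksum_manifest() of the source data when the backup is taken, restore into a scratch location, and fail if restore_differences() returns anything. Databases and other live services usually need an application-level check instead (for example, a consistency query), because their files change while they run.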

It is also important to set targets for each of those criteria, and consider their impact as part of the strategy:

Recovery Point Objective (RPO): the maximum amount of data loss that is acceptable during a disaster or system outage
Recovery Time Objective (RTO): the targeted duration within which a system or service must be restored after a disruption or disaster
Backup frequency
Geographic redundancy
Retention period
Total Cost of Ownership (TCO)
Security of your backups
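
The relationship between backup frequency, RPO, RTO, and retention can be checked with simple arithmetic. The sketch below is illustrative only: the BackupPlan type and its field names are hypothetical, and it assumes that backups start at a fixed interval and that a backup only counts once it has completed. Under those assumptions, the worst-case data loss is the backup interval plus the time one backup takes, because a failure just before a backup completes falls back to the state captured when the previous backup started.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class BackupPlan:
    interval: timedelta          # time between the starts of two consecutive backups
    backup_duration: timedelta   # time one backup takes to complete
    restore_duration: timedelta  # restore time measured during your restore tests
    retention: timedelta         # how long each backup is kept


def worst_case_data_loss(plan: BackupPlan) -> timedelta:
    """Data lost if a failure happens just before the next backup completes."""
    return plan.interval + plan.backup_duration


def meets_targets(plan: BackupPlan, rpo: timedelta, rto: timedelta) -> bool:
    """True if the plan satisfies both the RPO and the RTO."""
    return worst_case_data_loss(plan) <= rpo and plan.restore_duration <= rto


def retained_backups(plan: BackupPlan) -> int:
    """Number of backups kept at any time, which drives storage cost."""
    return plan.retention // plan.interval
```

For instance, a plan with a 6-hour interval, a 30-minute backup duration, and a 2-hour measured restore meets an RPO of 8 hours and an RTO of 4 hours, but not an RPO of 6 hours. With 30 days of retention, the same plan keeps 120 backups, which feeds directly into the TCO estimate.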

All the above choices should guide you to the best backup strategy for your Kubernetes cluster.