Backup & Disaster Recovery for Managed Kubernetes

Problems can arise from a Kubernetes upgrade, human error, or a system failure. Whatever the cause, you want to be prepared before something happens to your workloads.

During the design phase of your Kubernetes cluster, it is important to put in place a good backup and restore strategy, and to test your backup strategy in a realistic scenario.

The tips below outline what to cover when putting your disaster recovery strategy in place.

Leaseweb recommendations for Managed Kubernetes backup

At a minimum, your backup strategy must ensure that you have:

  • a copy of the state of your Kubernetes cluster (API definitions for all objects & namespaces in your cluster)
  • a copy of the data stored on PersistentVolumes
  • a copy of the secrets and certificates stored on the cluster
  • a copy of any state that your cluster depends on (see the backup best practices below)
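
As a rough illustration of the first and third items, the sketch below builds `kubectl get` commands that dump selected resource types from all namespaces as YAML. The resource selection, the `build_export_commands` helper, and the `cluster-backup` output directory are assumptions made for this example, not a Leaseweb-provided tool. The script only prints the commands so they can be reviewed before being run.

```python
import shlex

# Resource types to export; an illustrative selection, adjust to your cluster.
RESOURCE_TYPES = [
    "namespaces",
    "deployments",
    "statefulsets",
    "services",
    "configmaps",
    "secrets",
    "persistentvolumeclaims",
]


def build_export_commands(resource_types, output_dir="cluster-backup"):
    """Return (command, output_path) pairs; nothing is executed here."""
    commands = []
    for resource in resource_types:
        command = ["kubectl", "get", resource, "--all-namespaces", "-o", "yaml"]
        commands.append((command, f"{output_dir}/{resource}.yaml"))
    return commands


if __name__ == "__main__":
    for command, path in build_export_commands(RESOURCE_TYPES):
        # Print shell-ready lines so they can be reviewed before being run.
        print(f"{shlex.join(command)} > {shlex.quote(path)}")
```

An export like this captures object definitions only. The data on PersistentVolumes needs a separate mechanism, and exported Secrets are merely base64-encoded, so the resulting files should be encrypted before they are stored.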

In the event of a disaster, Leaseweb will not help restore applications or the data stored on your cluster.

Tools to back up your cluster

The Kubernetes ecosystem is vast, and many tools exist to help you reach your goals.

The following tools should be compatible with Leaseweb Managed Kubernetes and can help you put in place a successful backup strategy for your Kubernetes cluster:

  •  Velero: backs up and migrates Kubernetes applications and their persistent volumes.
  •  Kasten K10: simplifies the operational management of stateful Cloud-Native applications for Kubernetes.
  •  CloudCasa: Backup as a Service for Kubernetes and Cloud-Native applications.
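
To give an idea of what a scheduled backup looks like with one of these tools, here is a minimal sketch that builds a Velero Schedule object and prints it as JSON. It assumes Velero is installed in the `velero` namespace; the `velero_schedule` helper, the schedule name, the cron expression, the 30-day TTL, and the `shop` and `billing` namespaces are all hypothetical values chosen for illustration.

```python
import json


def velero_schedule(name, cron, namespaces, ttl="720h0m0s"):
    """Build a Velero Schedule object as a plain dict."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # Assumes Velero runs in the "velero" namespace.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,  # standard cron expression
            "template": {
                "includedNamespaces": list(namespaces),
                "ttl": ttl,  # how long each backup is kept
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical nightly backup of two application namespaces.
    manifest = velero_schedule("nightly", "0 2 * * *", ["shop", "billing"])
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps(manifest, indent=2))
```

The output can be saved to a file and applied with `kubectl apply -f`. Check the documentation of the Velero version you run before relying on specific fields.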

The CNCF lists backup and disaster recovery solutions in its CNCF Landscape tool:
https://landscape.cncf.io/?group=projects-and-products&view-mode=grid&category=Runtime&org-type=for_profit&org-type=non_profit

Backup best practices

A good backup and restore strategy should also cover the following aspects of your cluster, in addition to the Leaseweb recommendations for Managed Kubernetes backup listed above.

The stateful parts of the applications running on your cluster:

  • databases (e.g. MariaDB, PostgreSQL)
  • secrets & certificates, stored in encrypted form
  • configuration of your applications
  • key-value stores and messaging services used in your cluster (e.g. etcd, Redis, RabbitMQ)
  • any external storage that your cluster relies on (e.g. object storage, cloud storage, S3 buckets)

You should consider the following aspects when establishing your strategy:

  •  Recovery Point Objective (RPO): the maximum amount of data loss that is acceptable during a disaster or system outage
  •  Recovery Time Objective (RTO): the targeted duration within which a system or service must be restored after a disruption or disaster
  •  backup frequency
  •  geographic redundancy
  •  retention period
  •  Total Cost of Ownership (TCO)
  •  security of your backups
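
The relationship between RPO, backup frequency, and retention period can be made concrete with a little arithmetic. The sketch below treats the worst-case data loss as the interval between two backups (ignoring the time a backup takes to complete) and lists the backups that fall outside a retention period. The function names and all figures are hypothetical.

```python
from datetime import datetime, timedelta


def meets_rpo(backup_interval, rpo):
    """Worst-case data loss is roughly the time between two backups."""
    return backup_interval <= rpo


def expired_backups(backup_times, retention, now):
    """Return the backup timestamps older than the retention period."""
    cutoff = now - retention
    return [t for t in backup_times if t < cutoff]


if __name__ == "__main__":
    rpo = timedelta(hours=4)
    print(meets_rpo(timedelta(hours=24), rpo))  # daily backups vs a 4 h RPO
    print(meets_rpo(timedelta(hours=1), rpo))   # hourly backups vs a 4 h RPO

    now = datetime(2024, 1, 31)
    backups = [datetime(2024, 1, day) for day in (1, 15, 30)]
    # With 14 days of retention, backups taken before 17 January have expired.
    print(expired_backups(backups, timedelta(days=14), now))
```

With these figures, daily backups cannot satisfy a four-hour RPO while hourly backups can. A shorter interval and a longer retention period both increase storage use, which feeds into the TCO.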

While reviewing how you manage your cluster, consider the following best practices:

  •  Using Infrastructure as Code (IaC) to define your applications
  •  Having live replicas of your cluster for disaster recovery purposes
  •  Testing your restore procedure frequently, and fixing any issues that arise during those tests
  •  Practicing chaos engineering

All these choices should guide you to the best backup strategy for your Kubernetes cluster.
