
Restore a Platform Deployment with an External Database

info
This feature is available starting with Platform version v4.8.0.

This procedure describes how to restore the Kine database from an RDS snapshot to a new instance and update the platform to use the restored database. Use this procedure for disaster recovery, database migration (for example, enabling IAM authentication), or point-in-time recovery.

warning

The platform deployment must be scaled down before switching the data source to prevent split-brain writes to the old and new databases.

Step 1 - Create a snapshot of the current database

Skip this step if you already have a snapshot to restore from.

aws rds create-db-snapshot \
--db-instance-identifier mariadb-ha-platform \
--db-snapshot-identifier kine-backup-YYYY-MM-DD \
--region us-east-1

Wait for the snapshot to become available:

aws rds wait db-snapshot-available \
--db-snapshot-identifier kine-backup-YYYY-MM-DD \
--region us-east-1
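Rather than typing the date by hand, the kine-backup-YYYY-MM-DD identifier used above can be generated; a small sketch:

```shell
# Build a date-stamped snapshot identifier matching the
# kine-backup-YYYY-MM-DD pattern used in the commands above.
SNAPSHOT_ID="kine-backup-$(date +%F)"
echo "$SNAPSHOT_ID"
```

Pass "$SNAPSHOT_ID" as the --db-snapshot-identifier value in both the create and wait commands so the two always agree.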

Step 2 - Restore the snapshot to a new RDS instance

Restore the snapshot to a new instance in the database VPC. Use the same DB subnet group and security group from the original setup. Include --enable-iam-database-authentication if the new instance should use IAM auth.

aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier mariadb-ha-platform-restored \
--db-snapshot-identifier kine-backup-YYYY-MM-DD \
--db-instance-class db.t3.medium \
--db-subnet-group-name ha-platform-db-subnet \
--vpc-security-group-ids sg-xxxxxxxxx \
--no-publicly-accessible \
--enable-iam-database-authentication \
--region us-east-1

Wait for the new instance to become available:

aws rds wait db-instance-available \
--db-instance-identifier mariadb-ha-platform-restored \
--region us-east-1

Note the new endpoint:

aws rds describe-db-instances \
--db-instance-identifier mariadb-ha-platform-restored \
--query 'DBInstances[0].Endpoint.Address' \
--output text \
--region us-east-1

If using IAM authentication, note the DbiResourceId of the new instance and update the RDSIAMAuthKine IAM policy to include it. Without this, platform pods fail with Access denied for user 'kine' errors because the IAM rds-db:connect permission is scoped to a specific RDS instance resource ID.

aws rds describe-db-instances \
--db-instance-identifier mariadb-ha-platform-restored \
--query 'DBInstances[0].DbiResourceId' \
--output text \
--region us-east-1

Add the new resource ID to the policy's Resource array:

aws iam create-policy-version \
--policy-arn arn:aws:iam::123456789012:policy/RDSIAMAuthKine \
--set-as-default \
--policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "rds-db:connect",
      "Resource": [
        "arn:aws:rds-db:us-east-1:123456789012:dbuser:db-OLDXXXXXXXXXXXXXXXXXXXXXXXXXX/kine",
        "arn:aws:rds-db:us-east-1:123456789012:dbuser:db-NEWXXXXXXXXXXXXXXXXXXXXXXXXXX/kine"
      ]
    }
  ]
}'
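Each entry in the Resource array follows the pattern arn:aws:rds-db:&lt;region&gt;:&lt;account-id&gt;:dbuser:&lt;DbiResourceId&gt;/&lt;db-user&gt;. As a sketch with placeholder values, the new ARN can be assembled from the resource ID captured above:

```shell
# Placeholder values; substitute your account ID, region, and the
# DbiResourceId returned by the describe-db-instances query above.
ACCOUNT_ID=123456789012
REGION=us-east-1
RESOURCE_ID=db-NEWXXXXXXXXXXXXXXXXXXXXXXXXXX
DB_USER=kine
echo "arn:aws:rds-db:${REGION}:${ACCOUNT_ID}:dbuser:${RESOURCE_ID}/${DB_USER}"
```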

Step 3 - Scale down the platform

Scale the deployment to zero to stop all writes to the old database.

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
scale deployment -n vcluster-platform loft --replicas=0

Watch until all pods have terminated (press Ctrl-C to exit the watch once no pods remain):

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
get pods -n vcluster-platform -l app=loft --watch
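For scripted runs, kubectl wait can block until the pods are gone instead of watching interactively; a sketch using the same context and label as above:

```shell
# Block until every loft pod has been deleted, up to 5 minutes.
kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
  wait pod -n vcluster-platform -l app=loft \
  --for=delete --timeout=300s
```

If the pods are already gone when this runs, some kubectl versions report "no matching resources found"; in that case the scale-down has already completed.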

Step 4 - Update the values file

Update the dataSource in the values file to point to the new RDS endpoint:

config:
  database:
    dataSource: "mysql://kine@tcp(mariadb-ha-platform-restored.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:3306)/kine"
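The dataSource string follows the Kine MySQL DSN shape mysql://&lt;user&gt;@tcp(&lt;endpoint&gt;:&lt;port&gt;)/&lt;database&gt;. As a sketch, it can be assembled from the endpoint reported in Step 2 rather than edited by hand:

```shell
# Placeholder values; DB_ENDPOINT is the address returned by
# `aws rds describe-db-instances` in Step 2.
DB_USER=kine
DB_ENDPOINT=mariadb-ha-platform-restored.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com
DB_NAME=kine
echo "mysql://${DB_USER}@tcp(${DB_ENDPOINT}:3306)/${DB_NAME}"
```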

Step 5 - Upgrade the platform

Apply the updated values file.

vcluster platform start \
--namespace vcluster-platform \
--kube-context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
--values platform-ha-values.yaml \
--upgrade \
--no-tunnel

Step 6 - Scale up the platform

The Helm upgrade keeps replicas at zero because it does not override the manual scale-down. Scale the deployment back up.

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
scale deployment -n vcluster-platform loft --replicas=3

Wait for all pods to become ready:

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
rollout status deployment/loft -n vcluster-platform

Step 7 - Verify the restore

Confirm the platform is healthy:

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
get pods -n vcluster-platform -l app=loft

Verify the platform UI is accessible through the domain and that all replicas are running.
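As an additional scripted check (a sketch), the ready replica count can be read from the deployment status and compared against the value set in Step 6:

```shell
# Prints the number of ready loft replicas; expect 3 after Step 6.
kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
  get deployment loft -n vcluster-platform \
  -o jsonpath='{.status.readyReplicas}'
```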