
Restore a Platform Deployment with an External Database

info
This feature is available starting with Platform version v4.8.0.

This procedure describes how to restore the Kine database from an RDS snapshot to a new instance and update the platform to use the restored database. Use this procedure for disaster recovery, database migration (for example, enabling IAM authentication), or point-in-time recovery.

warning

The platform deployment must be scaled down before switching the data source to prevent split-brain writes to the old and new databases.

Step 1 - Create a snapshot of the current database

Skip this step if you already have a snapshot to restore from.

aws rds create-db-snapshot \
--db-instance-identifier mariadb-ha-platform \
--db-snapshot-identifier kine-backup-YYYY-MM-DD \
--region us-east-1

Wait for the snapshot to become available:

aws rds wait db-snapshot-available \
--db-snapshot-identifier kine-backup-YYYY-MM-DD \
--region us-east-1
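Rather than typing the date by hand, the kine-backup-YYYY-MM-DD identifier used above can be generated; a small sketch:

```shell
# Build a date-stamped snapshot identifier matching the
# kine-backup-YYYY-MM-DD pattern used in the commands above.
SNAPSHOT_ID="kine-backup-$(date +%F)"
echo "$SNAPSHOT_ID"
```

Pass "$SNAPSHOT_ID" as the --db-snapshot-identifier value in both the create and wait commands so the two always agree.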

Step 2 - Restore the snapshot to a new RDS instance

Restore the snapshot to a new instance in the database VPC. Use the same DB subnet group and security group from the original setup. Include --enable-iam-database-authentication if the new instance should use IAM auth.

aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier mariadb-ha-platform-restored \
--db-snapshot-identifier kine-backup-YYYY-MM-DD \
--db-instance-class db.t3.medium \
--db-subnet-group-name ha-platform-db-subnet \
--vpc-security-group-ids sg-xxxxxxxxx \
--no-publicly-accessible \
--enable-iam-database-authentication \
--region us-east-1

Wait for the new instance to become available:

aws rds wait db-instance-available \
--db-instance-identifier mariadb-ha-platform-restored \
--region us-east-1

Note the new endpoint:

aws rds describe-db-instances \
--db-instance-identifier mariadb-ha-platform-restored \
--query 'DBInstances[0].Endpoint.Address' \
--output text \
--region us-east-1

If using IAM authentication, note the DbiResourceId of the new instance and update the RDSIAMAuthKine IAM policy to include it. Without this, platform pods fail with Access denied for user 'kine' errors because the IAM rds-db:connect permission is scoped to a specific RDS instance resource ID.

aws rds describe-db-instances \
--db-instance-identifier mariadb-ha-platform-restored \
--query 'DBInstances[0].DbiResourceId' \
--output text \
--region us-east-1

Add the new resource ID to the policy's Resource array:

aws iam create-policy-version \
--policy-arn arn:aws:iam::123456789012:policy/RDSIAMAuthKine \
--set-as-default \
--policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "rds-db:connect",
      "Resource": [
        "arn:aws:rds-db:us-east-1:123456789012:dbuser:db-OLDXXXXXXXXXXXXXXXXXXXXXXXXXX/kine",
        "arn:aws:rds-db:us-east-1:123456789012:dbuser:db-NEWXXXXXXXXXXXXXXXXXXXXXXXXXX/kine"
      ]
    }
  ]
}'
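Each entry in the Resource array follows the pattern arn:aws:rds-db:&lt;region&gt;:&lt;account-id&gt;:dbuser:&lt;DbiResourceId&gt;/&lt;db-user&gt;. As a sketch with placeholder values, the new ARN can be assembled from the resource ID captured above:

```shell
# Placeholder values; substitute your account ID, region, and the
# DbiResourceId returned by the describe-db-instances query above.
ACCOUNT_ID=123456789012
REGION=us-east-1
RESOURCE_ID=db-NEWXXXXXXXXXXXXXXXXXXXXXXXXXX
DB_USER=kine
echo "arn:aws:rds-db:${REGION}:${ACCOUNT_ID}:dbuser:${RESOURCE_ID}/${DB_USER}"
```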

Step 3 - Scale down the platform

Scale the deployment to zero to stop all writes to the old database.

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
scale deployment -n vcluster-platform loft --replicas=0

Watch until all pods have terminated (press Ctrl-C to exit the watch once no pods remain):

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
get pods -n vcluster-platform -l app=loft --watch
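For scripted runs, kubectl wait can block until the pods are gone instead of watching interactively; a sketch using the same context and label as above:

```shell
# Block until every loft pod has been deleted, up to 5 minutes.
kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
  wait pod -n vcluster-platform -l app=loft \
  --for=delete --timeout=300s
```

If the pods are already gone when this runs, some kubectl versions report "no matching resources found"; in that case the scale-down has already completed.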

Step 4 - Update the values file

Update the dataSource in the values file to point to the new RDS endpoint:

config:
  database:
    dataSource: "mysql://kine@tcp(mariadb-ha-platform-restored.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:3306)/kine"
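The dataSource string follows the Kine MySQL DSN shape mysql://&lt;user&gt;@tcp(&lt;endpoint&gt;:&lt;port&gt;)/&lt;database&gt;. As a sketch, it can be assembled from the endpoint reported in Step 2 rather than edited by hand:

```shell
# Placeholder values; DB_ENDPOINT is the address returned by
# `aws rds describe-db-instances` in Step 2.
DB_USER=kine
DB_ENDPOINT=mariadb-ha-platform-restored.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com
DB_NAME=kine
echo "mysql://${DB_USER}@tcp(${DB_ENDPOINT}:3306)/${DB_NAME}"
```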

Step 5 - Upgrade the platform

Apply the updated values file.

vcluster platform start \
--namespace vcluster-platform \
--kube-context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
--values platform-ha-values.yaml \
--upgrade \
--no-tunnel

Step 6 - Scale up the platform

The Helm upgrade keeps replicas at zero because it does not override the manual scale-down. Scale the deployment back up.

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
scale deployment -n vcluster-platform loft --replicas=3

Wait for all pods to become ready:

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
rollout status deployment/loft -n vcluster-platform

Step 7 - Verify the restore

Confirm the platform is healthy:

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
get pods -n vcluster-platform -l app=loft

Verify the platform UI is accessible through the domain and that all replicas are running.
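As an additional scripted check (a sketch), the ready replica count can be read from the deployment status and compared against the value set in Step 6:

```shell
# Prints the number of ready loft replicas; expect 3 after Step 6.
kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
  get deployment loft -n vcluster-platform \
  -o jsonpath='{.status.readyReplicas}'
```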