Restore the Kine Database from a Snapshot
This runbook covers restoring the shared Kine database from an RDS snapshot to a new instance and updating both platform regions to use it. Use this procedure for disaster recovery, database migration (for example, enabling IAM authentication), or point-in-time recovery.
Both platform regions must be scaled down before switching the data source to prevent split-brain writes to the old and new databases.
Configure your values
This runbook references AWS resource IDs, cluster context ARNs, and file names specific to your deployment. Before running a command, replace its example values (instance identifiers, the account ID 123456789012, the security group ID, regions, and values file names) with your own.
Step 1 - Create a snapshot of the current database
Skip this step if you already have a snapshot to restore from.
aws rds create-db-snapshot \
--db-instance-identifier mariadb-multi-region \
--db-snapshot-identifier kine-backup-YYYY-MM-DD \
--region us-east-1
Wait for the snapshot to become available:
aws rds wait db-snapshot-available \
--db-snapshot-identifier kine-backup-YYYY-MM-DD \
--region us-east-1
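The YYYY-MM-DD placeholder in the snapshot identifier can be filled in by the shell. The following is a minimal sketch, assuming a date command that supports -u and the same instance and region as above. The snapshot_id helper is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: build a dated snapshot identifier of the form
# kine-backup-YYYY-MM-DD from the current UTC date.
snapshot_id() {
  printf 'kine-backup-%s\n' "$(date -u +%Y-%m-%d)"
}

# Example usage (commented out so that sourcing this block has no side effects):
# SNAPSHOT_ID="$(snapshot_id)"
# aws rds create-db-snapshot \
#   --db-instance-identifier mariadb-multi-region \
#   --db-snapshot-identifier "$SNAPSHOT_ID" \
#   --region us-east-1
```

Storing the identifier in a variable keeps the create, wait, and restore commands pointed at the same snapshot.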
Step 2 - Restore the snapshot to a new RDS instance
Restore the snapshot to a new instance in the database VPC. Use the same DB
subnet group and security group as the original setup. The command below
includes --enable-iam-database-authentication; omit that flag if the new
instance should not use IAM authentication.
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier mariadb-multi-region-restored \
--db-snapshot-identifier kine-backup-YYYY-MM-DD \
--db-instance-class db.t3.medium \
--db-subnet-group-name multi-region-db-subnet \
--vpc-security-group-ids sg-xxxxxxxxx \
--no-publicly-accessible \
--enable-iam-database-authentication \
--region us-east-1
Wait for the new instance to become available:
aws rds wait db-instance-available \
--db-instance-identifier mariadb-multi-region-restored \
--region us-east-1
Note the new endpoint:
aws rds describe-db-instances \
--db-instance-identifier mariadb-multi-region-restored \
--query 'DBInstances[0].Endpoint.Address' \
--output text \
--region us-east-1
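Because the endpoint is needed again when updating the values files, it can be convenient to capture it in a variable. A minimal sketch wrapping the same describe-db-instances call; the rds_endpoint helper and the NEW_ENDPOINT variable are hypothetical names, not part of the platform or the AWS CLI.

```shell
# Hypothetical helper: print the endpoint address of an RDS instance.
# $1 = DB instance identifier, $2 = AWS region.
rds_endpoint() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text \
    --region "$2"
}

# Example usage (commented out so that sourcing this block makes no AWS calls):
# NEW_ENDPOINT="$(rds_endpoint mariadb-multi-region-restored us-east-1)"
# echo "$NEW_ENDPOINT"
```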
If using IAM authentication, note the DbiResourceId of the new instance and
update the RDSIAMAuthKine IAM policy to include it. Otherwise, platform pods
fail with "Access denied for user 'kine'" errors, because the IAM
rds-db:connect permission is scoped to a specific RDS instance resource ID.
aws rds describe-db-instances \
--db-instance-identifier mariadb-multi-region-restored \
--query 'DBInstances[0].DbiResourceId' \
--output text \
--region us-east-1
Add the new resource ID to the policy's Resource array:
aws iam create-policy-version \
--policy-arn arn:aws:iam::123456789012:policy/RDSIAMAuthKine \
--set-as-default \
--policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "rds-db:connect",
      "Resource": [
        "arn:aws:rds-db:us-east-1:123456789012:dbuser:db-OLDXXXXXXXXXXXXXXXXXXXXXXXXXX/kine",
        "arn:aws:rds-db:us-east-1:123456789012:dbuser:db-NEWXXXXXXXXXXXXXXXXXXXXXXXXXX/kine"
      ]
    }
  ]
}'
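To avoid hand-editing the JSON, the policy document can be generated from the resource IDs. The sketch below assumes the same policy shape as above and a database user named kine. The kine_policy_document helper and the OLD_ID and NEW_ID variables are hypothetical names used for illustration.

```shell
# Hypothetical helper: print an IAM policy document that allows rds-db:connect
# for the kine user on each DbiResourceId given.
# $1 = AWS region, $2 = AWS account ID, $3... = DbiResourceId values.
kine_policy_document() {
  local region="$1" account="$2" resources="" id arn
  shift 2
  for id in "$@"; do
    arn="\"arn:aws:rds-db:${region}:${account}:dbuser:${id}/kine\""
    # Append with a comma separator once the list is non-empty.
    resources="${resources:+${resources}, }${arn}"
  done
  printf '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": "rds-db:connect", "Resource": [%s]}]}\n' "$resources"
}

# Example usage (commented out so that sourcing this block makes no AWS calls):
# aws iam create-policy-version \
#   --policy-arn arn:aws:iam::123456789012:policy/RDSIAMAuthKine \
#   --set-as-default \
#   --policy-document "$(kine_policy_document us-east-1 123456789012 "$OLD_ID" "$NEW_ID")"
```

IAM keeps at most five versions of a managed policy, so create-policy-version fails if the policy is already at that limit until an older version is deleted.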
Step 3 - Scale down both regions
Scale both platform deployments to zero to stop all writes to the old database.
kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-multi-region-us-east-1 \
scale deployment -n vcluster-platform loft --replicas=0
kubectl --context arn:aws:eks:eu-west-1:123456789012:cluster/platform-multi-region-eu-west-1 \
scale deployment -n vcluster-platform loft --replicas=0
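Scaling to zero discards the previous replica count, so it can be worth recording it before running the scale-down commands above. A minimal sketch, assuming the deployment name and namespace used in this runbook; loft_replicas is a hypothetical helper name.

```shell
# Hypothetical helper: print the desired replica count of the loft deployment.
# $1 = kubectl context.
loft_replicas() {
  kubectl --context "$1" get deployment loft -n vcluster-platform \
    -o jsonpath='{.spec.replicas}'
}

# Example usage (commented out so that sourcing this block makes no cluster calls):
# US_REPLICAS="$(loft_replicas arn:aws:eks:us-east-1:123456789012:cluster/platform-multi-region-us-east-1)"
# EU_REPLICAS="$(loft_replicas arn:aws:eks:eu-west-1:123456789012:cluster/platform-multi-region-eu-west-1)"
```

The recorded values can then be passed to --replicas when scaling back up.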
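Wait for all pods to stop. Each --watch command blocks until interrupted, so press Ctrl+C once no pods are listed, then run the next command: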
kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-multi-region-us-east-1 \
get pods -n vcluster-platform -l app=loft --watch
kubectl --context arn:aws:eks:eu-west-1:123456789012:cluster/platform-multi-region-eu-west-1 \
get pods -n vcluster-platform -l app=loft --watch
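For scripted runs, a polling loop is one non-interactive alternative to watching the pods by hand. The sketch below assumes the app=loft label and namespace used above. The wait_for_no_loft_pods helper, its retry count, and its delay are hypothetical choices, not platform defaults.

```shell
# Hypothetical helper: poll until no loft pods remain in the given context.
# $1 = kubectl context, $2 = maximum number of polls (default 60),
# $3 = seconds between polls (default 5).
# Returns 0 when no pods remain, 1 on timeout, 2 if kubectl itself fails.
wait_for_no_loft_pods() {
  local ctx="$1" tries="${2:-60}" delay="${3:-5}" pods
  while [ "$tries" -gt 0 ]; do
    pods="$(kubectl --context "$ctx" get pods -n vcluster-platform \
      -l app=loft -o name)" || return 2
    if [ -z "$pods" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Example usage (commented out so that sourcing this block makes no cluster calls):
# wait_for_no_loft_pods arn:aws:eks:us-east-1:123456789012:cluster/platform-multi-region-us-east-1
# wait_for_no_loft_pods arn:aws:eks:eu-west-1:123456789012:cluster/platform-multi-region-eu-west-1
```

Treating a kubectl failure as a distinct error avoids mistaking an authentication or network problem for an empty pod list.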
Step 4 - Update the values files
Update the dataSource in both region values files to point to the new RDS
endpoint:
config:
  database:
    dataSource: "mysql://kine@tcp(mariadb-multi-region-restored.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:3306)/kine"
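The dataSource string follows a fixed pattern around the endpoint address, so it can be assembled in the shell to avoid copy-and-paste mistakes. A minimal sketch, assuming the kine user and kine database name shown above; kine_datasource is a hypothetical helper name.

```shell
# Hypothetical helper: print the Kine dataSource string for an RDS endpoint.
# $1 = endpoint address, $2 = port (default 3306).
kine_datasource() {
  printf 'mysql://kine@tcp(%s:%s)/kine\n' "$1" "${2:-3306}"
}

# Example usage:
# kine_datasource "mariadb-multi-region-restored.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com"
```

The printed value is what goes into config.database.dataSource in both region values files.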
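Step 5 - Upgrade both regions
Apply the updated values files to both regions using either the vCluster CLI (the two vcluster platform start commands) or Helm (the two helm upgrade commands). Use one method, not both.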
vcluster platform start \
--namespace vcluster-platform \
--kube-context arn:aws:eks:us-east-1:123456789012:cluster/platform-multi-region-us-east-1 \
--values platform-us-east-1-values.yaml \
--upgrade \
--no-tunnel
vcluster platform start \
--namespace vcluster-platform \
--kube-context arn:aws:eks:eu-west-1:123456789012:cluster/platform-multi-region-eu-west-1 \
--values platform-eu-west-1-values.yaml \
--upgrade \
--no-tunnel
helm upgrade loft vcluster-platform --install --create-namespace --repository-config='' \
--namespace vcluster-platform \
--repo "https://charts.loft.sh/" \
--version 4.8.0 \
--kube-context arn:aws:eks:us-east-1:123456789012:cluster/platform-multi-region-us-east-1 \
-f platform-us-east-1-values.yaml \
--server-side=true --force-conflicts
helm upgrade loft vcluster-platform --install --create-namespace --repository-config='' \
--namespace vcluster-platform \
--repo "https://charts.loft.sh/" \
--version 4.8.0 \
--kube-context arn:aws:eks:eu-west-1:123456789012:cluster/platform-multi-region-eu-west-1 \
-f platform-eu-west-1-values.yaml \
--server-side=true --force-conflicts
Step 6 - Scale up both regions
The Helm upgrade keeps replicas at zero because it doesn't override the manual scale-down. Scale both regions back up to their normal replica count (3 in this example).
kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-multi-region-us-east-1 \
scale deployment -n vcluster-platform loft --replicas=3
kubectl --context arn:aws:eks:eu-west-1:123456789012:cluster/platform-multi-region-eu-west-1 \
scale deployment -n vcluster-platform loft --replicas=3
Wait for all pods to become ready:
kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-multi-region-us-east-1 \
rollout status deployment/loft -n vcluster-platform
kubectl --context arn:aws:eks:eu-west-1:123456789012:cluster/platform-multi-region-eu-west-1 \
rollout status deployment/loft -n vcluster-platform
Step 7 - Verify the restore
Confirm the platform is healthy in both regions:
for CTX in \
  arn:aws:eks:us-east-1:123456789012:cluster/platform-multi-region-us-east-1 \
  arn:aws:eks:eu-west-1:123456789012:cluster/platform-multi-region-eu-west-1; do
  echo "=== $CTX ==="
  kubectl --context "$CTX" get pods -n vcluster-platform -l app=loft
  echo
done
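The pod listing above has to be read by eye. For scripted checks, the deployment status can be compared against the desired replica count instead. This is a minimal sketch, assuming the deployment name and namespace used in this runbook; loft_is_ready is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the loft deployment in the given context
# has at least one desired replica and all desired replicas are ready.
# $1 = kubectl context. Returns 2 if kubectl itself fails.
loft_is_ready() {
  local ctx="$1" desired ready
  desired="$(kubectl --context "$ctx" get deployment loft -n vcluster-platform \
    -o jsonpath='{.spec.replicas}')" || return 2
  ready="$(kubectl --context "$ctx" get deployment loft -n vcluster-platform \
    -o jsonpath='{.status.readyReplicas}')" || return 2
  # readyReplicas is omitted from the status when it is zero, so default to 0.
  [ "${desired:-0}" -gt 0 ] && [ "${ready:-0}" -eq "${desired:-0}" ]
}

# Example usage (commented out so that sourcing this block makes no cluster calls):
# loft_is_ready arn:aws:eks:us-east-1:123456789012:cluster/platform-multi-region-us-east-1 \
#   && echo "us-east-1 ready"
```

This only confirms that the pods report ready; it does not replace checking the UI and the Route 53 health checks.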
Verify the platform UI is accessible through the shared DNS domain and that both Route 53 health checks return healthy.