Version: main 🚧

Build for Production

vCluster lets you provision isolated tenant clusters on your existing infrastructure without a separate physical cluster per tenant. This section maps the common production architectures to concrete implementation paths. Choose the one that matches what you are building and follow it from initial design to a running, operated platform.

Need to evaluate vCluster first?

Start with a quick start to prove the deployment model in your environment, then return here to plan production.

What are you building?

You are building	Where to start
A managed Kubernetes service for paying customers on GPU infrastructure	AI Cloud: Managed Kubernetes Service
An internal AI platform for your organization's R&D and engineering teams	Enterprise AI Factory
A unified GPU operations layer across multiple compute sources or suppliers	Distributed Compute Aggregation
A shared Kubernetes platform for internal engineering teams and services	Internal Kubernetes Platform
Ephemeral isolated clusters for CI/CD pipelines with automatic cleanup	CI/CD Platform
A dedicated cluster stack per enterprise customer	Single-Tenant Per Customer
Tenant workloads at distributed edge sites from a central control plane	Edge Distribution

If you are not sure which path fits, start with Architecture and Building a GPU cloud platform.

What production-ready means

A production vCluster platform delivers:

Tenant isolation: every customer or team sees only their own cluster and workloads; node-level isolation requires private nodes
Repeatable provisioning: new tenant clusters deploy from a defined template, not from manual steps
A defined worker node model: shared nodes, dedicated node pools, private nodes, or Standalone, matched to your security, performance, and cost requirements
Governed access: who can create, access, and administer tenant clusters, enforced through Platform policies
Durable control planes: high availability (HA), data store, and backup procedures defined before tenants depend on the system
Operational readiness: monitoring, upgrade, restore, and incident response procedures documented and tested
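One way to keep these six criteria visible during planning is to track them as an explicit checklist. The following is a minimal sketch in Python, not part of vCluster or Platform: the `ReadinessChecklist` class, its field names, and the helper functions are hypothetical names chosen to mirror the list above.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReadinessChecklist:
    """Hypothetical checklist; each field mirrors one criterion from the list above."""

    tenant_isolation: bool = False
    repeatable_provisioning: bool = False
    worker_node_model_defined: bool = False
    governed_access: bool = False
    durable_control_planes: bool = False
    operational_readiness: bool = False


def missing_criteria(checklist: ReadinessChecklist) -> list[str]:
    """Return the names of unmet criteria, in declaration order."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def is_production_ready(checklist: ReadinessChecklist) -> bool:
    """A platform counts as ready only when every criterion is met."""
    return not missing_criteria(checklist)


if __name__ == "__main__":
    # Example: isolation and templated provisioning are done, the rest is open.
    status = ReadinessChecklist(tenant_isolation=True, repeatable_provisioning=True)
    print("Ready:", is_production_ready(status))
    for name in missing_criteria(status):
        print("Still open:", name)
```

Treating readiness as all-or-nothing reflects the intent of the list: each item is expected to be in place before tenants depend on the system, so a partial score is reported as a list of open items instead of a percentage.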

Coming from a quick start?

Each quick start validates a specific deployment model. Use this table to connect what you proved to the production path that extends it.

If you completed	You have proven	Production paths to consider
Docker (vind)	Local or CI cluster behavior	Use vind for CI only. Choose a path above based on your production use case.
Shared Nodes	Tenant clusters on an existing Kubernetes cluster	Internal Kubernetes Platform, CI/CD Platform, or Enterprise AI Factory (shared tier)
Private Nodes	Tenant clusters with dedicated worker nodes	AI Cloud, Enterprise AI Factory (production tier), Single-Tenant Per Customer
Standalone	Control Plane Cluster on bare metal or VMs	AI Cloud, Distributed Compute Aggregation, Enterprise AI Factory (on-premises)
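If you completed more than one quick start, the table can be read as a lookup from quick start to candidate production paths. The sketch below encodes the table rows as a plain Python dictionary and merges the candidates for several quick starts. The names `PRODUCTION_PATHS` and `candidate_paths` are illustrative only, and the empty list for Docker (vind) is an assumption that reflects the "CI only" guidance in the table.

```python
from __future__ import annotations

# Rows copied from the table above. Docker (vind) maps to no production path
# because the table recommends it for CI only (assumption made for this sketch).
PRODUCTION_PATHS: dict[str, list[str]] = {
    "Docker (vind)": [],
    "Shared Nodes": [
        "Internal Kubernetes Platform",
        "CI/CD Platform",
        "Enterprise AI Factory (shared tier)",
    ],
    "Private Nodes": [
        "AI Cloud",
        "Enterprise AI Factory (production tier)",
        "Single-Tenant Per Customer",
    ],
    "Standalone": [
        "AI Cloud",
        "Distributed Compute Aggregation",
        "Enterprise AI Factory (on-premises)",
    ],
}


def candidate_paths(completed: list[str]) -> list[str]:
    """Merge candidate paths for the given quick starts, keeping first-seen order.

    Raises KeyError if a quick start name is not a row in the table.
    """
    seen: dict[str, None] = {}  # dicts preserve insertion order, so this deduplicates
    for quick_start in completed:
        for path in PRODUCTION_PATHS[quick_start]:
            seen.setdefault(path, None)
    return list(seen)


if __name__ == "__main__":
    for path in candidate_paths(["Private Nodes", "Standalone"]):
        print(path)
```

A path that appears under more than one quick start, such as AI Cloud, is listed once in the merged result.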

Day 2 operations reference

These operations apply across all paths.

Operation	Read next
Monitor Platform and tenant workloads	Monitoring overview, fleet monitoring
Back up and restore tenant clusters	Snapshots, restore, Velero
Back up and restore Platform	Backup and restore Platform, Platform database
Upgrade Platform and tenant clusters	Upgrade vCluster, upgrade Platform, lifecycle policy
Rotate certificates	Certificate rotation
Manage private worker nodes	Manage private nodes, Auto Nodes
Scale and recover the platform	Platform HA, multi-region Platform
Troubleshoot incidents	vCluster troubleshoot, debug commands, Platform troubleshooting
