Version: main 🚧

Build for Production

vCluster lets you provision isolated tenant clusters on your existing infrastructure without a separate physical cluster per tenant. This section maps the common production architectures to concrete implementation paths. Choose the one that matches what you are building and follow it from initial design to a running, operated platform.

Need to evaluate vCluster first?

Start with a quick start to prove the deployment model in your environment, then return here to plan production.

What are you building?

| You are building | Where to start |
| --- | --- |
| A managed Kubernetes service for paying customers on GPU infrastructure | AI Cloud: Managed Kubernetes Service |
| An internal AI platform for your organization's R&D and engineering teams | Enterprise AI Factory |
| A unified GPU operations layer across multiple compute sources or suppliers | Distributed Compute Aggregation |
| A shared Kubernetes platform for internal engineering teams and services | Internal Kubernetes Platform |
| Ephemeral isolated clusters for CI/CD pipelines with automatic cleanup | CI/CD Platform |
| A dedicated cluster stack per enterprise customer | Single-Tenant Per Customer |
| Tenant workloads at distributed edge sites from a central control plane | Edge Distribution |

If you are not sure which path fits, start with Architecture and Building a GPU cloud platform.

What production-ready means

A production vCluster platform delivers:

  • Tenant isolation: every customer or team sees only their own cluster, nodes, and workloads
  • Repeatable provisioning: new tenant clusters deploy from a defined template, not from manual steps
  • A defined worker node model: shared nodes, dedicated node pools, private nodes, or Standalone, matched to your security, performance, and cost requirements
  • Governed access: who can create, access, and administer tenant clusters, enforced through Platform policies
  • Durable control planes: HA, data store, and backup procedures defined before tenants depend on the system
  • Operational readiness: monitoring, upgrade, restore, and incident response procedures documented and tested

Coming from a quick start?

Each quick start validates a specific deployment model. Use this table to connect what you proved to the production path that extends it.

| If you completed | You have proven | Production paths to consider |
| --- | --- | --- |
| Docker (vind) | Local or CI cluster behavior | Use vind for CI only. Choose a path above based on your production use case. |
| Shared Nodes | Tenant clusters on an existing Kubernetes cluster | Internal Kubernetes Platform, CI/CD Platform, or Enterprise AI Factory (shared tier) |
| Private Nodes | Tenant clusters with dedicated worker nodes | AI Cloud, Enterprise AI Factory (production tier), Single-Tenant Per Customer |
| Standalone | Control Plane Cluster on bare metal or VMs | AI Cloud, Distributed Compute Aggregation, Enterprise AI Factory (on-premises) |
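
If you track which quick starts a team has completed, the table above can be expressed as a small lookup. The following is a minimal sketch, not part of vCluster: the names `PRODUCTION_PATHS` and `paths_for` are hypothetical, and the entries are transcribed from the table as written.

```python
# Hypothetical lookup transcribed from the quick start table.
# Keys are quick start names; values are the production paths to consider.
PRODUCTION_PATHS = {
    # vind is for CI only, so it suggests no production path by itself.
    "Docker (vind)": [],
    "Shared Nodes": [
        "Internal Kubernetes Platform",
        "CI/CD Platform",
        "Enterprise AI Factory (shared tier)",
    ],
    "Private Nodes": [
        "AI Cloud",
        "Enterprise AI Factory (production tier)",
        "Single-Tenant Per Customer",
    ],
    "Standalone": [
        "AI Cloud",
        "Distributed Compute Aggregation",
        "Enterprise AI Factory (on-premises)",
    ],
}


def paths_for(completed):
    """Return candidate production paths for the completed quick starts.

    Paths appear in table order and each path is listed once, even when
    several quick starts point to it.
    """
    candidates = []
    for quick_start in completed:
        for path in PRODUCTION_PATHS[quick_start]:
            if path not in candidates:
                candidates.append(path)
    return candidates


if __name__ == "__main__":
    for path in paths_for(["Private Nodes", "Standalone"]):
        print(path)
```

An empty result, as for `Docker (vind)` alone, means the quick start does not narrow the choice and the path should be picked from the production use case instead.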

Day 2 operations reference

These operations apply to every path.

| Operation | Read next |
| --- | --- |
| Monitor Platform and tenant workloads | Monitoring overview, fleet monitoring |
| Back up and restore tenant clusters | Snapshots, restore, Velero |
| Back up and restore Platform | Backup and restore Platform, Platform database |
| Upgrade Platform and tenant clusters | Upgrade vCluster, upgrade Platform, lifecycle policy |
| Rotate certificates | Certificate rotation |
| Manage private worker nodes | Manage private nodes, Auto Nodes |
| Scale and recover the platform | Platform HA, multi-region Platform |
| Troubleshoot incidents | vCluster troubleshoot, debug commands, Platform troubleshooting |