Version: main 🚧

GPU and accelerator support

		Enterprise
Available in these plans	Free	Dev	Prod	Scale
DRA Sync

vCluster supports GPU and accelerator workloads when the node exposes those devices through standard Kubernetes mechanisms. vCluster does not configure the physical GPU, install the vendor driver, or choose the device presentation mode. The node image, operating system, and vendor device plugin or Dynamic Resource Allocation driver own that layer.

From the tenant cluster's perspective, GPU workloads use the same Kubernetes APIs they would use on a regular cluster:

Extended resources such as nvidia.com/gpu or amd.com/gpu.
Vendor device plugins, such as the NVIDIA device plugin, AMD GPU device plugin, or an accelerator vendor's equivalent plugin.
GPU Operators, when the vendor provides one.
Dynamic Resource Allocation (DRA) objects such as DeviceClass, ResourceClaim, and ResourceClaimTemplate.
Optional higher-level schedulers or platforms, such as NVIDIA KAI Scheduler, NVIDIA Run:ai, or Slurm integrations.

vCluster role

vCluster provides the tenant Kubernetes control plane and syncs the Kubernetes objects that workloads need. It also lets tenants run isolated clusters on shared or private worker nodes. It does not sit in the device path between a pod and the GPU.

This means:

If a node advertises nvidia.com/gpu, a tenant workload can request nvidia.com/gpu.
If a node advertises amd.com/gpu, a tenant workload can request amd.com/gpu.
If an accelerator vendor exposes a Kubernetes device plugin or DRA driver, vCluster can work with that driver's resources and DRA objects.
If the required driver, runtime configuration, device plugin, or DRA driver is missing from the node or tenant cluster, vCluster cannot make the device appear by itself.

For private nodes, each tenant cluster can run its own GPU Operator, device plugin, DRA driver, scheduler, and accelerator CRDs. This is the common model for GPU cloud platforms because the tenant owns the full worker-node software stack.

For shared host nodes, the device plugin and drivers usually run on the control plane cluster nodes. Tenant workloads can use the resources that the shared nodes advertise, subject to the sync and scheduling configuration.

Install the NVIDIA GPU Operator in a tenant cluster

With Private Nodes, the tenant cluster can install the NVIDIA GPU Operator directly because its workloads run on dedicated worker nodes. This is the common pattern for AI cloud and inference provider platforms: the tenant cluster owns the GPU Operator, device plugin, DCGM Exporter, MIG Manager, and related CRDs for its private node pool.

Note: Install the GPU Operator inside the tenant cluster only when that tenant owns the GPU node software stack. On shared host nodes, the platform team usually installs the GPU driver stack and device plugin on the control plane cluster nodes instead.

Before you install

Prepare the private GPU nodes before installing the Operator:

Provision or join the GPU nodes to the tenant cluster.
Confirm the nodes run a supported Linux distribution, kernel, and container runtime.
Decide whether the Operator should install the NVIDIA driver or use a driver that is already installed in the node image.
Decide whether the tenant cluster needs MIG, NVIDIA vGPU, CDI, GPUDirect, or DCGM metrics.
Confirm kubectl and helm point at the tenant cluster, not the control plane cluster.

For the full vendor matrix and chart options, see the NVIDIA GPU Operator installation guide.

Install with Helm

Create the Operator namespace and label it for privileged workloads if your cluster uses Pod Security Admission:

kubectl create namespace gpu-operator
kubectl label --overwrite namespace gpu-operator pod-security.kubernetes.io/enforce=privileged

Add the NVIDIA Helm repository:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

Install the Operator in the tenant cluster:

helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --wait

If your private node image already includes the NVIDIA driver, disable driver installation:

helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --wait \
  --set driver.enabled=false

For inference endpoints, you can add Kubernetes pod labels such as endpoint, customer, or tier to DCGM metrics. Preserve the Operator values you selected during installation when you enable this optional enrichment:

helm upgrade gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --wait \
  --reuse-values \
  --set dcgmExporter.enablePodLabels=true
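
The same settings can be kept in a Helm values file, which is easier to review and version than a growing list of --set flags. The sketch below only mirrors the two chart keys used in the commands above; the file name gpu-operator-values.yaml is a hypothetical choice, and driver.enabled should stay at its default unless your node image already ships the NVIDIA driver.

```yaml
# gpu-operator-values.yaml (hypothetical file name)
# Mirrors the --set flags shown in the commands above; drop any key you do not need.
driver:
  # Set to false only when the private node image already includes the NVIDIA driver.
  enabled: false
dcgmExporter:
  # Adds labels from the Kubernetes Pod object to DCGM metrics as extra dimensions.
  enablePodLabels: true
```

Pass the file with Helm's -f option, for example helm upgrade gpu-operator nvidia/gpu-operator --namespace gpu-operator --wait -f gpu-operator-values.yaml.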

DCGM Exporter's Kubernetes mapping provides the core pod, namespace, and container attribution through the kubelet pod-resources API. The enablePodLabels setting adds labels from the Kubernetes Pod object as extra Prometheus dimensions.
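
As an illustration of which labels would become extra dimensions, the following sketch shows a GPU pod that carries the endpoint, customer, and tier labels mentioned above. The pod name and all label values are hypothetical, and the CUDA sample image only stands in for a real inference image.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-example          # hypothetical name
  labels:
    # Hypothetical values; with enablePodLabels these labels are attached to DCGM metrics.
    endpoint: chat-completions
    customer: acme
    tier: premium
spec:
  restartPolicy: OnFailure
  containers:
    - name: worker
      # Placeholder workload; replace with your inference image.
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
```

Keep the label set small and low-cardinality, because each distinct label value creates additional Prometheus time series.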

Verify GPU availability

Check the Operator pods:

kubectl get pods -n gpu-operator
kubectl get clusterpolicy

Confirm the tenant cluster sees nvidia.com/gpu on the private GPU nodes:

kubectl get nodes -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'

Run a CUDA smoke test. Save the following manifest as cuda-vectoradd.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1

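Apply it, wait for the pod to finish, and check the logs:

kubectl apply -f cuda-vectoradd.yaml
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/cuda-vectoradd --timeout=300s
kubectl logs pod/cuda-vectoradd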

Expect output similar to this, ending in Test PASSED:

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

Clean up the pod:

kubectl delete -f cuda-vectoradd.yaml

If the pod stays in the Pending state, check node readiness, taints, GPU resource names, project quotas, and allowed node types, and confirm that the private node has joined the tenant cluster. If Operator pods fail, check the node OS, kernel, container runtime, driver installation mode, and the NVIDIA Operator logs.

Supported vendors and accelerators

NVIDIA

NVIDIA GPUs commonly use the NVIDIA GPU Operator or the NVIDIA device plugin. The node advertises resources such as nvidia.com/gpu. Workloads request that resource in resources.limits.

The node's driver and GPU Operator configuration control NVIDIA-specific modes such as MIG or NVIDIA vGPU. vCluster consumes the resulting Kubernetes resources. It does not create MIG partitions or configure vGPU profiles.
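
To show what consuming "the resulting Kubernetes resources" can look like for MIG, here is a minimal sketch of a pod that requests one MIG slice. It assumes the node uses the mixed MIG strategy and exposes a 1g.5gb profile (as on an A100 40GB); with the single strategy, the resource name stays nvidia.com/gpu. The pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-example                # hypothetical name
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04
      resources:
        limits:
          # Assumption: mixed MIG strategy with a 1g.5gb profile configured on the node.
          nvidia.com/mig-1g.5gb: 1
```

Check the resource names the node actually advertises under its allocatable resources before using a MIG resource name in a manifest.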

AMD

AMD GPUs use the same Kubernetes mechanism. Install and configure the AMD driver stack and AMD GPU device plugin, AMD GPU Operator, or DRA driver. The node then advertises the AMD resource, commonly amd.com/gpu. Tenant workloads request that resource like any other Kubernetes extended resource. For DRA configuration, see Dynamic resource allocation and device classes.
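
A request for an AMD GPU looks the same as the NVIDIA case with a different resource name. The sketch below assumes the node advertises amd.com/gpu; the pod name is hypothetical, and the image and command are assumptions that can be replaced with any image containing the ROCm user-space tools.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: amd-gpu-example            # hypothetical name
spec:
  restartPolicy: OnFailure
  containers:
    - name: rocm-smi
      # Assumption: an image that ships rocm-smi on its PATH; substitute your own ROCm image.
      image: rocm/rocm-terminal
      command: ["rocm-smi"]
      resources:
        limits:
          amd.com/gpu: 1
```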

Other accelerators

Other accelerators, such as SambaNova devices, FPGAs, DPUs, or custom AI accelerators, follow the same rule. If the vendor exposes the device to Kubernetes, vCluster can work with that Kubernetes-facing interface.

Check the vendor documentation for the exact resource name, driver installation steps, and CRDs. Also confirm where the operator or controller should run.

Dynamic resource allocation and device classes

Dynamic Resource Allocation is useful when workloads need more detail than a simple resource count. For example, workloads might need device attributes, capacity slices, or administrator-controlled device classes.

DRA sync is disabled by default. To use DRA with shared host nodes, enable the settings your workload needs:

deviceClasses syncs allowed DeviceClass resources from the control plane cluster to the tenant cluster.
resourceClaims syncs tenant-created ResourceClaim resources to the control plane cluster.
resourceClaimTemplates syncs tenant-created ResourceClaimTemplate resources to the control plane cluster.
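
A vcluster.yaml that enables all three settings might look like the sketch below. The setting names come from the list above, but their placement under sync.fromHost and sync.toHost and the enabled field are assumptions based on the sync directions described here; confirm the exact paths in the vCluster configuration reference before using it.

```yaml
# vcluster.yaml (sketch; key paths are assumptions, see the note above)
sync:
  fromHost:
    # Control plane cluster -> tenant cluster
    deviceClasses:
      enabled: true
  toHost:
    # Tenant cluster -> control plane cluster
    resourceClaims:
      enabled: true
    resourceClaimTemplates:
      enabled: true
```

Enable only the resources your workloads use; for example, a tenant that only uses ResourceClaimTemplate objects in pod specs may not need standalone ResourceClaim sync.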

Once deviceClasses sync is enabled, platform administrators create DeviceClass resources on the control plane cluster and choose which classes are visible in each tenant cluster.
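
For illustration, a DeviceClass created by a platform administrator could look like this minimal sketch. The class name example-gpu and the driver name gpu.example.com are hypothetical, and the resource.k8s.io/v1 API version assumes a Kubernetes release where DRA is generally available; older releases serve beta versions of this API.

```yaml
apiVersion: resource.k8s.io/v1     # assumption: cluster serves the GA DRA API
kind: DeviceClass
metadata:
  name: example-gpu                # hypothetical class name
spec:
  selectors:
    - cel:
        # Matches every device published by the (hypothetical) vendor DRA driver.
        expression: device.driver == "gpu.example.com"
```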

For private nodes, tenants can also run the DRA driver and related controllers inside their tenant cluster when they own the worker-node software stack.
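
From the tenant side, a workload typically references a device class through a ResourceClaimTemplate. The sketch below assumes a DeviceClass named example-gpu is visible in the tenant cluster and that the cluster serves resource.k8s.io/v1; all names and the container image are hypothetical. In the older v1beta1 API, deviceClassName sits directly under the request without the exactly wrapper.

```yaml
apiVersion: resource.k8s.io/v1     # assumption: cluster serves the GA DRA API
kind: ResourceClaimTemplate
metadata:
  name: single-gpu                 # hypothetical name
spec:
  spec:
    devices:
      requests:
        - name: gpu
          exactly:
            deviceClassName: example-gpu   # hypothetical DeviceClass
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-example                # hypothetical name
spec:
  restartPolicy: OnFailure
  resourceClaims:
    # A ResourceClaim is generated from the template for this pod.
    - name: gpu
      resourceClaimTemplateName: single-gpu
  containers:
    - name: app
      image: registry.example.com/accelerator-app:latest   # hypothetical image
      resources:
        claims:
          # Grants this container access to the device allocated for the claim.
          - name: gpu
```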

Hardware presentation modes

The GPU presentation mode is determined at the node and driver layer, before any tenant workload is scheduled:

Mode	Where it is configured	vCluster role
Bare-metal PCIe passthrough	Physical server, OS image, driver, and device plugin	Workloads request the advertised Kubernetes resource
NVIDIA vGPU	NVIDIA vGPU host and guest driver stack, OS image, and operator or plugin configuration	Workloads request the resource exposed by that stack
NVIDIA MIG	NVIDIA GPU Operator or device plugin configuration	Workloads request the MIG resources advertised by the plugin
DRA device allocation	Vendor DRA driver and DeviceClass resources	Syncs allowed DRA objects between the control plane cluster and tenant cluster

If you provision physical GPU servers with vMetal, vMetal controls the bare metal lifecycle and node OS image. The OS image and post-provision configuration determine which GPU drivers, vGPU stack, MIG strategy, or vendor plugins are available. For that layer, see GPU presentation modes in vMetal.

Summary checklist

To make GPU or accelerator workloads work in a tenant cluster:

Prepare the worker node with the required firmware, OS image, kernel modules, and vendor driver stack.
Install the vendor device plugin, GPU Operator, or DRA driver in the right cluster.
Confirm the node advertises the expected resource or DRA devices.
Configure vCluster sync for any required CRDs, scheduler objects, or DRA objects.
Run a workload that requests the advertised resource name or references the synced DeviceClass.

For GPU bare metal provisioning and OS image guidance, see vMetal GPU Quickstart.
