
Auto Nodes

You can configure vCluster to automatically provision and join worker nodes based on your workloads' node and resource requirements. To use auto nodes, you need vCluster Platform installed, and the vCluster must be connected to it.

The feature is based on Karpenter, a cluster autoscaler for Kubernetes that chooses the best node for the requested pods and resources. Karpenter is built into vCluster and does not need to be installed separately, which lets vCluster use it for node management and scheduling. The provisioning of the nodes themselves is handled by vCluster Platform.


vCluster provides multiple node providers to automatically create worker nodes:

  • Terraform
  • KubeVirt
  • Nvidia Base Command Manager

How does it work?​

vCluster uses Karpenter to provision nodes. Karpenter is an open-source node autoscaler built for Kubernetes that can dramatically improve the efficiency and cost of running workloads on a cluster.

Karpenter works by:

  • Watching for pods that the Kubernetes scheduler has marked as unschedulable
  • Evaluating scheduling constraints (resource requests, node selectors, affinities, tolerations, and topology spread constraints) requested by the pods (see the example pod below)
  • Provisioning nodes that meet the requirements of the pods
  • Disrupting the nodes when the nodes are no longer needed
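
For illustration, a pending pod like the following (a generic Kubernetes example with hypothetical names and values, not something vCluster requires you to write) carries the kinds of scheduling constraints Karpenter evaluates before provisioning a node:

apiVersion: v1
kind: Pod
metadata:
  name: example-workload
spec:
  nodeSelector:
    my-label: my-value
  tolerations:
  - key: my-taint
    operator: Exists
    effect: NoSchedule
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 500m
        memory: 256Mi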

Under the hood, vCluster creates Karpenter NodePools, which are read-only for end users inside the vCluster. When Karpenter finds unschedulable pods inside the vCluster, or when a static node pool does not have its desired quantity of nodes deployed, a new Karpenter NodeClaim is created. From the Karpenter node claim, vCluster creates a platform node claim, which is then assigned to a node type and provisioned by the node provider (e.g. Terraform). Terraform then creates a new node according to the specified Terraform script and joins it into the vCluster via cloud-init.


For each node claim, Karpenter specifies the potentially fitting platform node types, sorted by cost, as well as the requested resources for the node. In addition, you can define requirements within node pools to filter node types, e.g. by region, CPU generation, or custom-defined properties. The platform then performs the final scheduling based on Karpenter's suggestions for which node type to use and ultimately provisions it. The chosen node type is passed to the node provider, which in turn creates the actual node based on that node type and joins it into the vCluster.

Scheduling Example​

For example, when using the following vcluster.yaml configuration:

privateNodes:
  enabled: true
  autoNodes:
    dynamic:
    - name: my-node-pool
      provider: my-node-provider
      requirements:
      - property: my-custom-property
        value: my-value

The following Karpenter NodePool will get created:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: my-node-pool
spec:
  template:
    metadata:
      labels:
        vcluster.loft.sh/provider-platform: my-node-provider
    spec:
      requirements:
      - key: my-custom-property
        operator: In
        values:
        - my-value

When Karpenter decides to create a new node, it creates a new NodeClaim resource that looks like this:

apiVersion: karpenter.sh/v1
kind: NodeClaim
metadata:
  name: my-node-pool-j6b8n
spec:
  requirements:
  - key: my-custom-property
    operator: In
    values:
    - my-value
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - my-node-provider.large
    - my-node-provider.medium
  resources:
    requests:
      cpu: 120m
      memory: 114Mi
      pods: '5'

vCluster then creates a new platform node claim in the project the vCluster was created in or connected to:

apiVersion: management.loft.sh/v1
kind: NodeClaim
metadata:
  name: vcluster-62ftb
  namespace: p-default
spec:
  desiredCapacity:
    cpu: 120m
    memory: 114Mi
    pods: '5'
  requirements:
  - key: my-custom-property
    operator: In
    values:
    - my-value
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - my-node-provider.large
    - my-node-provider.medium
  providerRef: my-node-provider
  vClusterRef: vcluster

Karpenter only lists the node types that would actually fit the desired capacity, so there might be a my-node-provider.small node type that Karpenter has already filtered out.

The platform then decides which node type to use based on the specified requirements and sets the spec.typeRef field. In this case it would use the my-node-provider.medium node type, as that is cheaper than the large one, and then create a node with the node provider (e.g. through tofu apply).
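
As a rough sketch of the result (the exact shape of the updated resource may differ; spec.typeRef is shown here as a plain string reference for illustration only), the platform node claim from above would then carry the chosen node type:

apiVersion: management.loft.sh/v1
kind: NodeClaim
metadata:
  name: vcluster-62ftb
  namespace: p-default
spec:
  # ...desiredCapacity and requirements as shown above...
  providerRef: my-node-provider
  vClusterRef: vcluster
  typeRef: my-node-provider.medium # set by the platform after scheduling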

Prerequisites​

  • The vCluster control plane is running, in the Ready state, and connected to vCluster Platform.
  • A node provider is configured in vCluster Platform.

Node Pools​

There are two types of node pools, which can be configured independently or combined with each other.

  • static: Defines a fixed quantity of each node to provision
  • dynamic: No quantity of nodes is defined. The built-in Karpenter automatically decides how many nodes are needed. You can define a limit of nodes to provision.

Which node types are eligible for provisioning is controlled by the requirements defined in each node pool.

Example with both static and dynamic node pools
privateNodes:
  # Private nodes need to be enabled for this feature to work
  enabled: true
  autoNodes:
    # Fixed size node pool of 2
    static:
    - name: my-static-node-pool
      provider: my-node-provider
      quantity: 2
    # Dynamic node pool
    dynamic:
    - name: my-dynamic-node-pool
      provider: my-node-provider
      limits:
        nodes: 3
No vCluster restart required

Changing fields within privateNodes.autoNodes does not restart the vCluster, even on a helm upgrade.

It's also possible to mix different providers within the same vCluster. You can specify the provider via the provider field, which should reference a node provider created in the platform.

Example mixing multiple providers
privateNodes:
  enabled: true
  # Enable vCluster VPN so that nodes can talk to each other
  # even if they are not in the same network.
  vpn:
    enabled: true
    nodeToNode:
      enabled: true
  autoNodes:
    static:
    # Nodes from AWS
    - name: aws-pool
      provider: aws
      quantity: 1
      requirements:
      - property: vcluster.com/node-type
        value: t3.medium
    # Nodes from GCP
    - name: gcp-pool
      provider: gcp
      quantity: 1
      requirements:
      - property: vcluster.com/node-type
        value: n4-standard-2

This allows you to easily create cross-cloud vClusters, either statically or through dynamic scaling via Karpenter.

Requirements​

Requirements on a node pool can be used to include or exclude certain node types. They allow you to select node types by their properties via Kubernetes set-based requirements.

Examples of how to set requirements
privateNodes:
  enabled: true
  autoNodes:
    static:
    - name: my-static-pool
      provider: my-provider
      requirements:
      # Exact match
      - property: my-property
        value: my-value
      # One of
      - property: my-property
        operator: In
        values: ["value-1", "value-2", "value-3"]
      # Not in
      - property: my-property
        operator: NotIn
        values: ["value-1", "value-2", "value-3"]
      # Exists
      - property: my-property
        operator: Exists
      # NotExists
      - property: my-property
        operator: NotExists

The following operators are available and supported:

  • In (default): Matches if the property's value on the node type is one of the given values
  • NotIn: Matches if the property's value on the node type is not one of the given values
  • Exists: Matches if the property is defined on the node type
  • NotExists: Matches if the property is not defined on the node type

Built-in node type properties​

Each node type automatically has the following properties available. You can also add custom properties to node types.

| Property | Value | Use Case |
| --- | --- | --- |
| vcluster.com/node-type | The name of the node type to use. | Map vCluster node pools to only use a specific node type. Since node type names are globally unique, they also always map to a single node provider. |
| node.kubernetes.io/instance-type | Same as vcluster.com/node-type, but using the official Kubernetes label. | Same as vcluster.com/node-type. |
| topology.kubernetes.io/zone | Maps to the spec.zone field of the node type. If unspecified, this is global. | Map vCluster node pools to only use node types in specific zones. |
| karpenter.sh/capacity-type | Fixed to on-demand. | Only on-demand nodes are supported. |
| kubernetes.io/os | Fixed to linux. | Only Linux nodes are supported. |
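
For instance, a node pool can be restricted to a specific zone and to specific node types using these built-in properties. The pool name, zone, and node type values below are hypothetical and only illustrate the pattern:

privateNodes:
  enabled: true
  autoNodes:
    dynamic:
    - name: my-zonal-pool
      provider: my-node-provider
      requirements:
      - property: topology.kubernetes.io/zone
        value: my-zone-a
      - property: vcluster.com/node-type
        operator: In
        values: ["my-node-provider.medium", "my-node-provider.large"]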

Dynamic node pools​

Dynamic node pools are powered by Karpenter, and for each dynamic node pool a Karpenter node pool is created.

Example of only using nodes from a specific node provider
privateNodes:
  # Private nodes need to be enabled for this feature to work
  enabled: true
  autoNodes:
    dynamic:
    - name: my-dynamic-node-pool
      provider: my-node-provider

Disruption​

Disruption configures how Karpenter should disrupt nodes; the config corresponds to the Karpenter disruption config. By default, Karpenter disrupts nodes once they have been empty or underutilized for 30 seconds.
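
Written out explicitly, the default behavior roughly corresponds to the following sketch (based on the defaults described above and in the config reference below; the pool name and provider are placeholders):

privateNodes:
  enabled: true
  autoNodes:
    dynamic:
    - name: my-dynamic-node-pool
      provider: my-provider
      disruption:
        consolidationPolicy: WhenEmptyOrUnderutilized # default policy
        consolidateAfter: 30s # default wait before consolidating a node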

You can define more advanced disruption behavior via schedules or budgets, following the Karpenter config.

Example of creating advanced disruption configuration
privateNodes:
  enabled: true
  autoNodes:
    dynamic:
    - name: my-dynamic-node-pool
      provider: my-provider
      disruption:
        consolidationPolicy: WhenEmptyOrUnderutilized
        consolidateAfter: 10s
        budgets:
        - nodes: "20%"
          reasons:
          - "Empty"
          - "Drifted"
        - nodes: "5"
        - nodes: "0"
          schedule: "@daily"
          duration: 10m
          reasons:
          - "Underutilized"

Limits​

Limits can be used as an upper bound for scheduling and correspond to Karpenter limits. In addition to what Karpenter offers, it is also possible to limit the number of nodes itself via nodes.

Example of limiting 10 nodes or 100 cpus total
privateNodes:
  enabled: true
  autoNodes:
    dynamic:
    - name: my-dynamic-node-pool
      provider: my-provider
      limits:
        cpu: 100 # either the combined amount of CPUs across all nodes in this node pool
        nodes: 10 # or the maximum number of nodes
Too small limits

When limits are too low (e.g. cpu is 1 but the smallest node type has 2 CPUs), no nodes are provisioned, so make sure to use appropriate limits. When using limits.nodes, keep in mind that the provisioned nodes might all be the biggest node type, so it's usually a good idea to combine limits.cpu and limits.nodes.

Static node pools​

Static node pools are always created at their configured size, regardless of how many nodes are currently needed. They always require a quantity and a set of requirements to select which node types to deploy. Static node pools are also backed by Karpenter NodeClaims, which allows Karpenter to take these static nodes into account when a dynamic node pool is also configured.

Example of a static node pool of 2 nodes from a specific node provider
privateNodes:
  # Private nodes need to be enabled for this feature to work
  enabled: true
  autoNodes:
    static:
    - name: my-static-node-pool
      provider: my-provider
      quantity: 2

You can change the quantity of static node pools without restarting vCluster, and vCluster scales these nodes up or down to match the new quantity.

Taints and node labels​

You can define taints and node labels for each node pool via the taints and nodeLabels fields, which are useful for controlling scheduling on these nodes.

Example of node pools with taints and node labels
privateNodes:
  enabled: true
  autoNodes:
    static:
    - name: my-static-pool
      provider: my-provider
      quantity: 1
      nodeLabels:
        my-label: my-value
      taints:
      - key: my-taint
        effect: NoSchedule
    dynamic:
    - name: my-dynamic-pool
      provider: my-provider
      nodeLabels:
        my-label: my-value
      taints:
      - key: my-taint
        effect: NoSchedule

Config reference​

autoNodes required object ​

AutoNodes stores the Auto Nodes configuration: static and dynamic node pools managed by Karpenter.

static required object[] ​

Static defines static node pools. Static node pools have a fixed size and are not scaled automatically.

name required string ​

Name is the name of this static node pool.

provider required string ​

Provider is the node provider of the nodes in this pool.

requirements required object[] ​

Requirements filter the types of nodes that can be provisioned by this pool. All requirements must be met for a node type to be eligible.

property required string ​

Property is the property on the node type to select.

operator required string ​

Operator is the comparison operator, such as "In", "NotIn", "Exists". If empty, defaults to "In".

values required string[] ​

Values is the list of values to use for comparison. This is mutually exclusive with value.

value required string ​

Value is the value to use for comparison. This is mutually exclusive with values.

taints required object[] ​

Taints are the taints to apply to the nodes in this pool.

key required string ​

Required. The taint key to be applied to a node.

value required string ​

The taint value corresponding to the taint key.

effect required string ​

Required. The effect of the taint on pods that do not tolerate the taint. Valid effects are NoSchedule, PreferNoSchedule and NoExecute.

nodeLabels required object ​

NodeLabels are the labels to apply to the nodes in this pool.

terminationGracePeriod required string ​

TerminationGracePeriod is the maximum duration the controller will wait before forcefully deleting the pods on a node, measured from when deletion is first initiated.

Warning: this feature takes precedence over a Pod's terminationGracePeriodSeconds value, and bypasses any blocked PDBs or the karpenter.sh/do-not-disrupt annotation.

This field is intended to be used by cluster administrators to enforce that nodes can be cycled within a given time period. When set, drifted nodes will begin draining even if there are pods blocking eviction. Draining will respect PDBs and the do-not-disrupt annotation until the TGP is reached.

Karpenter will preemptively delete pods so their terminationGracePeriodSeconds align with the node's terminationGracePeriod. If a pod would be terminated without being granted its full terminationGracePeriodSeconds prior to the node timeout, that pod will be deleted at T = node timeout - pod terminationGracePeriodSeconds.

The feature can also be used to allow maximum time limits for long-running jobs which can delay node termination with preStop hooks. Defaults to 30s. Set to Never to wait indefinitely for pods to be drained.
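
As a rough illustration of this interaction (all values hypothetical): with terminationGracePeriod: 30m on the node pool, a pod that sets terminationGracePeriodSeconds: 600 (10 minutes) is deleted at T = 30m - 10m = 20m after node deletion is initiated, so it still receives its full grace period before the node timeout.

privateNodes:
  enabled: true
  autoNodes:
    static:
    - name: my-static-node-pool
      provider: my-provider
      quantity: 2
      # Hypothetical value: nodes are force-drained at most 30 minutes
      # after their deletion is initiated.
      terminationGracePeriod: 30m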

quantity required integer ​

Quantity is the number of desired nodes in this pool.

dynamic required object[] ​

Dynamic defines dynamic node pools. Dynamic node pools are scaled automatically based on the requirements within the cluster. Karpenter is used under the hood to handle the scheduling of the nodes.

name required string ​

Name is the name of this NodePool.

provider required string ​

Provider is the node provider of the nodes in this pool.

requirements required object[] ​

Requirements filter the types of nodes that can be provisioned by this pool. All requirements must be met for a node type to be eligible.

property required string ​

Property is the property on the node type to select.

operator required string ​

Operator is the comparison operator, such as "In", "NotIn", "Exists". If empty, defaults to "In".

values required string[] ​

Values is the list of values to use for comparison. This is mutually exclusive with value.

value required string ​

Value is the value to use for comparison. This is mutually exclusive with values.

taints required object[] ​

Taints are the taints to apply to the nodes in this pool.

key required string ​

Required. The taint key to be applied to a node.

value required string ​

The taint value corresponding to the taint key.

effect required string ​

Required. The effect of the taint on pods that do not tolerate the taint. Valid effects are NoSchedule, PreferNoSchedule and NoExecute.

nodeLabels required object ​

NodeLabels are the labels to apply to the nodes in this pool.

limits required object ​

Limits specify the maximum resources that can be provisioned by this node pool, mapping to the 'limits' field in Karpenter's NodePool API.

disruption required object ​

Disruption contains the parameters that relate to Karpenter's disruption logic.

consolidateAfter required string ​

ConsolidateAfter is the duration the controller will wait before attempting to terminate nodes that are underutilized. Refer to ConsolidationPolicy for how underutilization is considered.

consolidationPolicy required string ​

ConsolidationPolicy describes which nodes Karpenter can disrupt through its consolidation algorithm. This policy defaults to "WhenEmptyOrUnderutilized" if not specified.

budgets required object[] ​

Budgets is a list of Budgets. If there are multiple active budgets, Karpenter uses the most restrictive value. If left undefined, this defaults to one budget with a value of 10%.

nodes required string ​

Nodes dictates the maximum number of NodeClaims owned by this NodePool that can be terminating at once. This is calculated by counting nodes that have a deletion timestamp set, or are actively being deleted by Karpenter. This field is required when specifying a budget.

schedule required string ​

Schedule specifies when a budget begins being active, following the upstream cronjob syntax. If omitted, the budget is always active. Timezones are not supported.

duration required string ​

Duration determines how long a Budget is active since each Schedule hit. Only minutes and hours are accepted, as cron does not work in seconds. If omitted, the budget is always active. This is required if Schedule is set.

terminationGracePeriod required string ​

TerminationGracePeriod is the maximum duration the controller will wait before forcefully deleting the pods on a node, measured from when deletion is first initiated.

Warning: this feature takes precedence over a Pod's terminationGracePeriodSeconds value, and bypasses any blocked PDBs or the karpenter.sh/do-not-disrupt annotation.

This field is intended to be used by cluster administrators to enforce that nodes can be cycled within a given time period. When set, drifted nodes will begin draining even if there are pods blocking eviction. Draining will respect PDBs and the do-not-disrupt annotation until the TGP is reached.

Karpenter will preemptively delete pods so their terminationGracePeriodSeconds align with the node's terminationGracePeriod. If a pod would be terminated without being granted its full terminationGracePeriodSeconds prior to the node timeout, that pod will be deleted at T = node timeout - pod terminationGracePeriodSeconds.

The feature can also be used to allow maximum time limits for long-running jobs which can delay node termination with preStop hooks. Defaults to 30s. Set to Never to wait indefinitely for pods to be drained.

expireAfter required string ​

The amount of time a Node can live on the cluster before being removed.

weight required integer ​

Weight is the weight of this node pool.
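
A sketch combining several of the dynamic pool fields above, assuming they map to the corresponding Karpenter NodePool fields; the concrete values are only illustrative assumptions:

privateNodes:
  enabled: true
  autoNodes:
    dynamic:
    - name: my-dynamic-node-pool
      provider: my-provider
      weight: 10 # assumption: higher-weighted pools are considered first, as in Karpenter
      expireAfter: 720h # recycle nodes after roughly 30 days
      terminationGracePeriod: 1h
      limits:
        nodes: 5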