
Auto Nodes

You can configure vCluster to automatically provision and join worker nodes based on your workloads' node and resource requirements. To use auto nodes, you need vCluster Platform installed, and the vCluster must be connected to it.

The feature is based on Karpenter, a cluster autoscaler for Kubernetes that chooses the best node for the requested pods and resources. Karpenter is built into vCluster and does not need to be installed separately, which lets vCluster use it for node management and scheduling. The provisioning of the nodes themselves is handled by vCluster Platform.


vCluster provides multiple node providers to automatically create worker nodes:

  • Terraform
  • KubeVirt
  • Nvidia Base Command Manager

How does it work?​

vCluster uses Karpenter to provision nodes. Karpenter is an open-source node autoscaler built for Kubernetes that can dramatically improve the efficiency and cost of running workloads on a cluster.

Karpenter works by:

  • Watching for pods that the Kubernetes scheduler has marked as unschedulable
  • Evaluating scheduling constraints (resource requests, node selectors, affinities, tolerations, and topology spread constraints) requested by the pods (see the example pod below)
  • Provisioning nodes that meet the requirements of the pods
  • Disrupting the nodes when the nodes are no longer needed
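
For illustration, a pending pod like the following (a generic Kubernetes example with hypothetical names and values, not something vCluster requires you to write) carries the kinds of scheduling constraints Karpenter evaluates before provisioning a node:

apiVersion: v1
kind: Pod
metadata:
  name: example-workload
spec:
  nodeSelector:
    my-label: my-value
  tolerations:
  - key: my-taint
    operator: Exists
    effect: NoSchedule
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 500m
        memory: 256Mi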

Under the hood, vCluster creates Karpenter NodePools, which are read-only for end users inside the vCluster. When Karpenter finds unschedulable pods inside the vCluster, or when a static node pool does not have its desired quantity of nodes deployed, a new Karpenter NodeClaim is created. From the Karpenter node claim, vCluster creates a platform node claim, which is then assigned to a node type and provisioned by the node provider (e.g. Terraform). Terraform then creates a new node according to the specified Terraform script and joins it into the vCluster via cloud-init.


For each node claim, Karpenter specifies the potentially fitting platform node types, sorted by cost, as well as the requested resources for the node. In addition, you can define requirements within node pools to filter node types, e.g. by region, CPU generation, or custom-defined properties. The platform then performs the final scheduling based on Karpenter's suggestions for which node type to use and ultimately provisions it. The chosen node type is passed to the node provider, which in turn creates the actual node based on that node type and joins it into the vCluster.

Scheduling Example​

For example, when using the following vcluster.yaml configuration:

privateNodes:
  enabled: true
  autoNodes:
    dynamic:
    - name: my-node-pool
      provider: my-node-provider
      requirements:
      - property: my-custom-property
        value: my-value

The following Karpenter NodePool will get created:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: my-node-pool
spec:
  template:
    metadata:
      labels:
        vcluster.loft.sh/provider-platform: my-node-provider
    spec:
      requirements:
      - key: my-custom-property
        operator: In
        values:
        - my-value

When Karpenter decides to create a new node, it creates a new NodeClaim resource that looks like this:

apiVersion: karpenter.sh/v1
kind: NodeClaim
metadata:
  name: my-node-pool-j6b8n
spec:
  requirements:
  - key: my-custom-property
    operator: In
    values:
    - my-value
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - my-node-provider.large
    - my-node-provider.medium
  resources:
    requests:
      cpu: 120m
      memory: 114Mi
      pods: '5'

vCluster then creates a new platform node claim in the project the vCluster was created in or connected to:

apiVersion: management.loft.sh/v1
kind: NodeClaim
metadata:
  name: vcluster-62ftb
  namespace: p-default
spec:
  desiredCapacity:
    cpu: 120m
    memory: 114Mi
    pods: '5'
  requirements:
  - key: my-custom-property
    operator: In
    values:
    - my-value
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - my-node-provider.large
    - my-node-provider.medium
  providerRef: my-node-provider
  vClusterRef: vcluster

Karpenter only lists the node types that would actually fit the desired capacity, so there might be a my-node-provider.small node type that Karpenter has already filtered out.

The platform then decides which node type to use based on the specified requirements and sets the spec.typeRef field. In this case it would use the my-node-provider.medium node type, as that is cheaper than the large one, and then create a node with the node provider (e.g. through tofu apply).
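
As a rough sketch of the result (the exact shape of the updated resource may differ; spec.typeRef is shown here as a plain string reference for illustration only), the platform node claim from above would then carry the chosen node type:

apiVersion: management.loft.sh/v1
kind: NodeClaim
metadata:
  name: vcluster-62ftb
  namespace: p-default
spec:
  # ...desiredCapacity and requirements as shown above...
  providerRef: my-node-provider
  vClusterRef: vcluster
  typeRef: my-node-provider.medium # set by the platform after scheduling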

Prerequisites​

  • The vCluster control plane is running, in the Ready state, and connected to vCluster Platform.
  • A node provider is configured in vCluster Platform.

Node Pools​

There are two types of node pools, which can be configured independently or combined with each other.

  • static: Defines a fixed quantity of each node to provision
  • dynamic: No quantity of nodes is defined. The built-in Karpenter automatically decides how many nodes are needed. You can define a limit of nodes to provision.

Which node types are eligible for provisioning is controlled by the requirements defined in each node pool.

Example with both static and dynamic node pools
privateNodes:
  # Private nodes need to be enabled for this feature to work
  enabled: true
  autoNodes:
    # Fixed size node pool of 2
    static:
    - name: my-static-node-pool
      provider: my-node-provider
      quantity: 2
    # Dynamic node pool
    dynamic:
    - name: my-dynamic-node-pool
      provider: my-node-provider
      limits:
        nodes: 3
No vCluster restart required

Changing fields within privateNodes.autoNodes does not restart the vCluster, even on a helm upgrade.

It's also possible to mix different providers within the same vCluster. You can specify the provider via the provider field, which should reference a node provider created in the platform.

Example mixing multiple providers
privateNodes:
  enabled: true
  # Enable vCluster VPN so that nodes can talk to each other
  # even if they are not in the same network.
  vpn:
    enabled: true
    nodeToNode:
      enabled: true
  autoNodes:
    static:
    # Nodes from AWS
    - name: aws-pool
      provider: aws
      quantity: 1
      requirements:
      - property: vcluster.com/node-type
        value: t3.medium
    # Nodes from GCP
    - name: gcp-pool
      provider: gcp
      quantity: 1
      requirements:
      - property: vcluster.com/node-type
        value: n4-standard-2

This allows you to easily create cross-cloud vClusters, either statically or through dynamic scaling via Karpenter.

Requirements​

Requirements on a node pool can be used to include or exclude certain node types. They allow you to select node types by their properties via Kubernetes set-based requirements.

Examples of how to set requirements
privateNodes:
  enabled: true
  autoNodes:
    static:
    - name: my-static-pool
      provider: my-provider
      requirements:
      # Exact match
      - property: my-property
        value: my-value
      # One of
      - property: my-property
        operator: In
        values: ["value-1", "value-2", "value-3"]
      # Not in
      - property: my-property
        operator: NotIn
        values: ["value-1", "value-2", "value-3"]
      # Exists
      - property: my-property
        operator: Exists
      # NotExists
      - property: my-property
        operator: NotExists

The following operators are available and supported:

  • In (default): Matches if the property's value on the node type is one of the given values
  • NotIn: Matches if the property's value on the node type is not one of the given values
  • Exists: Matches if the property is defined on the node type
  • NotExists: Matches if the property is not defined on the node type

Built-in node type properties​

Each node type automatically has the following properties available. You can also add custom properties to node types.

| Property | Value | Use Case |
| --- | --- | --- |
| vcluster.com/node-type | The name of the node type to use. | Map vCluster node pools to only use a specific node type. Since node type names are globally unique, they also always map to a single node provider. |
| node.kubernetes.io/instance-type | Same as vcluster.com/node-type, but using the official Kubernetes label. | Same as vcluster.com/node-type. |
| topology.kubernetes.io/zone | Maps to the spec.zone field of the node type. If unspecified, this is global. | Map vCluster node pools to only use node types in specific zones. |
| karpenter.sh/capacity-type | Fixed to on-demand. | Only on-demand nodes are supported. |
| kubernetes.io/os | Fixed to linux. | Only Linux nodes are supported. |
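
For instance, a node pool can be restricted to a specific zone and to specific node types using these built-in properties. The pool name, zone, and node type values below are hypothetical and only illustrate the pattern:

privateNodes:
  enabled: true
  autoNodes:
    dynamic:
    - name: my-zonal-pool
      provider: my-node-provider
      requirements:
      - property: topology.kubernetes.io/zone
        value: my-zone-a
      - property: vcluster.com/node-type
        operator: In
        values: ["my-node-provider.medium", "my-node-provider.large"]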

Dynamic node pools​

Dynamic node pools are powered by Karpenter, and for each dynamic node pool a Karpenter node pool is created.

Example of only using nodes from a specific node provider
privateNodes:
  # Private nodes need to be enabled for this feature to work
  enabled: true
  autoNodes:
    dynamic:
    - name: my-dynamic-node-pool
      provider: my-node-provider

Disruption​

Disruption configures how Karpenter should disrupt nodes; the config corresponds to the Karpenter disruption config. By default, Karpenter disrupts nodes once they have been empty or underutilized for 30 seconds.
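
Written out explicitly, the default behavior roughly corresponds to the following sketch (based on the defaults described above and in the config reference below; the pool name and provider are placeholders):

privateNodes:
  enabled: true
  autoNodes:
    dynamic:
    - name: my-dynamic-node-pool
      provider: my-provider
      disruption:
        consolidationPolicy: WhenEmptyOrUnderutilized # default policy
        consolidateAfter: 30s # default wait before consolidating a node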

You can define more advanced disruption behavior via schedules or budgets, following the Karpenter config.

Example of creating advanced disruption configuration
privateNodes:
  enabled: true
  autoNodes:
    dynamic:
    - name: my-dynamic-node-pool
      provider: my-provider
      disruption:
        consolidationPolicy: WhenEmptyOrUnderutilized
        consolidateAfter: 10s
        budgets:
        - nodes: "20%"
          reasons:
          - "Empty"
          - "Drifted"
        - nodes: "5"
        - nodes: "0"
          schedule: "@daily"
          duration: 10m
          reasons:
          - "Underutilized"

Limits​

Limits can be used as an upper bound for scheduling and correspond to Karpenter limits. In addition to what Karpenter offers, it is also possible to limit the number of nodes itself via nodes.

Example of limiting 10 nodes or 100 cpus total
privateNodes:
  enabled: true
  autoNodes:
    dynamic:
    - name: my-dynamic-node-pool
      provider: my-provider
      limits:
        cpu: 100 # either the combined amount of CPUs across all nodes in this node pool
        nodes: 10 # or the maximum number of nodes
Too small limits

When limits are too low (e.g. cpu is 1 but the smallest node type has 2 CPUs), no nodes are provisioned, so make sure to use appropriate limits. When using limits.nodes, keep in mind that the provisioned nodes might all be the biggest node type, so it's usually a good idea to combine limits.cpu and limits.nodes.

Static node pools​

Static node pools are always created at their configured size, regardless of how many nodes are currently needed. They always require a quantity and a set of requirements to select which node types to deploy. Static node pools are also backed by Karpenter NodeClaims, which allows Karpenter to take these static nodes into account when a dynamic node pool is also configured.

Example of a static node pool of 2 nodes from a specific node provider
privateNodes:
  # Private nodes need to be enabled for this feature to work
  enabled: true
  autoNodes:
    static:
    - name: my-static-node-pool
      provider: my-provider
      quantity: 2

You can change the quantity of static node pools without restarting vCluster, and vCluster scales these nodes up or down to match the new quantity.

Taints and node labels​

You can define taints and node labels for each node pool via the taints and nodeLabels fields, which are useful for controlling scheduling on these nodes.

Example of node pools with taints and node labels
privateNodes:
  enabled: true
  autoNodes:
    static:
    - name: my-static-pool
      provider: my-provider
      quantity: 1
      nodeLabels:
        my-label: my-value
      taints:
      - key: my-taint
        effect: NoSchedule
    dynamic:
    - name: my-dynamic-pool
      provider: my-provider
      nodeLabels:
        my-label: my-value
      taints:
      - key: my-taint
        effect: NoSchedule

Config reference​

autoNodes required object ​

AutoNodes stores the Auto Nodes configuration: static and dynamic node pools managed by Karpenter.

static required object[] ​

Static defines static node pools. Static node pools have a fixed size and are not scaled automatically.

name required string ​

Name is the name of this static node pool.

provider required string ​

Provider is the node provider of the nodes in this pool.

requirements required object[] ​

Requirements filter the types of nodes that can be provisioned by this pool. All requirements must be met for a node type to be eligible.

property required string ​

Property is the property on the node type to select.

operator required string ​

Operator is the comparison operator, such as "In", "NotIn", "Exists". If empty, defaults to "In".

values required string[] ​

Values is the list of values to use for comparison. This is mutually exclusive with value.

value required string ​

Value is the value to use for comparison. This is mutually exclusive with values.

taints required object[] ​

Taints are the taints to apply to the nodes in this pool.

key required string ​

Required. The taint key to be applied to a node.

value required string ​

The taint value corresponding to the taint key.

effect required string ​

Required. The effect of the taint on pods that do not tolerate the taint. Valid effects are NoSchedule, PreferNoSchedule and NoExecute.

nodeLabels required object ​

NodeLabels are the labels to apply to the nodes in this pool.

terminationGracePeriod required string ​

TerminationGracePeriod is the maximum duration the controller will wait before forcefully deleting the pods on a node, measured from when deletion is first initiated.

Warning: this feature takes precedence over a Pod's terminationGracePeriodSeconds value, and bypasses any blocked PDBs or the karpenter.sh/do-not-disrupt annotation.

This field is intended to be used by cluster administrators to enforce that nodes can be cycled within a given time period. When set, drifted nodes will begin draining even if there are pods blocking eviction. Draining will respect PDBs and the do-not-disrupt annotation until the TGP is reached.

Karpenter will preemptively delete pods so their terminationGracePeriodSeconds align with the node's terminationGracePeriod. If a pod would be terminated without being granted its full terminationGracePeriodSeconds prior to the node timeout, that pod will be deleted at T = node timeout - pod terminationGracePeriodSeconds.

The feature can also be used to allow maximum time limits for long-running jobs which can delay node termination with preStop hooks. Defaults to 30s. Set to Never to wait indefinitely for pods to be drained.
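
As a rough illustration of this interaction (all values hypothetical): with terminationGracePeriod: 30m on the node pool, a pod that sets terminationGracePeriodSeconds: 600 (10 minutes) is deleted at T = 30m - 10m = 20m after node deletion is initiated, so it still receives its full grace period before the node timeout.

privateNodes:
  enabled: true
  autoNodes:
    static:
    - name: my-static-node-pool
      provider: my-provider
      quantity: 2
      # Hypothetical value: nodes are force-drained at most 30 minutes
      # after their deletion is initiated.
      terminationGracePeriod: 30m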

quantity required integer ​

Quantity is the number of desired nodes in this pool.

dynamic required object[] ​

Dynamic defines dynamic node pools. Dynamic node pools are scaled automatically based on the requirements within the cluster. Karpenter is used under the hood to handle the scheduling of the nodes.

name required string ​

Name is the name of this NodePool.

provider required string ​

Provider is the node provider of the nodes in this pool.

requirements required object[] ​

Requirements filter the types of nodes that can be provisioned by this pool. All requirements must be met for a node type to be eligible.

property required string ​

Property is the property on the node type to select.

operator required string ​

Operator is the comparison operator, such as "In", "NotIn", "Exists". If empty, defaults to "In".

values required string[] ​

Values is the list of values to use for comparison. This is mutually exclusive with value.

value required string ​

Value is the value to use for comparison. This is mutually exclusive with values.

taints required object[] ​

Taints are the taints to apply to the nodes in this pool.

key required string ​

Required. The taint key to be applied to a node.

value required string ​

The taint value corresponding to the taint key.

effect required string ​

Required. The effect of the taint on pods that do not tolerate the taint. Valid effects are NoSchedule, PreferNoSchedule and NoExecute.

nodeLabels required object ​

NodeLabels are the labels to apply to the nodes in this pool.

limits required object ​

Limits specify the maximum resources that can be provisioned by this node pool, mapping to the 'limits' field in Karpenter's NodePool API.

disruption required object ​

Disruption contains the parameters that relate to Karpenter's disruption logic.

consolidateAfter required string ​

ConsolidateAfter is the duration the controller will wait before attempting to terminate nodes that are underutilized. Refer to ConsolidationPolicy for how underutilization is considered.

consolidationPolicy required string ​

ConsolidationPolicy describes which nodes Karpenter can disrupt through its consolidation algorithm. This policy defaults to "WhenEmptyOrUnderutilized" if not specified.

budgets required object[] ​

Budgets is a list of Budgets. If there are multiple active budgets, Karpenter uses the most restrictive value. If left undefined, this defaults to one budget with a value of 10%.

nodes required string ​

Nodes dictates the maximum number of NodeClaims owned by this NodePool that can be terminating at once. This is calculated by counting nodes that have a deletion timestamp set, or are actively being deleted by Karpenter. This field is required when specifying a budget.

schedule required string ​

Schedule specifies when a budget begins being active, following the upstream cronjob syntax. If omitted, the budget is always active. Timezones are not supported.

duration required string ​

Duration determines how long a Budget is active since each Schedule hit. Only minutes and hours are accepted, as cron does not work in seconds. If omitted, the budget is always active. This is required if Schedule is set.

terminationGracePeriod required string ​

TerminationGracePeriod is the maximum duration the controller will wait before forcefully deleting the pods on a node, measured from when deletion is first initiated.

Warning: this feature takes precedence over a Pod's terminationGracePeriodSeconds value, and bypasses any blocked PDBs or the karpenter.sh/do-not-disrupt annotation.

This field is intended to be used by cluster administrators to enforce that nodes can be cycled within a given time period. When set, drifted nodes will begin draining even if there are pods blocking eviction. Draining will respect PDBs and the do-not-disrupt annotation until the TGP is reached.

Karpenter will preemptively delete pods so their terminationGracePeriodSeconds align with the node's terminationGracePeriod. If a pod would be terminated without being granted its full terminationGracePeriodSeconds prior to the node timeout, that pod will be deleted at T = node timeout - pod terminationGracePeriodSeconds.

The feature can also be used to allow maximum time limits for long-running jobs which can delay node termination with preStop hooks. Defaults to 30s. Set to Never to wait indefinitely for pods to be drained.

expireAfter required string ​

The amount of time a Node can live on the cluster before being removed.

weight required integer ​

Weight is the weight of this node pool.
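
A sketch combining several of the dynamic pool fields above, assuming they map to the corresponding Karpenter NodePool fields; the concrete values are only illustrative assumptions:

privateNodes:
  enabled: true
  autoNodes:
    dynamic:
    - name: my-dynamic-node-pool
      provider: my-provider
      weight: 10 # assumption: higher-weighted pools are considered first, as in Karpenter
      expireAfter: 720h # recycle nodes after roughly 30 days
      terminationGracePeriod: 1h
      limits:
        nodes: 5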