Pod Scheduling
vCluster runs your workloads by replicating pods from the virtual cluster to the host cluster. This process is called synchronization, or sync for short, and is carried out by the "syncer" component of the vCluster. To control how vCluster pods are scheduled on the host cluster, you may need to pass additional arguments to the syncer or set certain Helm chart values during vCluster installation and upgrade. Some of these options are described in the sections below.
Separate vCluster Scheduler
By default, vCluster will reuse the scheduler of the host cluster to schedule workloads. This saves computing resources, but also has some limitations:
- Labeling nodes inside the virtual cluster has no effect on scheduling
- Draining or tainting nodes inside the virtual cluster has no effect on scheduling
- You cannot use custom schedulers inside the vCluster
Sometimes you want to label a node inside the vCluster to modify workload scheduling through features such as affinity or topology spreading. vCluster supports running a scheduler inside the virtual cluster instead of reusing the host cluster's scheduler. vCluster then syncs only those pods to the host cluster that already have a node assigned.
You can enable the virtual scheduler via the `values.yaml` of vCluster:

```yaml
sync:
  nodes:
    enabled: true
    enableScheduler: true
    # Either syncAllNodes or nodeSelector is required
    syncAllNodes: true
```
Then create or upgrade a vCluster with:

```
vcluster create my-vcluster -f values.yaml
```
Now you can taint and label nodes inside the virtual cluster without actually modifying the host cluster nodes.
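For example, with the virtual scheduler enabled, you could label and taint a node from within the virtual cluster. This is a sketch only; the node name and labels below are placeholders:

```shell
# Run against the virtual cluster's kubeconfig
# (e.g. after connecting with `vcluster connect my-vcluster`).
# "my-node" is a placeholder for one of your synced node names.
kubectl label node my-node disktype=ssd
kubectl taint node my-node dedicated=batch:NoSchedule
```

The virtual scheduler then takes these labels and taints into account, while the corresponding host cluster node objects remain unchanged.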
If the `persistentvolumeclaims` syncer is also enabled, relevant `csistoragecapacity`, `csinode`, and `csidriver` objects will be mirrored to the virtual cluster so the scheduler can make storage-aware scheduling decisions.
Reuse Host Scheduler
If you don't want to use a separate scheduler inside the vCluster, you can still customize, to a certain degree, how the host scheduler schedules your virtual cluster workloads.
Using priority classes
If you need to use priority classes, you can enable this by adding the following to your `values.yaml`:

```yaml
sync:
  priorityclasses:
    enabled: true
```
Then create or upgrade the vCluster with:

```
vcluster create my-vcluster --upgrade -f values.yaml
```
This will pass the necessary flags to the "syncer" container and create or update the ClusterRole used by vCluster to include necessary permissions.
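With priority class syncing enabled, you could, for example, define a PriorityClass inside the virtual cluster and reference it from a pod spec. This is an illustrative fragment, not taken from the vCluster docs; the name and value are placeholders:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority   # placeholder name
value: 1000000
globalDefault: false
description: "Example priority class created inside the vCluster"
```

Pods inside the virtual cluster can then set `priorityClassName: high-priority`, and the syncer propagates the priority class so the host scheduler can honor it.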
Limiting pod scheduling to selected nodes
vCluster allows you to limit on which nodes the pods synced by vCluster will run. You can achieve this by combining the `--node-selector` and `--enforce-node-selector` syncer flags. The `--enforce-node-selector` flag is enabled by default. When the `--enforce-node-selector` flag is disabled and a `--node-selector` is specified, nodes will be synced based on the selector, as well as nodes running pod workloads.
When using the vCluster Helm chart or CLI, there are two options for setting the `--node-selector` flag.
The first option is recommended if you are not enabling node synchronization and are using the fake nodes, which are enabled by default. In that case, write a string representation of your node selector (e.g. "nodeLabel=labelValue") and set it as the value of the `--node-selector` argument for the syncer in your `values.yaml`:

```yaml
syncer:
  extraArgs:
    - --node-selector=nodeLabel=labelValue
```
Then create or upgrade the vCluster with:

```
vcluster create my-vcluster --upgrade -f values.yaml
```
The second option is recommended if you are enabling synchronization of the real nodes via Helm values. In that case, set the selector in your `values.yaml`:

```yaml
sync:
  nodes:
    enabled: true
    nodeSelector: "nodeLabel=labelValue"
```

Then create or upgrade the vCluster with:

```
vcluster create my-vcluster --upgrade -f values.yaml
```
When sync of the real nodes is enabled and `nodeSelector` is set, all nodes that match the selector will be synced into the vCluster. Read more about node sync modes on the Nodes documentation page.
Automatically applying tolerations to all pods synced by vCluster
Kubernetes has a concept of Taints and Tolerations, which is used for controlling scheduling. If you have a use case requiring all pods synced by vCluster to have a toleration set automatically, you can achieve this with the `--enforce-toleration` syncer flag. You can pass multiple `--enforce-toleration` flags with different toleration expressions, and the syncer will add them to every new pod that it syncs.
This is how a toleration is defined in YAML format:

```yaml
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
```
To pass this information as the value of the `--enforce-toleration` flag, it must be written as a string. The example above would be represented as `key1=value1:NoSchedule`. A toleration using the `Exists` operator is written as `key1:Effect`. A toleration with an empty effect is written as `key1=value1`, or just `key1` if the value is also empty. There is also the special case of a toleration that contains only the `Exists` operator, without an effect or key; it matches every taint and is expressed as `*`.
You can set the `--enforce-toleration` flags as arguments for the syncer in your `values.yaml`:

```yaml
syncer:
  extraArgs:
    - --enforce-toleration=key1=value1:NoSchedule
    - --enforce-toleration=anotherKey1=value2:NoSchedule
```
vCluster does not support setting the `tolerationSeconds` field of a toleration through the syntax used by the `--enforce-toleration` flag. If your use case requires this, please raise an issue in the vCluster repo on GitHub.