
Fleet monitoring with OpenTelemetry

When managing many virtual clusters across host clusters, you need per-tenant visibility into resource consumption and API health without deploying monitoring inside each virtual cluster. This guide shows how to configure OpenTelemetry Collectors that gather workload and control plane metrics across multiple virtual clusters. All metrics are enriched with vCluster identity labels at ingest time and pushed to a central Prometheus via remote_write.

note

For a simpler setup using the built-in OpenTelemetry DaemonSet App, see Aggregating Metrics. This guide covers advanced fleet monitoring with remote_write, Target Allocator, and per-tenancy-model collectors.

This architecture supports both the Shared Nodes and Private Nodes tenancy models. Each model uses a different collector configuration deployed as a vCluster Platform App.

warning

This guide isn't a production-ready monitoring solution that you can copy directly to your infrastructure. Observability is highly specialized to the underlying architecture. The goal is to lay out general capabilities and show what's possible along with a stripped-down example architecture. Apply these patterns with modifications to your actual use cases.

Architecture​

The architecture comprises the following:

  • Cluster architecture:

    • A local cluster that hosts vCluster Platform.
    • Two virtual clusters running on the local cluster:
      • One virtual cluster sharing the nodes of the local cluster (Shared Nodes tenancy model).
      • One virtual cluster with private nodes (Private Nodes tenancy model).
    • An external cluster connected to vCluster Platform.
    • Two virtual clusters running on the connected cluster with the same configuration.
  • Collector architecture:

    • A central Prometheus with the remote write receiver enabled.
    • One OTel Collector Deployment with Target Allocator per host cluster (scrapes shared-nodes virtual clusters and their control planes via ServiceMonitors).
    • One OTel Collector DaemonSet per private-nodes vCluster (scrapes local kubelet, cAdvisor, and API server metrics from inside the vCluster).

How it works​

Shared nodes​

The shared-nodes collector runs on the host cluster as a Deployment with 2 replicas. A Target Allocator discovers vCluster ServiceMonitors and distributes cAdvisor and ServiceMonitor scrape targets across replicas using consistent-hashing.

Metrics pipeline:

prometheus receiver
→ memory_limiter
→ groupbyattrs (split cAdvisor batch into per-pod resource scopes)
→ transform/pre_enrich (copy namespace/pod/node to k8s.* resource attributes)
→ k8sattributes (resolve pod/namespace metadata, add vCluster labels)
→ filter/vcluster_only (drop metrics without vCluster identity)
→ resource/add_cluster (add cluster label from Platform variable)
→ transform (copy resource attributes to datapoint attributes)
→ batch
→ prometheusremotewrite

The groupbyattrs processor is required because the Prometheus receiver batches all cAdvisor metrics from a single node into one resource scope. Without it, the k8sattributes processor matches one pod and applies its metadata to all metrics in the batch, causing cross-contamination between virtual clusters on the same node. The groupbyattrs processor splits the batch into per-pod resource scopes (by namespace, pod, node) so each pod is matched correctly.
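In collector configuration terms, the fix amounts to the following processor fragment (it mirrors the processors block in the App manifest later in this guide):

```yaml
processors:
  # Split each node-wide cAdvisor batch into one resource scope per pod,
  # keyed by the labels that uniquely identify a pod on a node.
  groupbyattrs:
    keys:
      - namespace
      - pod
      - node
```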

The k8sattributes processor resolves vCluster identity from Platform-managed namespace labels (loft.sh/project, loft.sh/vcluster-instance-name, etc.) and pod labels/annotations set by the vCluster syncer (vcluster.loft.sh/namespace, vcluster.loft.sh/name).

The filter/vcluster_only processor drops any metrics where vcluster.name is nil after enrichment. This means only vCluster workload metrics pass through, which also prevents duplicate series with any existing Prometheus scrapes.

Private nodes​

The private-nodes collector runs inside each vCluster as a DaemonSet with one pod per node. Each pod scrapes only its local node's kubelet /metrics, cAdvisor /metrics/cadvisor, and API server /metrics endpoints.

Metrics pipeline:

prometheus receiver (kubelet, cAdvisor, API server)
→ memory_limiter
→ k8sattributes (resolve pod metadata)
→ transform (copy resource attributes to datapoint attributes)
→ batch
→ prometheusremotewrite
+ external_labels (cluster, vcluster_name, project, user)
+ metric_relabel (namespace → vcluster_virtual_namespace,
pod → vcluster_virtual_pod)

Since the collector runs inside the vCluster, it can't access host-cluster namespace labels. Instead, the Platform injects {{ .Values.loft.* }} template variables at deploy time, which are set as external_labels on the prometheusremotewrite exporter. These are static per-vCluster values applied to all exported metrics.

The metric_relabel_configs copy namespace to vcluster_virtual_namespace and pod to vcluster_virtual_pod. Inside a private-nodes vCluster, the namespace and pod labels already represent virtual names, so this copy ensures dashboard compatibility with the shared-nodes collector.

Metric labels​

All metrics from both apps carry a consistent set of identity labels:

| Label | Shared nodes source | Private nodes source |
| --- | --- | --- |
| cluster | resource/add_cluster processor using {{ .Values.loft.cluster }} | external_labels using {{ .Values.loft.cluster }} |
| vcluster_name | k8sattributes from namespace label loft.sh/vcluster-instance-name | external_labels using {{ .Values.loft.name }} |
| vcluster_project | k8sattributes from namespace label loft.sh/project | external_labels using {{ .Values.loft.project }} |
| vcluster_user | k8sattributes from namespace label loft.sh/user | external_labels using {{ .Values.loft.user.name }} |
| vcluster_project_namespace | k8sattributes from namespace label loft.sh/vcluster-instance-namespace | external_labels using {{ .Values.loft.space }} |
| vcluster_virtual_namespace | k8sattributes from pod label vcluster.loft.sh/namespace | metric_relabel_configs copying the namespace label |
| vcluster_virtual_pod | k8sattributes from pod annotation vcluster.loft.sh/name | metric_relabel_configs copying the pod label |
info

vcluster_virtual_namespace and vcluster_virtual_pod are missing on some metrics. The affected series come from vCluster system pods (syncer, CoreDNS), which lack the syncer labels and annotations because they aren't user workloads synced from inside the vCluster.
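To spot the affected series, a query along these lines (using cAdvisor's working-set metric as an arbitrary example) lists host pods that carry vCluster identity but no virtual-pod label:

```promql
count by (vcluster_name, pod) (
  container_memory_working_set_bytes{vcluster_name!="", vcluster_virtual_pod=""}
)
```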

Prerequisites​

The central Prometheus must be configured as a remote write receiver. The following Helm values enable this:

server:
  extraFlags:
    - web.enable-remote-write-receiver

Shared nodes prerequisites​

  • Prometheus Operator CRDs installed on the host cluster (ServiceMonitor, PodMonitor).

  • Virtual clusters deployed with a ServiceMonitor enabled. This allows scraping their API server and controller metrics. Enable this in your vcluster.yaml:

    controlPlane:
      serviceMonitor:
        enabled: true
  • Kubelet scraping disabled in any existing kube-prometheus-stack to avoid duplicate cAdvisor series (kubelet.enabled: false).

  • Platform namespace labels present (added automatically by the vCluster Platform).
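The kubelet scraping change above maps to the following kube-prometheus-stack Helm values (a minimal sketch; the top-level kubelet key follows the kube-prometheus-stack chart):

```yaml
# kube-prometheus-stack values: disable the kubelet ServiceMonitor so
# cAdvisor series are scraped only by the OTel collector
kubelet:
  enabled: false
```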

Private nodes prerequisites​

  • Virtual clusters with dedicated/private nodes.

  • Node-to-node vCluster VPN enabled:

    privateNodes:
      enabled: true
      vpn:
        enabled: true
        nodeToNode:
          enabled: true

Deploy the shared nodes collector​

Deploy one shared-nodes collector per host cluster. First, register the App manifest with the Platform, then deploy it to each cluster through the UI.

App manifest​

The shared-nodes app deploys the opentelemetry-kube-stack Helm chart (v0.14.4) with the following configuration:

otel-collector-shared-nodes-app.yaml
apiVersion: management.loft.sh/v1
kind: App
metadata:
  name: otel-collector-shared-nodes
spec:
  access:
    - users:
        - '*'
      verbs:
        - get
  config:
    chart:
      name: opentelemetry-kube-stack
      repoURL: https://open-telemetry.github.io/opentelemetry-helm-charts
      version: 0.14.4
    values: | # yaml
      ---
      clusterName: "{{ .Values.loft.cluster }}"
      crds:
        installPrometheus: false
      opentelemetry-operator:
        enabled: true
        manager:
          collectorImage:
            repository: otel/opentelemetry-collector-contrib
          featureGatesMap:
            operator.targetallocator.mtls: true
        admissionWebhooks:
          certManager:
            enabled: false
          autoGenerateCert:
            enabled: true
            recreate: true
      # Post-install job: works around an OpenTelemetry Operator bug where DELETE
      # validation webhooks block app uninstallation. This block can be removed
      # once an upstream fix is released.
      extraObjects:
        - apiVersion: v1
          kind: ServiceAccount
          metadata:
            name: patch-webhook-sa
            annotations:
              "helm.sh/hook": post-install,post-upgrade
              "helm.sh/hook-weight": "1"
              "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
        - apiVersion: rbac.authorization.k8s.io/v1
          kind: ClusterRole
          metadata:
            name: otel-collector-shared-nodes-patch-webhook
            annotations:
              "helm.sh/hook": post-install,post-upgrade
              "helm.sh/hook-weight": "1"
              "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
          rules:
            - apiGroups: ["admissionregistration.k8s.io"]
              resources: ["validatingwebhookconfigurations"]
              verbs: ["get", "patch"]
        - apiVersion: rbac.authorization.k8s.io/v1
          kind: ClusterRoleBinding
          metadata:
            name: otel-collector-shared-nodes-patch-webhook
            annotations:
              "helm.sh/hook": post-install,post-upgrade
              "helm.sh/hook-weight": "1"
              "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
          subjects:
            - kind: ServiceAccount
              name: patch-webhook-sa
              namespace: otel
          roleRef:
            kind: ClusterRole
            name: otel-collector-shared-nodes-patch-webhook
            apiGroup: rbac.authorization.k8s.io
        - apiVersion: batch/v1
          kind: Job
          metadata:
            name: patch-webhook
            annotations:
              "helm.sh/hook": post-install,post-upgrade
              "helm.sh/hook-weight": "10"
              "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
          spec:
            template:
              spec:
                restartPolicy: Never
                serviceAccountName: patch-webhook-sa
                containers:
                  - name: patch
                    image: "bitnami/kubectl:latest"
                    command: ["bash", "-c"]
                    args:
                      - |
                        WH="otel-collector-shared-nodes-opentelemetry-operator-validation"
                        for i in $(seq 1 30); do kubectl get validatingwebhookconfiguration "$WH" >/dev/null 2>&1 && break; sleep 2; done
                        # Build JSON patch to remove webhooks with "delete" in name (reverse order to preserve indices)
                        PATCH=$(kubectl get validatingwebhookconfiguration "$WH" -o jsonpath='{range .webhooks[*]}{.name}{"\n"}{end}' \
                          | awk '/delete/{print NR-1}' | sort -rn \
                          | awk 'BEGIN{printf "["} NR>1{printf ","} {printf "{\"op\":\"remove\",\"path\":\"/webhooks/%d\"}",$1} END{printf "]"}')
                        [ "$PATCH" = "[]" ] && exit 0
                        echo "Removing DELETE webhooks: $PATCH"
                        kubectl patch validatingwebhookconfiguration "$WH" --type=json -p="$PATCH"
      collectors:
        # Disable the default DaemonSet collector
        daemon:
          enabled: false
        # Deployment-mode collector with Target Allocator
        cluster:
          enabled: true
          suffix: cluster
          mode: deployment
          replicas: 2
          resources:
            limits:
              memory: 1Gi
            requests:
              cpu: 250m
              memory: 512Mi
          livenessProbe:
            initialDelaySeconds: 15
            periodSeconds: 10
            failureThreshold: 5
          presets:
            kubernetesAttributes:
              enabled: true
          targetAllocator:
            enabled: true
            allocationStrategy: consistent-hashing
            prometheusCR:
              enabled: true
              serviceMonitorSelector:
                matchLabels:
                  app: vcluster
              podMonitorSelector: {}
          config:
            receivers:
              prometheus:
                config:
                  scrape_configs:
                    - job_name: 'kubelet-cadvisor'
                      scrape_interval: 60s
                      kubernetes_sd_configs:
                        - role: node
                      scheme: https
                      tls_config:
                        insecure_skip_verify: true
                      authorization:
                        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                      metrics_path: /metrics/cadvisor
                      relabel_configs:
                        - source_labels: [__meta_kubernetes_node_address_InternalIP]
                          target_label: __address__
                          replacement: '$$1:10250'
                        - action: labelmap
                          regex: __meta_kubernetes_node_label_(.+)
                        - source_labels: [__meta_kubernetes_node_name]
                          target_label: node
            processors:
              groupbyattrs:
                keys:
                  - namespace
                  - pod
                  - node
              transform/pre_enrich:
                error_mode: ignore
                metric_statements:
                  - context: resource
                    statements:
                      - 'set(attributes["k8s.namespace.name"], attributes["namespace"]) where attributes["namespace"] != nil'
                      - 'set(attributes["k8s.pod.name"], attributes["pod"]) where attributes["pod"] != nil'
                      - 'set(attributes["k8s.node.name"], attributes["node"]) where attributes["node"] != nil'
              k8sattributes:
                auth_type: serviceAccount
                passthrough: false
                extract:
                  metadata:
                    - k8s.namespace.name
                    - k8s.pod.name
                    - k8s.pod.start_time
                    - k8s.pod.uid
                    - k8s.deployment.name
                    - k8s.node.name
                    - k8s.container.name
                  labels:
                    # Pod labels - vcluster syncer adds these to synced pods
                    - tag_name: vcluster.virtual.namespace
                      key: vcluster.loft.sh/namespace
                      from: pod
                    # Namespace labels - platform adds these to vcluster namespaces
                    - tag_name: vcluster.project
                      key: loft.sh/project
                      from: namespace
                    - tag_name: vcluster.project.namespace
                      key: loft.sh/vcluster-instance-namespace
                      from: namespace
                    - tag_name: vcluster.user
                      key: loft.sh/user
                      from: namespace
                    - tag_name: vcluster.name
                      key: loft.sh/vcluster-instance-name
                      from: namespace
                  annotations:
                    # Pod annotations - identifies the virtual pod name
                    - tag_name: vcluster.virtual.pod
                      key: vcluster.loft.sh/name
                      from: pod
              transform:
                error_mode: ignore
                metric_statements:
                  - context: datapoint
                    statements:
                      - 'set(attributes["k8s.node.name"], resource.attributes["k8s.node.name"])'
                      - 'set(attributes["k8s.pod.name"], resource.attributes["k8s.pod.name"])'
                      - 'set(attributes["k8s.namespace.name"], resource.attributes["k8s.namespace.name"])'
                      - 'set(attributes["vcluster.virtual.pod"], resource.attributes["vcluster.virtual.pod"])'
                      - 'set(attributes["vcluster.virtual.namespace"], resource.attributes["vcluster.virtual.namespace"])'
                      - 'set(attributes["vcluster.project"], resource.attributes["vcluster.project"])'
                      - 'set(attributes["vcluster.project.namespace"], resource.attributes["vcluster.project.namespace"])'
                      - 'set(attributes["vcluster.user"], resource.attributes["vcluster.user"])'
                      - 'set(attributes["vcluster.name"], resource.attributes["vcluster.name"])'
              filter/vcluster_only:
                metrics:
                  datapoint:
                    - 'resource.attributes["vcluster.name"] == nil'
              resource/add_cluster:
                attributes:
                  - action: upsert
                    key: cluster
                    value: "{{ .Values.loft.cluster }}"
              memory_limiter:
                check_interval: 1s
                limit_percentage: 75
                spike_limit_percentage: 15
              batch:
                send_batch_size: 10000
                send_batch_max_size: 10000
                timeout: 10s
            exporters:
              prometheusremotewrite:
                endpoint: '{{ .Values.prometheus.endpoint }}/api/v1/write'
                {{- if and .Values.prometheus.username .Values.prometheus.password }}
                auth:
                  authenticator: basicauth/prw
                {{- end }}
                tls:
                  insecure_skip_verify: {{ .Values.prometheus.insecure }}
                resource_to_telemetry_conversion:
                  enabled: true
            extensions:
              health_check:
                endpoint: 0.0.0.0:13133
              {{- if and .Values.prometheus.username .Values.prometheus.password }}
              basicauth/prw:
                client_auth:
                  username: "{{ .Values.prometheus.username }}"
                  password: "{{ .Values.prometheus.password }}"
              {{- end }}
            service:
              extensions:
                - health_check
                {{- if and .Values.prometheus.username .Values.prometheus.password }}
                - basicauth/prw
                {{- end }}
              pipelines:
                metrics:
                  receivers:
                    - prometheus
                  processors:
                    - memory_limiter
                    - groupbyattrs
                    - transform/pre_enrich
                    - k8sattributes
                    - filter/vcluster_only
                    - resource/add_cluster
                    - transform
                    - batch
                  exporters:
                    - prometheusremotewrite
  defaultNamespace: monitoring
  displayName: OTEL Collector - Shared Nodes
  icon: https://opentelemetry.io/img/logos/opentelemetry-logo-nav.png
  parameters:
    - description: The Prometheus remote write endpoint (without /api/v1/write suffix)
      label: Prometheus Endpoint
      required: true
      variable: prometheus.endpoint
    - description: Username for basic auth (optional)
      label: Prometheus Username
      variable: prometheus.username
    - description: Password for basic auth (optional)
      label: Prometheus Password
      type: password
      variable: prometheus.password
    - description: Skip TLS verification for the connection to Prometheus
      label: Prometheus Skip TLS Verification
      type: boolean
      variable: prometheus.insecure
  recommendedApp:
    - cluster
Key configuration details
  • Deployment mode with Target Allocator: A Deployment with 2 replicas and consistent-hashing allocation is more resource-efficient than a DaemonSet. Since the prometheus receiver is used (not kubeletstats), there's no need for local-node scraping.
  • serviceMonitorSelector: app: vcluster: Without this filter, the Target Allocator discovers every ServiceMonitor in the cluster, which can overwhelm the collectors with scrape targets and memory pressure.
  • operator.targetallocator.mtls: true: Each vCluster exposes its API server metrics over mTLS. Without this feature gate, the Target Allocator redacts TLS private keys when passing scrape configs to collectors.
  • otel/opentelemetry-collector-contrib image: The default image doesn't include the prometheusremotewrite exporter.

Register the app​

Apply the App manifest to the management API so that it becomes available in the Platform UI:

kubectl apply -f otel-collector-shared-nodes-app.yaml

Deploy to a cluster​

  1. Go to the Infra section using the menu on the left, and select the Clusters view.

  2. Click on the cluster where you want to deploy the collector.

  3. Navigate to the Apps tab.

  4. Select the OTEL Collector - Shared Nodes app.

  5. Configure the following parameters and confirm the deployment.

| Parameter | Required | Description |
| --- | --- | --- |
| Prometheus Endpoint | Yes | Remote write URL (without /api/v1/write suffix) |
| Prometheus Username | No | Basic auth username |
| Prometheus Password | No | Basic auth password |
| Prometheus Skip TLS Verification | No | Skip TLS verification for the Prometheus connection |
info

Repeat these steps for each host cluster.

Deploy the private nodes collector​

Deploy one private-nodes collector into each private-nodes vCluster. All vCluster identity labels are injected automatically by the Platform via {{ .Values.loft.* }}.

App manifest​

The private-nodes app deploys the opentelemetry-collector Helm chart (v0.144.0) with the following configuration:

otel-collector-private-nodes-app.yaml
apiVersion: management.loft.sh/v1
kind: App
metadata:
  name: otel-collector-private-nodes
spec:
  access:
    - users:
        - '*'
      verbs:
        - get
  config:
    chart:
      name: opentelemetry-collector
      repoURL: https://open-telemetry.github.io/opentelemetry-helm-charts
      version: 0.144.0
    values: | # yaml
      ---
      mode: daemonset
      image:
        repository: otel/opentelemetry-collector-contrib
      presets:
        kubeletMetrics:
          enabled: false
        kubernetesAttributes:
          enabled: true
      service:
        enabled: true
      # Explicitly inject node name for local-only scraping
      extraEnvs:
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
      clusterRole:
        rules:
          - apiGroups: [""]
            resources: ["nodes", "nodes/metrics", "nodes/proxy", "services", "endpoints", "pods", "namespaces"]
            verbs: ["get", "list", "watch"]
          - apiGroups: ["apps"]
            resources: ["replicasets"]
            verbs: ["get", "list", "watch"]
          - apiGroups: ["extensions"]
            resources: ["replicasets"]
            verbs: ["get", "list", "watch"]
          - apiGroups: [""]
            resources: ["nodes/stats"]
            verbs: ["get", "list", "watch"]
          - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
            verbs: ["get"]
      config:
        {{- if and .Values.prometheus.username .Values.prometheus.password }}
        extensions:
          basicauth/prw:
            client_auth:
              username: "{{ .Values.prometheus.username }}"
              password: "{{ .Values.prometheus.password }}"
        {{- end }}
        receivers:
          prometheus:
            config:
              scrape_configs:
                - job_name: 'kubelet'
                  scrape_interval: 60s
                  kubernetes_sd_configs:
                    - role: node
                  scheme: https
                  tls_config:
                    insecure_skip_verify: {{ .Values.prometheus.insecure }}
                  authorization:
                    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  relabel_configs:
                    # Only scrape the node this pod is running on
                    - source_labels: [__meta_kubernetes_node_name]
                      regex: '${env:K8S_NODE_NAME}'
                      action: keep
                    - source_labels: [__meta_kubernetes_node_address_InternalIP]
                      target_label: __address__
                      replacement: '$$1:10250'
                    - action: labelmap
                      regex: __meta_kubernetes_node_label_(.+)
                    - source_labels: [__meta_kubernetes_node_name]
                      target_label: node
                  metric_relabel_configs:
                    - source_labels: [namespace]
                      target_label: vcluster_virtual_namespace
                    - source_labels: [pod]
                      target_label: vcluster_virtual_pod

                - job_name: 'kubelet-cadvisor'
                  scrape_interval: 60s
                  kubernetes_sd_configs:
                    - role: node
                  scheme: https
                  tls_config:
                    insecure_skip_verify: {{ .Values.prometheus.insecure }}
                  authorization:
                    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  metrics_path: /metrics/cadvisor
                  relabel_configs:
                    # Only scrape the node this pod is running on
                    - source_labels: [__meta_kubernetes_node_name]
                      regex: '${env:K8S_NODE_NAME}'
                      action: keep
                    - source_labels: [__meta_kubernetes_node_address_InternalIP]
                      target_label: __address__
                      replacement: '$$1:10250'
                    - action: labelmap
                      regex: __meta_kubernetes_node_label_(.+)
                    - source_labels: [__meta_kubernetes_node_name]
                      target_label: node
                  metric_relabel_configs:
                    - source_labels: [namespace]
                      target_label: vcluster_virtual_namespace
                    - source_labels: [pod]
                      target_label: vcluster_virtual_pod

                - job_name: 'apiserver'
                  scrape_interval: 60s
                  kubernetes_sd_configs:
                    - role: endpoints
                      namespaces:
                        names: ['default']
                  scheme: https
                  tls_config:
                    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                    insecure_skip_verify: {{ .Values.prometheus.insecure }}
                  authorization:
                    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  relabel_configs:
                    - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
                      action: keep
                      regex: kubernetes;https
        processors:
          k8sattributes:
            auth_type: 'serviceAccount'
            extract:
              metadata:
                - k8s.namespace.name
                - k8s.pod.name
                - k8s.pod.start_time
                - k8s.pod.uid
                - k8s.deployment.name
                - k8s.node.name
          transform:
            error_mode: ignore
            metric_statements:
              - context: datapoint
                statements:
                  - 'set(attributes["k8s.node.name"], resource.attributes["k8s.node.name"])'
                  - 'set(attributes["k8s.pod.name"], resource.attributes["k8s.pod.name"])'
                  - 'set(attributes["k8s.namespace.name"], resource.attributes["k8s.namespace.name"])'
          memory_limiter:
            check_interval: 1s
            limit_percentage: 75
            spike_limit_percentage: 15
          batch:
            send_batch_size: 10000
            timeout: 10s
        exporters:
          prometheusremotewrite:
            endpoint: '{{ .Values.prometheus.endpoint }}/api/v1/write'
            {{- if and .Values.prometheus.username .Values.prometheus.password }}
            auth:
              authenticator: basicauth/prw
            {{- end }}
            tls:
              insecure_skip_verify: {{ .Values.prometheus.insecure }}
            # Use external_labels for loft labels (simpler than attributes processor for remote write)
            external_labels:
              cluster: "{{ .Values.loft.cluster }}"
              vcluster_name: "{{ .Values.loft.name }}"
              vcluster_project: "{{ .Values.loft.project }}"
              vcluster_project_namespace: "{{ .Values.loft.space }}"
              vcluster_user: "{{ .Values.loft.user.name }}"
            # Disable resource to telemetry conversion to avoid duplicate labels
            resource_to_telemetry_conversion:
              enabled: false
        service:
          extensions:
            - health_check
            {{- if and .Values.prometheus.username .Values.prometheus.password }}
            - basicauth/prw
            {{- end }}
          pipelines:
            metrics:
              receivers:
                - prometheus
              processors:
                - memory_limiter
                - k8sattributes
                - transform
                - batch
              exporters:
                - prometheusremotewrite
  defaultNamespace: monitoring
  description: |
    OpenTelemetry Collector for private/dedicated node vCluster monitoring.
    Deploys a DaemonSet-mode collector inside the vCluster, scraping kubelet,
    cAdvisor, and API server metrics from each node.
  displayName: OTEL Collector - Private Nodes
  icon: https://opentelemetry.io/img/logos/opentelemetry-logo-nav.png
  parameters:
    - description: The Prometheus endpoint to push metrics to
      label: Prometheus Endpoint
      required: true
      variable: prometheus.endpoint
    - description: The Prometheus username
      label: Prometheus Username
      variable: prometheus.username
    - description: The password to access Prometheus
      label: Prometheus Password
      type: password
      variable: prometheus.password
    - description: Skip TLS verification for the connection to Prometheus
      label: Prometheus Skip TLS Verification
      type: boolean
      variable: prometheus.insecure
  recommendedApp:
    - virtualcluster
Key configuration details
  • DaemonSet mode: Private-nodes virtual clusters have dedicated nodes. A DaemonSet ensures one collector per node, scraping only the local kubelet and cAdvisor via ${env:K8S_NODE_NAME} filtering. No Target Allocator is needed.
  • external_labels instead of k8sattributes for identity: The collector runs inside the vCluster, so it can't access host-cluster namespace labels. The Platform injects {{ .Values.loft.* }} template variables at deploy time.
  • resource_to_telemetry_conversion: false: Setting this to true causes duplicate labels that break Grafana dashboards.
  • metric_relabel_configs for virtual namespace/pod: Since resource_to_telemetry_conversion is disabled, OTel transform processor attributes don't reach the exported Prometheus labels. The relabel configs copy namespace and pod to vcluster_virtual_namespace and vcluster_virtual_pod at scrape time.

Register the app​

Apply the App manifest to the management API so that it becomes available in the Platform UI:

kubectl apply -f otel-collector-private-nodes-app.yaml

Deploy to a virtual cluster​

  1. Go to the Projects section using the menu on the left.

  2. Select the project containing your private-nodes virtual cluster.

  3. Click on the virtual cluster, then navigate to the Config tab.

  4. Scroll down to the Apps & Objects section and add the OTEL Collector - Private Nodes app.

  5. Configure the following parameters and confirm the deployment.

| Parameter | Required | Description |
| --- | --- | --- |
| Prometheus Endpoint | Yes | Remote write URL (without /api/v1/write suffix) |
| Prometheus Username | No | Basic auth username |
| Prometheus Password | No | Basic auth password |
| Prometheus Skip TLS Verification | No | Skip TLS verification for the Prometheus connection |
info

Repeat these steps for each private-nodes virtual cluster.

Golden signals queries​

With the collectors deployed and forwarding metrics to the central Prometheus, you can query the aggregated data. This section provides PromQL queries organized around the Four Golden Signals of monitoring: latency, traffic, errors, and saturation.

Because identity labels are enriched at ingest time, every query can filter and aggregate by cluster, vcluster_project, and vcluster_name directly.
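For example, a per-tenant running-container count needs nothing more than a count over any enriched series (here cAdvisor's working-set metric, chosen arbitrarily):

```promql
count by (cluster, vcluster_project, vcluster_name) (
  container_memory_working_set_bytes{vcluster_name!="", container!=""}
)
```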

Latency​

kube-apiserver request latency (p99, by verb)​

histogram_quantile(0.99,
sum by (le, verb, cluster, vcluster_project, vcluster_name) (
rate(apiserver_request_duration_seconds_bucket{vcluster_name!=""}[5m])
)
)

Why: Shows the tail latency of API server requests broken down by operation type (GET, LIST, PUT, POST, PATCH, DELETE, WATCH). The p99 captures outliers that averages hide. WATCH is expected to show 60s (long-poll).

kube-apiserver request latency (p95, non-WATCH)​

histogram_quantile(0.95,
sum by (le, verb, cluster, vcluster_project, vcluster_name) (
rate(apiserver_request_duration_seconds_bucket{verb!~"WATCH|CONNECT", vcluster_name!=""}[5m])
)
)

Why: Excludes long-running connections to focus on latency for synchronous API calls.

etcd backend latency (p99, by operation)​

histogram_quantile(0.99,
sum by (le, operation, cluster, vcluster_project, vcluster_name) (
rate(etcd_request_duration_seconds_bucket{vcluster_name!=""}[5m])
)
)

Why: etcd is the persistence backend. High latencies here (especially for get and list) propagate to every API call.

Traffic​

kube-apiserver request rate (by verb)​

sum by (verb, cluster, vcluster_project, vcluster_name) (
rate(apiserver_request_total{vcluster_name!=""}[5m])
)

Why: The most fundamental measure of cluster workload. Shows how many requests per second the API server handles, broken down by verb.

kube-apiserver request rate (by resource)​

topk(10,
sum by (resource, cluster, vcluster_project, vcluster_name) (
rate(apiserver_request_total{vcluster_name!=""}[5m])
)
)

Why: Identifies which Kubernetes resources generate the most API traffic, revealing "hot" resource types.

Network I/O rate (by virtual namespace)​

topk(10,
sum by (vcluster_name, vcluster_virtual_namespace) (
rate(container_network_receive_bytes_total{vcluster_name!="", vcluster_virtual_namespace!=""}[5m])
)
)
topk(10,
sum by (vcluster_name, vcluster_virtual_namespace) (
rate(container_network_transmit_bytes_total{vcluster_name!="", vcluster_virtual_namespace!=""}[5m])
)
)

Why: Measures network throughput per namespace, revealing which workloads generate the most network traffic.

REST client outbound request rate (by code)​

sum by (code, cluster, vcluster_project, vcluster_name) (
rate(rest_client_requests_total{vcluster_name!=""}[5m])
)

Why: How many outbound API calls the control-plane components make.

Errors​

kube-apiserver error rate (4xx/5xx, by code)​

sum by (code, cluster, vcluster_project, vcluster_name) (
rate(apiserver_request_total{code=~"[45]..", vcluster_name!=""}[5m])
)

Why: HTTP-level error rates broken down by status code.

kube-apiserver error ratio (errors / total)​

sum by (cluster, vcluster_project, vcluster_name) (
rate(apiserver_request_total{code=~"5..", vcluster_name!=""}[5m])
)
/
sum by (cluster, vcluster_project, vcluster_name) (
rate(apiserver_request_total{vcluster_name!=""}[5m])
)

Why: The fraction of server-side errors. A ratio above 1% is a red flag.
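If your central Prometheus is managed by the Prometheus Operator, this ratio can back an alert. The following PrometheusRule is a sketch; the threshold, duration, and severity are illustrative, not part of this guide:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: vcluster-apiserver-errors
spec:
  groups:
    - name: vcluster.golden-signals
      rules:
        - alert: VClusterAPIServerErrorRatioHigh
          # Fires when more than 1% of a tenant's API requests return 5xx for 10 minutes
          expr: |
            sum by (cluster, vcluster_project, vcluster_name) (
              rate(apiserver_request_total{code=~"5..", vcluster_name!=""}[5m])
            )
            /
            sum by (cluster, vcluster_project, vcluster_name) (
              rate(apiserver_request_total{vcluster_name!=""}[5m])
            ) > 0.01
          for: 10m
          labels:
            severity: warning
```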

etcd request errors​

sum by (operation, cluster, vcluster_project, vcluster_name) (
rate(etcd_request_errors_total{vcluster_name!=""}[5m])
)

Why: Backend storage errors directly impact cluster health.

Container OOM kills​

sum by (vcluster_name, vcluster_virtual_namespace, vcluster_virtual_pod) (
rate(container_oom_events_total{vcluster_name!=""}[5m])
)

Why: Out-of-memory kills indicate resource misconfiguration.

REST client error rate (outbound 5xx)​

sum by (host, cluster, vcluster_project, vcluster_name) (
rate(rest_client_requests_total{code=~"5..", vcluster_name!=""}[5m])
)

Why: Errors when control-plane components call external APIs.

Saturation​

Container CPU usage (top pods)​

topk(10,
sum by (vcluster_name, vcluster_virtual_namespace, vcluster_virtual_pod) (
rate(container_cpu_usage_seconds_total{vcluster_name!="", container!="", vcluster_virtual_pod!=""}[5m])
)
)

Why: Shows the most CPU-hungry pods across the fleet.

Container memory working set (top pods)​

topk(10,
sum by (vcluster_name, vcluster_virtual_namespace, vcluster_virtual_pod) (
container_memory_working_set_bytes{vcluster_name!="", container!="", vcluster_virtual_pod!=""}
)
)

Why: Working set is the "real" memory usage that matters for OOM decisions.

CPU throttling ratio (by pod)​

topk(10,
sum by (vcluster_name, vcluster_virtual_namespace, vcluster_virtual_pod) (
rate(container_cpu_cfs_throttled_periods_total{vcluster_name!="", vcluster_virtual_pod!=""}[5m])
)
/
sum by (vcluster_name, vcluster_virtual_namespace, vcluster_virtual_pod) (
rate(container_cpu_cfs_periods_total{vcluster_name!="", vcluster_virtual_pod!=""}[5m])
)
)

Why: Shows which pods are being throttled by cgroup CPU limits.

kube-apiserver inflight requests​

apiserver_current_inflight_requests{vcluster_name!=""}

Why: Shows current request concurrency, broken down into mutating and read-only requests. When this approaches the API server's flow control limits, requests start queuing.

kube-apiserver flow-control queue depth​

sum by (priority_level, cluster, vcluster_project, vcluster_name) (
apiserver_flowcontrol_current_inqueue_requests{vcluster_name!=""}
)

Why: Requests waiting in priority-level queues. Non-zero means the API server is saturated for that priority level.

Workqueue depth (by queue name)​

topk(10,
workqueue_depth{vcluster_name!=""}
)

Why: Controller work queues. Growing depth means controllers can't keep up with the event rate.

Grafana dashboards​

note

This section assumes Grafana is already deployed. For setup instructions, see Aggregating Metrics — Deploy Grafana.

Two Grafana dashboards are provided for visualizing metrics collected by the OTel Collectors. Both dashboards use the identity labels enriched at ingest time, so all panels use straightforward PromQL queries without joins.

Import a dashboard​

  1. Download the dashboard JSON file from the relevant section below.
  2. Open Grafana and navigate to Dashboards.
  3. Click New > Import.
  4. Upload the .json file or paste its contents.
  5. Select your Prometheus data source and click Import.

vCluster projects dashboard​

A platform admin overview of vCluster projects across shared and private node virtual clusters. Use this dashboard to monitor resource consumption and API health at the project level.

Dashboard JSON file: Download dashboard JSON

Template variables:

| Variable | Description |
| --- | --- |
| datasource | Prometheus data source |
| cluster | Filter by cluster (supports multi-select) |
| project | Filter by project (supports multi-select) |

Panels:

| Section | Panels |
| --- | --- |
| Project summary | Total Projects, Total vClusters, Total Pods, Total CPU (cores), Total Memory, Max Error Rate (stat panels) |
| Resource usage by project | CPU by Project (cores), Memory by Project, Pods by Project, vClusters by Project (bar charts) |
| API health by project | API Request Rate, API Error Rate (5xx), API P95 Latency (time series) |
| Resource trends | CPU Usage Trend (stacked), Memory Usage Trend (stacked) (time series) |

vCluster detail dashboard​

A detailed drill-down view for individual virtual clusters. Use this dashboard to investigate resource consumption and control plane health for a specific virtual cluster.

Dashboard JSON file: Download dashboard JSON

Template variables:

| Variable | Description |
| --- | --- |
| datasource | Prometheus data source |
| cluster | Filter by cluster |
| project | Filter by project |
| vcluster | Filter by vCluster name |

Panels:

| Section | Panels |
| --- | --- |
| vCluster overview | Pods, Namespaces, Total CPU (cores), Total Memory, API Request Rate, API Error Rate (stat panels) |
| API server | API Request Rate by Resource, API Latency P95 by Verb, API Errors by Status Code, In-Flight Requests, Long-Running Requests (CONNECT) (time series) |
| CPU | CPU by Virtual Namespace (stacked), CPU by Pod (top 15), CPU Throttling by Pod (ratio), CPU by Node (time series) |
| Memory | Memory by Virtual Namespace (stacked), Memory by Pod (top 15), RSS Memory by Pod (top 15), OOM Events by Pod (time series) |
| Network | Network Receive by Pod (top 10), Network Transmit by Pod (top 10), Network Packet Drops, Network Errors (time series) |
| Disk I/O | Disk Read by Pod (top 10), Disk Write by Pod (top 10) (time series) |
| Filesystem | Filesystem Usage vs Limit, Filesystem I/O Throughput (time series) |