Skip to main content

Cost Control Dashboard

The platform comes with the cost control dashboard enabled by default, offering insights into potential cost savings through virtual clusters.

info
This feature is available from version v4.2.0 and was introduced in vCluster version v0.22.0

To track allocations and calculate savings for workloads running inside virtual clusters the platform deploys and manages Prometheus and OpenCost on each connected cluster. Prometheus uses a time series database that requires persistent volume storage to retain metrics over time. The platform-managed Prometheus is configured to collect the minimum metrics required for the cost control dashboard. OpenCost provides CPU and Memory allocation metrics for the deployed workloads.

In addition to the Prometheus instance on the connected cluster, the cluster hosting the platform has a global Prometheus installed that is configured to aggregate metrics from all connected clusters. This utilizes the connected cluster private network so no additional networking configuration is required.

Storage requirements

By default, the Cost Control Dashboard requests 60Gi persistent volumes for each connected cluster's metrics, and an additional 60Gi persistent volume for global metric aggregation. See the Configuration section for details on enabling/disabling the dashboard or configuring its resource needs.

Network requirements

In air-gapped or private GKE clusters it may be necessary to add a firewall rule allowing incoming traffic to port 9090. This port is used when aggregating metrics from connected clusters.

Configuration​

Although the platform manages Prometheus and OpenCost automatically, certain settings can be configured to better suit specific environments or needs. The primary configuration is part of the platform constControl config.

To turn off the cost control dashboard, which is enabled by default, set costControl.enabled to false:

Turning off the Cost Control Dashboard
config:
costControl:
enabled: false

The remaining configuration customizes the cost settings, global Prometheus, and connected cluster OpenCost and Prometheus. These are detailed in the following sections.

Cost settings​

The platform provides initial cost defaults as shown below. These can be customized for your environment.

Default Cost Settings
config:
costControl:
settings:
# Average CPU cost
averageCPUPricePerNode:
price: 31
timePeriod: Monthly
# Average Memory cost
averageRAMPricePerNode:
price: 31
timePeriod: Monthly
# Monthly or Yearly cost for Hosted Control Planes
controlPlanePricePerCluster:
price: 900
timePeriod: Yearly

Global observability​

These settings control, the global Prometheus's availability, retention, storage, cpu, and memory configuration. The default settings are provided below for reference, and may be customized.

Default Global Prometheus Settings
config:
costControl:
global:
metrics:
# Number of replicas for the Prometheus Deployment
replicas: 1
# Prometheus TSDB retention settings: https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
retention: 1y
# CPU and Memory Requests and Limits for the Prometheus Deployment
resources:
requests:
cpu: 500m
memory: 6Gi
limits:
cpu: '2'
memory: 6Gi
# Storage Settings for Persistent Volumes
storage:
size: 60Gi
# Valid values:
# - null: use the default storage class
# - "": disable automatic storage provisioning
# - name of the preferred configured storage class
storageClass: null

retention refers to Prometheus's TSDB retention settings. Details regarding valid units can be found in the prometheus documentation

storageClass configures the storage class that should be used for provisioning of the Prometheus deployment's StatefulSet. Valid values are:

  • null: use the cluster's default storage class
  • "" (empty string): turn off automatic provisioning
  • The name of a pre-configured storage class on the cluster
Overriding Resources

Overridden resources are not merged with the defaults to avoid unexpected changes from Helm charts used by the platform to manage Prometheus. Therefore, all requests and limits that you wish to apply should be provided.

Cluster observability​

These settings control the Prometheus's configuration for each connected cluster. These are applied uniformly across multiple clusters, but can be customized using Cluster Specific Configuration.

Default Cluster Prometheus Settings
config:
costControl:
cluster:
metrics:
# Number of replicas for the Prometheus Deployment
replicas: 1
# Prometheus TSDB retention settings: https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
retention: 1y
# CPU and Memory Requests and Limits for the Prometheus Deployment
resources:
requests:
cpu: 500m
memory: 3Gi
limits:
cpu: '1'
memory: 3Gi
# Storage Settings for Persistent Volumes
storage:
size: 60Gi
# Valid values:
# - null: use the default storage class
# - "": disable automatic storage provisioning
# - name of the preferred configured storage class
storageClass: null
Overriding Resources

Overridden resources are not merged with the defaults to avoid unexpected changes from Helm charts used by the platform to manage Prometheus. Therefore, all requests and limits that you wish to apply should be provided.

Cluster cost control​

These settings control the OpenCost deployment on each connected cluster. While there are many possible ways to configure OpenCost, the Platform configures it with the minimum settings required for the dashboard. As such only availability, CPU, and memory settings may be configured.

Default OpenCost Settings
config:
costControl:
cluster:
opencost:
# Number of replicas for the OpenCost Deployment
replicas: 1
# CPU and Memory Requests and Limits for the OpenCost Deployment
resources:
requests:
cpu: 500m
memory: 3Gi
limits:
cpu: '1'
memory: 3Gi
Overriding Resources

Overridden resources are not merged with the defaults to avoid unexpected changes from Helm charts used by the platform to manage OpenCost. Therefore, all requests and limits that you wish to apply should be provided.

Other cluster settings​

For individual cluster configuration, the Cluster resource allows for further configuration specific to each cluster.