Export real-time resource and execution metrics from Cerebrium applications to an existing observability platform. Monitor CPU, memory, GPU usage, request counts, and latency. Most major OTLP-compatible monitoring platforms are supported.

What metrics are exported?

Resource Metrics

| Metric | Type | Unit | Description |
| --- | --- | --- | --- |
| cerebrium_cpu_utilization_cores | Gauge | cores | CPU cores actively in use per app |
| cerebrium_memory_usage_bytes | Gauge | bytes | Memory actively in use per app |
| cerebrium_gpu_memory_usage_bytes | Gauge | bytes | GPU VRAM in use per app |
| cerebrium_gpu_compute_utilization_percent | Gauge | percent | GPU compute utilization (0-100) per app |
| cerebrium_containers_running_count | Gauge | count | Number of running containers per app |
| cerebrium_containers_ready_count | Gauge | count | Number of ready containers per app |

Execution Metrics

| Metric | Type | Unit | Description |
| --- | --- | --- | --- |
| cerebrium_run_execution_time_ms | Histogram | ms | Time spent executing user code |
| cerebrium_run_queue_time_ms | Histogram | ms | Time spent waiting in queue |
| cerebrium_run_coldstart_time_ms | Histogram | ms | Time for container cold start |
| cerebrium_run_response_time_ms | Histogram | ms | Total end-to-end response time |
| cerebrium_run_total | Counter | — | Total run count |
| cerebrium_run_successes_total | Counter | — | Successful run count |
| cerebrium_run_errors_total | Counter | — | Failed run count |
Prometheus metric name mapping: When metrics are ingested by Prometheus (including Grafana Cloud), OTLP automatically appends unit suffixes to metric names. Histogram metrics will appear with _milliseconds appended — for example, cerebrium_run_execution_time_ms becomes cerebrium_run_execution_time_ms_milliseconds_bucket, _count, and _sum. Counter metrics with the _total suffix remain unchanged. The example queries throughout this guide use the Prometheus-ingested names.

Labels

Every metric includes the following labels for filtering and grouping:
| Label | Description | Example |
| --- | --- | --- |
| project_id | Your Cerebrium project ID | p-abc12345 |
| app_id | Full application identifier | p-abc12345-my-model |
| app_name | Human-readable app name | my-model |
| region | Deployment region | us-east-1 |

How it works

Cerebrium automatically pushes metrics to the configured monitoring platform every 60 seconds using the OpenTelemetry Protocol (OTLP). Provide an OTLP endpoint and authentication credentials through the Cerebrium dashboard — Cerebrium handles collecting resource usage and execution data, formatting it as OpenTelemetry metrics, and delivering it to the destination.
  • Metrics are pushed every 60 seconds
  • Failed pushes are retried 3 times with exponential backoff
  • If pushes fail 10 consecutive times, export is automatically paused to avoid noise (re-enable at any time from the dashboard)
  • Credentials are stored encrypted and never returned in API responses
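The retry behavior above can be sketched as follows. This is an illustrative shell sketch of the schedule only, not Cerebrium's actual implementation; the 1-second base delay is an assumption.

```shell
# Illustrative sketch of the retry schedule described above.
# NOTE: not Cerebrium's code; the 1s base delay is an assumption.
delay=1
for attempt in 1 2 3; do
  echo "push failed, retry $attempt in ${delay}s"
  delay=$((delay * 2))   # exponential backoff: 1s, 2s, 4s
done
echo "giving up after 3 retries"
```

After 10 consecutive failures of this whole cycle, export is paused rather than retried indefinitely.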

Supported destinations

  • Grafana Cloud — Primary supported destination
  • Datadog — Via OTLP endpoint
  • Prometheus — Self-hosted with OTLP receiver enabled
  • Custom — Any OTLP-compatible endpoint (New Relic, Honeycomb, etc.)

Setup Guide

Step 1: Get your platform credentials

Gather an OTLP endpoint and authentication credentials from your monitoring platform before configuring the Cerebrium dashboard. The steps below use Grafana Cloud as an example; for other platforms, consult their OTLP ingestion documentation.
  1. Sign in to Grafana Cloud
  2. Go to your stack → Connections → Add new connection
  3. Search for “OpenTelemetry” and click Configure
  4. Copy the OTLP endpoint — this will match your stack’s region:
    • US: https://otlp-gateway-prod-us-east-0.grafana.net/otlp
    • EU: https://otlp-gateway-prod-eu-west-0.grafana.net/otlp
    • Other regions will show their specific URL on the configuration page
  5. On the same page, generate an API token. Click Generate now and ensure the token has the MetricsPublisher role — this is a separate token from any Prometheus Remote Write tokens you may already have.
  6. The page will show you an Instance ID and the generated token. Run the following in your terminal to create the Basic auth string:
echo -n "INSTANCE_ID:TOKEN" | base64
Copy the output — you’ll paste it in the dashboard in the next step.
The API token must have the MetricsPublisher role. The default Prometheus Remote Write token will not work with the OTLP endpoint. If you’re unsure, generate a new token from the OpenTelemetry configuration page — it will have the correct role by default.
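As a sanity check, the command produces a standard Base64 string that decodes back to the original pair. With made-up placeholder values (123456 and glc_example are not real credentials), the round trip looks like:

```shell
# Placeholder values — substitute your real Instance ID and token.
AUTH=$(echo -n "123456:glc_example" | base64)
echo "$AUTH"                   # MTIzNDU2OmdsY19leGFtcGxl
echo -n "$AUTH" | base64 -d    # decodes back to 123456:glc_example
```

The -n flag matters: without it, echo appends a newline that gets encoded into the string and breaks authentication.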

Step 2: Configure in the Cerebrium dashboard

  1. In the Cerebrium dashboard, go to your project → Integrations → Metrics Export
  2. Paste your OTLP endpoint from Step 1
  3. Add the authentication headers from Step 1:
  • Header name: Authorization
  • Header value: Basic YOUR_BASE64_STRING (the output from the terminal command in Step 1)
  4. Click Save & Enable
Metrics start flowing within 60 seconds. The dashboard shows a green “Connected” status with the time of the last successful export. If something looks wrong, click Test Connection to verify Cerebrium can reach the monitoring platform. The result includes details to help troubleshoot.
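To verify credentials outside the dashboard, you can hand-build a minimal OTLP/JSON payload and POST it to the endpoint's /v1/metrics path yourself. The sketch below is a hypothetical smoke test: the metric name smoke_test_gauge and the environment variables are placeholders, and the curl line is commented out so nothing is sent until you fill in your own values.

```shell
# Minimal OTLP/JSON metrics payload (hypothetical smoke-test gauge).
PAYLOAD='{"resourceMetrics":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"smoke-test"}}]},"scopeMetrics":[{"metrics":[{"name":"smoke_test_gauge","gauge":{"dataPoints":[{"asDouble":1}]}}]}]}]}'

# Uncomment and set OTLP_ENDPOINT / BASE64_STRING to actually send it:
# curl -s -X POST "$OTLP_ENDPOINT/v1/metrics" \
#   -H "Authorization: Basic $BASE64_STRING" \
#   -H "Content-Type: application/json" \
#   --data "$PAYLOAD"

# Validate locally that the payload is well-formed JSON:
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload OK"
```

An HTTP 200 response from the endpoint confirms the credentials and URL are correct; a 401 or 404 points to the same causes listed under Troubleshooting below.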

Viewing Metrics

Once connected, metrics appear in the monitoring platform within a minute or two (exact latency depends on the platform’s ingestion pipeline).
  1. Go to your Grafana Cloud dashboard → Explore
  2. Select your Prometheus data source — it will be named something like grafanacloud-yourstack-prom (find it under Connections → Data sources if you’re unsure)
  3. Search for metrics starting with cerebrium_
Example queries:
Histogram metrics in Prometheus have _milliseconds appended by OTLP’s unit suffix convention, so you’ll see names like cerebrium_run_execution_time_ms_milliseconds_bucket. This is expected behavior — see the metric name mapping note above.
# CPU usage by app
cerebrium_cpu_utilization_cores{project_id="YOUR_PROJECT_ID"}

# Memory for a specific app
cerebrium_memory_usage_bytes{app_name="my-model"}

# Container scaling over time
cerebrium_containers_running_count{project_id="YOUR_PROJECT_ID"}

# Request rate (requests per second over 5 minutes)
rate(cerebrium_run_total[5m])

# p99 execution latency
histogram_quantile(0.99, rate(cerebrium_run_execution_time_ms_milliseconds_bucket{app_name="my-model"}[5m]))

# p99 end-to-end response time
histogram_quantile(0.99, rate(cerebrium_run_response_time_ms_milliseconds_bucket{app_name="my-model"}[5m]))

# Error rate as a percentage
rate(cerebrium_run_errors_total{app_name="my-model"}[5m]) / rate(cerebrium_run_total{app_name="my-model"}[5m]) * 100

# Average cold start time
rate(cerebrium_run_coldstart_time_ms_milliseconds_sum{app_name="my-model"}[5m]) / rate(cerebrium_run_coldstart_time_ms_milliseconds_count{app_name="my-model"}[5m])
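The error-rate query above translates naturally into an alert. Below is a sketch of a Prometheus alerting rule, assuming self-hosted Prometheus or Grafana-managed rule evaluation; the 5% threshold, 10-minute duration, and app name are placeholders to adjust for your workload.

```yaml
groups:
  - name: cerebrium
    rules:
      - alert: CerebriumHighErrorRate
        # Fire when more than 5% of runs fail for 10 minutes
        # (threshold and duration are placeholders).
        expr: |
          rate(cerebrium_run_errors_total{app_name="my-model"}[5m])
            / rate(cerebrium_run_total{app_name="my-model"}[5m]) * 100 > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cerebrium app my-model error rate above 5%"
```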

Managing Metrics Export

Manage metrics export configuration from the dashboard at any time under Integrations → Metrics Export.
  • Disable export: Toggle the switch off. The configuration is preserved — re-enable at any time without reconfiguring.
  • Update credentials: Enter new authentication headers and click Save Changes. Use this when rotating API keys.
  • Change endpoint: Update the OTLP endpoint field and click Save Changes.
  • Check status: The dashboard shows whether export is connected, the time of the last successful export, and any error messages.

Troubleshooting

Metrics not appearing

  1. Check the dashboard status. Go to Integrations → Metrics Export and look for the connection status. If it shows “Paused,” export was automatically disabled after repeated failures — click Re-enable after fixing the issue.
  2. Run a connection test. Click Test Connection on the dashboard. Common errors:
    • 401 / 403 Unauthorized: Your auth headers are wrong. For Grafana Cloud, make sure you’re using a MetricsPublisher token (not a Prometheus Remote Write token). For Datadog, verify your API key is active.
    • 404 Not Found: The OTLP endpoint URL is incorrect. Double-check the URL matches your platform and region.
    • Connection timeout: Your endpoint may be unreachable. For self-hosted Prometheus, confirm the host is publicly accessible and port 4318 is open.
  3. Check your platform’s data source. In Grafana Cloud, make sure you’re querying the correct Prometheus data source (not a Loki or Tempo source). In Datadog, check that your site region matches the endpoint you configured.

Metrics appear but values look wrong

  • Histogram metrics have _milliseconds in the name. This is normal — Prometheus appends unit suffixes from OTLP metadata. Use the full name (e.g., cerebrium_run_execution_time_ms_milliseconds_bucket) in your queries.
  • Container counts fluctuate during deploys. This is expected — you may see temporary spikes in cerebrium_containers_running_count during rolling deployments as new containers start and old ones drain.
  • Gaps in metrics. Short gaps (1-2 minutes) can occur during deployments or scaling events. If you see persistent gaps, check whether export was paused.

Still stuck?

Contact support@cerebrium.ai with the project ID and error message from the dashboard for further investigation.