- Low latency and low cold starts
- Bursty traffic that should scale without wasting GPU capacity
- Multi-region deployments for global users or data residency
- Realtime voice, video, and streaming workloads
- GPU-heavy production inference that has to stay reliable under load
## Why teams choose Cerebrium
- Launch code in the cloud in seconds
- Run CPUs or GPUs with automatic scaling
- Serve REST APIs, streaming endpoints, WebSockets, or any ASGI-compatible app
- Deploy across multiple regions for lower latency and residency requirements
- Tune concurrency and batching for real production traffic
- Improve startup performance with cold-start optimization strategies
- Store model weights and files with persistent storage
- Pay only for the compute you use - billed by the second
## Start by workload
Pick the closest path below to get started:

- OpenAI-compatible LLM endpoint → Serve an OpenAI Compatible LLM with vLLM
- Voice AI / real-time speech → Deploy a Twilio Voice Agent with Pipecat
- Image and video generation → Generate images using SDXL
- Python Apps → Deploy Gradio Chat Interface
## Quickstart
Set up and deploy an app on Cerebrium in a few steps.

1. Install the CLI
- Python (pip)
- macOS (Homebrew)
- Linux
- Windows
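For example, on any platform with Python available, the CLI can be installed from pip (the package name is `cerebrium`; Homebrew and platform-specific installers are listed in the install docs):

```shell
# Install the Cerebrium CLI via pip
pip install cerebrium

# Confirm the CLI is on your PATH
# (flag name assumed from common CLI conventions)
cerebrium --version
```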
2. Log in to the CLI
3. Initialize a project
This scaffolds `main.py` for app code and `cerebrium.toml` for configuration.
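A minimal `main.py` might look like the sketch below; the `run` function name, parameter, and return shape are illustrative assumptions, since Cerebrium exposes top-level Python functions as callable endpoints:

```python
# main.py -- minimal Cerebrium app sketch (names are illustrative)

def run(prompt: str) -> dict:
    # Replace this echo with real inference logic (model call, etc.)
    return {"result": f"You sent: {prompt}"}
```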
4. Run code remotely
Run the function in the cloud and pass it a prompt:

5. Deploy your app
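A sketch of the remote-run and deploy steps, assuming the scaffolded app defines a `run` function in `main.py` (the function-addressing syntax and flag names are assumptions; check `cerebrium run --help` for the exact form):

```shell
# Execute the run function remotely with a prompt argument
cerebrium run main.py::run --prompt "Hello, Cerebrium!"

# Deploy the app as a persistent, autoscaling endpoint
cerebrium deploy
```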
6. What to do next
Useful next steps after a first deployment:

- Define Container images
- Tune scaling and concurrency
- Store model weights in persistent storage
- Deploy to multiple regions
## How Cerebrium works
Cerebrium uses containerization to ensure consistent environments and reliable scaling for apps. When code is deployed, Cerebrium packages it with all necessary dependencies into a container image. This image serves as a blueprint for creating instances that handle incoming requests. The system automatically manages scaling, creating new instances when traffic increases and removing them during quiet periods. For a detailed explanation of how Cerebrium builds and manages container images, see the Defining Container Images Guide.

Content-Aware Storage forms the foundation of Cerebrium's speed. The system intelligently manages container images by understanding their content structure: when launching new instances, it pulls only the specific files an instance needs rather than the entire image. This targeted approach significantly reduces cold start times and optimizes resource usage.
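As a rough illustration, the scaling behavior described above is driven from `cerebrium.toml`. The section and field names below are assumptions based on typical Cerebrium configuration, not an authoritative schema; consult the configuration reference in the docs:

```toml
[cerebrium.deployment]
name = "my-first-app"
python_version = "3.11"

[cerebrium.hardware]
compute = "CPU"     # or a GPU type, e.g. "AMPERE_A10"

[cerebrium.scaling]
min_replicas = 0    # scale to zero during quiet periods
max_replicas = 5    # cap instances under bursty traffic
```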