Cerebrium is the infrastructure platform for real-time and high-performance AI workloads. It is a strong fit when an app requires one or more of the following:
  • Low latency and low cold starts
  • Bursty traffic that should scale without wasting GPU capacity
  • Multi-region deployments for global users or data residency
  • Realtime voice, video, and streaming workloads
  • GPU-heavy production inference that has to stay reliable under load

Start by workload

Pick the path that matches your workload to get started. For the fastest first deployment, follow the quickstart below.

Quickstart

Set up and deploy an app on Cerebrium in a few steps.

1. Install the CLI

pip install cerebrium

2. Log in to the CLI

cerebrium login
This opens your browser so you can authenticate your CLI session.

3. Initialize a project

cerebrium init my-first-app
cd my-first-app
This creates a basic project with main.py for app code and cerebrium.toml for configuration. The generated main.py contains a simple function:

def run(prompt: str):
    print(f"Running on Cerebrium: {prompt}")
    return {"my_result": prompt}
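For orientation, a configuration file might look roughly like the sketch below. The section and field names here are assumptions and may differ between CLI versions, so treat the cerebrium.toml generated by cerebrium init as the source of truth.

```toml
# Illustrative cerebrium.toml sketch -- field names are assumptions;
# check the file generated by `cerebrium init` for the real schema.
[cerebrium.deployment]
name = "my-first-app"
python_version = "3.11"

[cerebrium.hardware]
cpu = 2          # vCPUs per instance
memory = 8.0     # GB of RAM
compute = "CPU"  # or a GPU type for inference workloads

[cerebrium.scaling]
min_replicas = 0  # scale to zero when idle
max_replicas = 2
```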

4. Run code remotely

Run the function in the cloud and pass it a prompt:
cerebrium run main.py::run --prompt "Hello World!"
The prompt appears in the logs. This is useful for quick code iteration, testing snippets, or one-off scripts that need cloud CPU/GPU resources.

5. Deploy your app

cerebrium deploy
This turns the function into a persistent REST endpoint that accepts JSON input and can scale automatically. Once deployed, the app is callable at a POST endpoint:
https://api.aws.us-east-1.cerebrium.ai/v4/{project-id}/{app-name}/{function-name}
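A minimal sketch of calling the deployed endpoint from Python. The project ID, API key, and bearer-token auth header are assumptions for illustration (substitute real values from your dashboard), not values taken from this page.

```python
import json
from urllib import request

# Hypothetical placeholders -- substitute real values from the dashboard.
PROJECT_ID = "p-xxxxxxx"
APP_NAME = "my-first-app"
FUNCTION_NAME = "run"
API_KEY = "YOUR_API_KEY"  # assumption: the endpoint expects a bearer token

def build_request(prompt: str) -> request.Request:
    """Build the POST request for the deployed function."""
    url = (
        f"https://api.aws.us-east-1.cerebrium.ai/v4/"
        f"{PROJECT_ID}/{APP_NAME}/{FUNCTION_NAME}"
    )
    body = json.dumps({"prompt": prompt}).encode()
    return request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Hello World!")
print(req.full_url)  # send with urllib.request.urlopen(req) once values are real
```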

6. What to do next

After a first deployment, explore the workload guides above, and join the Community Discord for support and updates.

How Cerebrium works

Cerebrium uses containerization to ensure consistent environments and reliable scaling for apps. When code is deployed, Cerebrium packages it with all necessary dependencies into a container image. This image serves as a blueprint for creating instances that handle incoming requests. The system automatically manages scaling, creating new instances when traffic increases and removing them during quiet periods. For a detailed explanation of how Cerebrium builds and manages container images, see the Defining Container Images Guide.
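The scaling behaviour can be pictured with a toy replica calculation. This is purely an illustration of the general idea, not Cerebrium's actual scheduler, and the parameter names are made up.

```python
import math

def desired_replicas(inflight_requests: int,
                     concurrency_per_instance: int,
                     min_replicas: int = 0,
                     max_replicas: int = 5) -> int:
    """Toy autoscaling rule: run enough instances to cover in-flight
    requests, clamped to the configured replica bounds."""
    needed = math.ceil(inflight_requests / concurrency_per_instance)
    return max(min_replicas, min(max_replicas, needed))

print(desired_replicas(0, 4))   # quiet period: scales to zero
print(desired_replicas(10, 4))  # traffic burst: scales up
```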
Content-Aware Storage forms the foundation of Cerebrium's speed. This system intelligently manages container images by understanding their content structure. When launching new instances, it pulls only the specific files an instance actually needs, rather than the entire image. This targeted approach significantly reduces cold start times and optimizes resource usage.
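To make the idea concrete, here is a toy model of content-addressed, lazy file pulling. It is purely illustrative and not Cerebrium's implementation: files are stored by content hash, and an instance fetches each file only on first access instead of downloading the whole image up front.

```python
import hashlib

STORE = {}  # digest -> file bytes, simulating a remote blob store

def put(data: bytes) -> str:
    """Store a file by its content hash and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    STORE[digest] = data
    return digest

class LazyImage:
    """Toy image: a manifest mapping paths to content digests."""
    def __init__(self, manifest: dict):
        self.manifest = manifest
        self.pulled = {}  # only files actually read end up here

    def read(self, path: str) -> bytes:
        # Pull a file on first access instead of fetching the whole image.
        if path not in self.pulled:
            self.pulled[path] = STORE[self.manifest[path]]
        return self.pulled[path]

img = LazyImage({"weights.bin": put(b"model weights"),
                 "tokenizer.json": put(b"tokenizer")})
img.read("weights.bin")
print(len(img.pulled))  # only one of two files was pulled
```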