Introduction

Cerebrium simplifies deploying and running machine learning apps by abstracting infrastructure management into configuration, letting engineering teams focus on what matters most: application code that delivers value to customers. A single TOML file manages environment setup, deployment, and scaling - tasks that typically require dedicated teams.

Cerebrium also handles containers differently from traditional Docker or Kubernetes setups. Instead of managing multiple configuration files and orchestration rules, teams declare requirements in cerebrium.toml. The system automatically handles container lifecycle, networking, and scaling based on this configuration.
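
For illustration, a minimal cerebrium.toml might look like the following sketch; the app name is a placeholder, and each section is covered in detail below:

[cerebrium.deployment]
name = "my-app"          # Hypothetical app name
python_version = "3.11"

[cerebrium.hardware]
compute = "AMPERE_A10"
cpu = 4
memory = 16.0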

Why TOML?

While Python decorators offer programmatic configuration, they scatter infrastructure settings throughout code files, making changes risky and review difficult. TOML centralizes all configuration in one place, making it easier to track changes and maintain consistency. Its straightforward syntax prevents the accidental complexity that often comes with code-based configuration, while its hierarchical structure naturally maps to modern app requirements.

Getting Started

The fastest and simplest way to create a config file is to run the cerebrium init command. This command creates a cerebrium.toml file in the project root, which can then be edited to suit specific application requirements.

Check out the Introductory Guide for more information on how to get started.

An existing project can also be initialized by adding a cerebrium.toml file to the root of the codebase: define the entrypoint (main.py when using the default runtime, or an explicit entrypoint in the .toml file when using a custom runtime) and list the necessary files in the deployment section of the cerebrium.toml file.
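
As a sketch for a default-runtime project, assuming include and exclude lists control which files are uploaded, that deployment section might look like this (the name and patterns are placeholders):

[cerebrium.deployment]
name = "existing-project"                   # Hypothetical app name
python_version = "3.11"
include = ["main.py", "requirements.txt"]   # Files shipped with the deployment
exclude = [".git/*"]                        # Files left out of the upload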

Hardware Configuration

Cerebrium provides flexible hardware options to match app requirements. The basic configuration specifies GPU type and memory allocations:

[cerebrium.hardware]
compute = "AMPERE_A10" # GPU selection
memory = 16.0          # Memory allocation in GB
cpu = 4                # Number of CPU cores
gpu_count = 1          # Number of GPUs

For detailed hardware specifications and performance characteristics see the GPU and Other Resources Guide.

Dependency Management

Selecting a Python Version

The Python runtime version forms the foundation of every Cerebrium app. We currently support versions 3.10, 3.11 and 3.12. Specify the Python version in the deployment section of the configuration:

[cerebrium.deployment]
python_version = "3.11"  # Quoted so versions like 3.10 aren't parsed as the float 3.1

The Python version affects the entire dependency chain. For instance, some packages may not support newer Python versions immediately after release.

Changes to the Python version trigger a full rebuild since they affect both the base environment and all Python package installations.

Adding Python Packages

Python dependencies can be managed directly in TOML or through requirement files. The system caches packages to speed up builds:

[cerebrium.dependencies.pip]
torch = "==2.0.0"
transformers = "==4.30.0"
numpy = "latest"

Or using an existing requirements file:

[cerebrium.dependencies.paths]
pip = "requirements.txt"

To install packages directly from GitHub repositories, use shell commands rather than pip dependencies so the intended revision is installed.
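
For example, a dependency could be pinned to a specific tag or commit with a shell command like the following sketch (the repository URL and tag are placeholders):

[cerebrium.deployment]
shell_commands = [
    # Hypothetical repository; pinning to a tag keeps builds reproducible
    "pip install git+https://github.com/example-org/example-repo.git@v1.2.3"
]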

The system implements an intelligent caching strategy at the node level. When an app is built, all pip packages are cached with their exact versions, including wheel files and compiled binaries. This means subsequent builds only need to install new or updated packages, significantly reducing build times.

Adding APT Packages

System-level packages provide the foundation for many ML apps, handling everything from image processing libraries to audio codecs. These can be added to the cerebrium.toml file under the [cerebrium.dependencies.apt] section as follows:

[cerebrium.dependencies.apt]
ffmpeg = "latest"
libopenblas-base = "latest"
libomp-dev = "latest"

For teams with standardized system dependencies, text files can be used instead by adding the following to the [cerebrium.dependencies.paths] section:

[cerebrium.dependencies.paths]
apt = "deps_folder/pkglist.txt"

Since APT packages modify the system environment, any changes to these dependencies trigger a full rebuild of the container image. This ensures system-level changes are properly integrated but means builds will take longer than when modifying Python packages alone.

Conda Packages

Conda excels at managing complex system-level Python dependencies, particularly for GPU support and scientific computing:

[cerebrium.dependencies.conda]
cuda = ">=11.7"
cudatoolkit = "11.7"
opencv = "latest"

Teams using conda environments can specify their environment file:

[cerebrium.dependencies.paths]
conda = "conda_pkglist.txt"

Like APT packages, Conda packages often modify system-level components. Changes to Conda dependencies will trigger a full rebuild to ensure all binary dependencies and system libraries are correctly configured. Consider batching Conda dependency updates together to minimize rebuild time.

Build Commands

Cerebrium’s build process includes two specialized command types that execute at different stages during container image creation. These commands help configure the environment and prepare the application for deployment.

Pre-build Commands

Pre-build commands execute at the start of the build process, before dependency installation begins. This early execution timing makes them essential for setting up the build environment:

[cerebrium.deployment]
pre_build_commands = [
    # Add specialized build tools
    "curl -o /usr/local/bin/pget -L 'https://github.com/replicate/pget/releases/download/v0.6.2/pget_linux_x86_64'",
    "chmod +x /usr/local/bin/pget"
]

Pre-build commands typically handle tasks like installing build tools, configuring system settings, or preparing the environment for subsequent build steps.

Shell Commands

Shell commands execute after all dependencies are installed and the application code has been copied into the container. This later timing ensures access to the complete environment:

[cerebrium.deployment]
shell_commands = [
    # Initialize application resources
    "python -m download_models",
    "python -m compile_assets",
    "python -m init_app"
]

Shell commands excel at tasks that require the fully configured environment, such as compiling code that depends on installed libraries or downloading resources needed for the application.

Command Execution Impact

Any modification to either pre-build or shell commands triggers a rebuild of the corresponding section in the container image. This happens because these commands form integral parts of the final container environment. The build process and complete execution order are detailed in the Deployment Process section below.

Changes to either command type affect build time since they necessitate rebuilding parts of the container image. Consider batching related changes together when possible.

Custom Docker Base Images

The base image selection shapes how an app runs in Cerebrium. While the default Debian slim image works for most Python apps, other validated base images support specific requirements.

Supported Base Images

Cerebrium supports several categories of base images to ensure system compatibility:

# Default minimal image
docker_base_image_url = "debian:bookworm-slim"

# CUDA-enabled images
docker_base_image_url = "nvidia/cuda:12.0.1-runtime-ubuntu22.04"

The system accepts these image types:

Ubuntu-based CUDA Images

All nvidia/cuda images that include Ubuntu are supported. These provide GPU acceleration capabilities:

docker_base_image_url = "nvidia/cuda:12.0.1-runtime-ubuntu22.04"
docker_base_image_url = "nvidia/cuda:12.0.1-devel-ubuntu22.04"

Debian and Ubuntu Base Images

Any Debian or Ubuntu base image works as a foundation:

docker_base_image_url = "debian:bullseye"
docker_base_image_url = "ubuntu:22.04"

Python Images

Python images based on Debian bullseye or bookworm provide pre-configured Python environments:

docker_base_image_url = "python:3.11-bullseye"
docker_base_image_url = "python:3.11-bookworm"

Starting with a minimal Debian or Ubuntu base image is recommended, as CUDA images include many pre-installed components that increase container size. While the relationship isn’t strictly linear, larger container sizes generally lead to longer cold starts and build times. Begin with a lean base image and add only essential components as needed.

Custom Runtimes

While Cerebrium’s default runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or websockets.

Basic Configuration

Define a custom runtime by adding the cerebrium.runtime.custom section to the configuration:

[cerebrium.runtime.custom]
entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
port = 8080
healthcheck_endpoint = ""  # Empty string uses TCP health check

Key parameters:

  • entrypoint: Command to start the app (string or string list)
  • port: Port the app listens on
  • healthcheck_endpoint: Path for health checks

Check out this example for a detailed implementation of a FastAPI server that uses a custom runtime.
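
As a minimal sketch (not the full example referenced above), a main.py that works with the uvicorn entrypoint shown earlier might look like this; the route names are illustrative:

# main.py - minimal FastAPI app for a custom runtime (illustrative)
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health():
    # Optional health check; point healthcheck_endpoint here instead of ""
    return {"status": "ok"}

@app.post("/run")
def run(payload: dict):
    # Hypothetical inference route; replace with real model logic
    return {"result": payload}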

Self-Contained Servers

Custom runtimes also support apps with built-in servers. For example, deploying a vLLM server requires no Python code:

[cerebrium.runtime.custom]
entrypoint = "vllm serve meta-llama/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 8000 --device cuda"
port = 8000
healthcheck_endpoint = "/health"

[cerebrium.dependencies.pip]
torch = "latest"
vllm = "latest"

Important Notes

  • Code is mounted in /cortex - adjust paths accordingly
  • The port in your entrypoint must match the port parameter
  • Install any required server packages (uvicorn, gunicorn, etc.) via pip dependencies
  • All endpoints will be available at https://api.cortex.cerebrium.ai/v4/{project-id}/{app-name}/your/endpoint

Deploy as normal with cerebrium deploy -y - the system automatically detects and handles custom runtime configuration.
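
Once deployed, the endpoint can be called like any HTTP service. A sketch with placeholder project ID, app name, and route (whether an auth header is needed depends on the custom runtime's own authentication):

curl -X POST "https://api.cortex.cerebrium.ai/v4/p-12345/my-app/run" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "hello"}'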

Deployment Process

The build process follows a carefully orchestrated sequence that transforms source code into a production-ready container image. Let’s walk through each step:

Stage 1: App Upload

The process begins when code is uploaded to Cerebrium. This includes all source files, configuration, and any additional assets needed for the app.

Stage 2: Image Creation

The system then creates a container image through the following steps, each building upon the previous:

  1. Pre-build Commands Execute: First, any pre-build commands run. These set up the build environment and compile necessary assets before the main installation steps begin.
  2. APT Dependencies Install: System-level packages install next, establishing the foundation for all other dependencies.
  3. Conda Dependencies Install: After APT packages are in place, Conda packages install.
  4. Pip Dependencies Install: Python packages install last, ensuring they have access to all necessary system libraries and binaries.
  5. Python Code Copy: The app’s source code is copied into the container, placed in the correct directory structure.
  6. Shell Commands Execute: Finally, any build-time shell commands run to complete the image setup.
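
Putting this together, a sketch of how cerebrium.toml sections map onto the order above (the versions and commands are illustrative, mirroring earlier examples):

[cerebrium.deployment]
pre_build_commands = ["echo 'preparing build environment'"]  # Step 1: runs first
shell_commands = ["python -m download_models"]               # Step 6: runs after code copy

[cerebrium.dependencies.apt]     # Step 2
ffmpeg = "latest"

[cerebrium.dependencies.conda]   # Step 3
cudatoolkit = "11.7"

[cerebrium.dependencies.pip]     # Step 4
torch = "==2.0.0"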

Stage 3: Production Image

The result is a production-ready container image that contains everything needed to run the app. This image serves as a blueprint for creating individual containers when the app receives requests.