Defining Container Images
Introduction
Cerebrium simplifies the deployment and running of machine learning apps by abstracting infrastructure management into configuration, allowing engineering teams to focus on what matters most: delivering value to customers through application code. A single TOML file manages environment setup, deployments, and scaling, tasks that typically require dedicated teams.
Cerebrium also handles containers differently from traditional Docker or Kubernetes setups. Instead of managing multiple configuration files and orchestration rules, teams declare requirements in cerebrium.toml. The system automatically handles container lifecycle, networking, and scaling based on this configuration.
Why TOML?
While Python decorators offer programmatic configuration, they scatter infrastructure settings throughout code files, making changes risky and review difficult. TOML centralizes all configuration in one place, making it easier to track changes and maintain consistency. Its straightforward syntax prevents the accidental complexity that often comes with code-based configuration, while its hierarchical structure naturally maps to modern app requirements.
Getting Started
The fastest and simplest way to create a config file is to run the cerebrium init command. This command creates a cerebrium.toml file in the project root, which can then be edited to suit specific application requirements.
Check out the Introductory Guide for more information on how to get started.
It is possible to initialize an existing project by adding a cerebrium.toml file to the root of your codebase, defining your entrypoint (main.py if using the default runtime, or adding an entrypoint to the .toml file if using a custom runtime), and including the necessary files in the deployment section of your cerebrium.toml file.
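As a rough sketch, a minimal cerebrium.toml for an existing project might contain only a deployment section. The [cerebrium.deployment] table name follows the deployment section mentioned above. The name, python_version, include, and exclude keys and their values are assumptions here, so confirm them against the file that cerebrium init generates.

```toml
# Minimal cerebrium.toml sketch (key names are assumed, not confirmed)
[cerebrium.deployment]
name = "my-app"                          # hypothetical app name
python_version = "3.11"
include = ["main.py", "cerebrium.toml"]  # files uploaded with the deployment (assumed key)
exclude = [".*"]                         # skip hidden files such as .git (assumed key)
```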
Hardware Configuration
Cerebrium provides flexible hardware options to match app requirements. The basic configuration specifies GPU type and memory allocations.
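For illustration, a hardware section requesting one GPU might look like the following. This sketch assumes the [cerebrium.hardware] table name, the compute, gpu_count, cpu, and memory keys, and the AMPERE_A10 identifier. The GPU and Other Resources Guide lists the values Cerebrium actually accepts.

```toml
# Hardware sketch: table, key names, and GPU identifier are assumed
[cerebrium.hardware]
compute = "AMPERE_A10"  # GPU type (hypothetical identifier)
gpu_count = 1           # number of GPUs attached to each container
cpu = 4                 # CPU cores
memory = 16.0           # memory in GB
```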
For detailed hardware specifications and performance characteristics see the GPU and Other Resources Guide.
Dependency Management
Selecting a Python Version
The Python runtime version forms the foundation of every Cerebrium app. We currently support versions 3.10, 3.11 and 3.12. Specify the Python version in the deployment section of the configuration.
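In practice this is a single key. The sketch below assumes the key is named python_version and sits inside a [cerebrium.deployment] table:

```toml
[cerebrium.deployment]
python_version = "3.11"  # supported: "3.10", "3.11", "3.12"
```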
The Python version affects the entire dependency chain. For instance, some packages may not support newer Python versions immediately after release.
Changes to the Python version trigger a full rebuild since they affect both the base environment and all Python package installations.
Adding Python Packages
Python dependencies can be managed directly in TOML or through requirements files. The system caches packages to speed up builds.
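The following sketch shows inline pip dependencies. It assumes a [cerebrium.dependencies.pip] table, named by analogy with the [cerebrium.dependencies.apt] section described below, that maps package names to pip-style version specifiers:

```toml
[cerebrium.dependencies.pip]
torch = "==2.3.1"        # pin an exact version
accelerate = ">=0.30.0"  # require a minimum version
transformers = "latest"  # assumed keyword for the newest release
```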
Alternatively, dependencies can come from an existing requirements file.
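The [cerebrium.dependencies.paths] section is covered under APT packages. Assuming it also accepts a pip key, pointing it at a standard requirements file might look like this:

```toml
[cerebrium.dependencies.paths]
pip = "requirements.txt"  # path relative to the project root (assumed)
```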
To install a package directly from a GitHub repository, use shell commands instead of pip dependencies to ensure proper versioning.
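As an example of this approach, a shell command can install a package straight from its repository and pin it to a release tag. The shell_commands key name and its placement are assumptions here, and the transformers repository and tag are only illustrative:

```toml
[cerebrium.deployment]
shell_commands = [
  # Install from GitHub, pinned to a tag so every rebuild gets the same code
  "pip install git+https://github.com/huggingface/transformers.git@v4.41.2",
]
```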
Packages are cached at the node level. When an app is built, all pip packages are cached with their exact versions, including wheel files and compiled binaries. Subsequent builds therefore only need to install new or updated packages, which significantly reduces build times.
Adding APT Packages
System-level packages provide the foundation for many ML apps, handling everything from image processing libraries to audio codecs. These can be added to the cerebrium.toml file under the [cerebrium.dependencies.apt] section.
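The following sketch shows the apt section. It assumes each entry maps a Debian package name to a version string and that "latest" installs the current repository version:

```toml
[cerebrium.dependencies.apt]
ffmpeg = "latest"  # audio and video codecs
libgl1 = "latest"  # OpenGL runtime often needed by OpenCV
```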
For teams with standardized system dependencies, text files can be used instead by referencing them in the [cerebrium.dependencies.paths] section.
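For example, assuming the paths section takes an apt key and the referenced file lists one package name per line:

```toml
[cerebrium.dependencies.paths]
apt = "pkglist.txt"  # hypothetical file name
```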
Since APT packages modify the system environment, any changes to these dependencies trigger a full rebuild of the container image. This ensures system-level changes are properly integrated but means builds will take longer than when modifying Python packages alone.
Conda Packages
Conda excels at managing complex system-level Python dependencies, particularly for GPU support and scientific computing.
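A Conda section might look like the following sketch. It assumes a [cerebrium.dependencies.conda] table with the same name-to-version layout as the pip and apt sections:

```toml
[cerebrium.dependencies.conda]
cudatoolkit = "11.8"  # CUDA runtime libraries
scipy = "latest"      # compiled scientific stack
```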
Teams using Conda environments can specify their environment file in the configuration instead.
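Assuming the [cerebrium.dependencies.paths] section accepts a conda key, this is a one-line setting. The file name is hypothetical:

```toml
[cerebrium.dependencies.paths]
conda = "conda_pkglist.txt"  # hypothetical file name
```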
Like APT packages, Conda packages often modify system-level components. Changes to Conda dependencies will trigger a full rebuild to ensure all binary dependencies and system libraries are correctly configured. Consider batching Conda dependency updates together to minimize rebuild time.
Build Commands
Cerebrium’s build process includes two specialized command types that execute at different stages during container image creation. These commands help configure the environment and prepare the application for deployment.
Pre-build Commands
Pre-build commands execute at the start of the build process, before dependency installation begins. This early execution timing makes them essential for setting up the build environment.
Pre-build commands typically handle tasks like installing build tools, configuring system settings, or preparing the environment for subsequent build steps.
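The following sketch shows a pre-build step that installs compilers before any dependencies are processed. The pre_build_commands key and its placement under [cerebrium.deployment] are assumptions:

```toml
[cerebrium.deployment]
pre_build_commands = [
  # Runs before the APT, Conda, and pip installs
  "apt-get update && apt-get install -y build-essential",
]
```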
Shell Commands
Shell commands execute after all dependencies are installed and the application code is copied into the container. This later timing ensures access to the complete environment.
Shell commands excel at tasks that require the fully configured environment, such as compiling code that depends on installed libraries or downloading resources needed for the application.
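For example, a shell command can fetch model weights once at build time rather than when each container starts. The shell_commands key is assumed, and download_weights.py is a hypothetical script included with the app code:

```toml
[cerebrium.deployment]
shell_commands = [
  # Runs after dependencies are installed and the code is copied in
  "python download_weights.py",  # hypothetical script
]
```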
Command Execution Impact
Any modification to either pre-build or shell commands triggers a rebuild of the corresponding section in the container image. This happens because these commands form integral parts of the final container environment. The build process and complete execution order are detailed in the Deployment Process section below.
Because each change rebuilds part of the image and lengthens the build, consider batching related changes together when possible.
Custom Docker Base Images
The base image selection shapes how an app runs in Cerebrium. While the default Debian slim image works for most Python apps, other validated base images support specific requirements.
Supported Base Images
Cerebrium accepts the following categories of base images to ensure system compatibility.
Ubuntu-based CUDA Images
All nvidia/cuda images that include Ubuntu are supported. These provide GPU acceleration capabilities.
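The following sketch selects a CUDA image. It assumes the base image is set with a docker_base_image_url key in the deployment section. Pick a tag that matches the CUDA version the app's libraries expect:

```toml
[cerebrium.deployment]
docker_base_image_url = "nvidia/cuda:12.1.1-runtime-ubuntu22.04"  # assumed key
```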
Debian and Ubuntu Base Images
Any Debian or Ubuntu base image works as a foundation.
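Here is a sketch for a plain Debian or Ubuntu image, assuming the base image is set through a docker_base_image_url key:

```toml
[cerebrium.deployment]
docker_base_image_url = "debian:bookworm-slim"  # assumed key
# docker_base_image_url = "ubuntu:22.04"        # Ubuntu alternative
```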
Python Images
Python images based on Debian bullseye or bookworm provide pre-configured Python environments.
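For a Python image, the tag can also name the Debian release. A sketch, assuming a docker_base_image_url key:

```toml
[cerebrium.deployment]
docker_base_image_url = "python:3.11-slim-bookworm"  # assumed key; bookworm-based
```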
Starting with a minimal Debian or Ubuntu base image is recommended, as CUDA images include many pre-installed components that increase container size. While the relationship isn’t strictly linear, larger container sizes generally lead to longer cold starts and build times. Begin with a lean base image and add only essential components as needed.
Custom Runtimes
While Cerebrium’s default runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or websockets.
Basic Configuration
Define a custom runtime by adding the cerebrium.runtime.custom section to the configuration.
Key parameters:
- entrypoint: Command to start the app (string or string list)
- port: Port the app listens on
- healthcheck_endpoint: Path for health checks
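The following sketch shows a custom runtime that serves a FastAPI app with uvicorn. It assumes main.py defines a FastAPI object named app with a /health route. The --app-dir flag points uvicorn at /cortex, where the code is mounted. The pip table name is also an assumption:

```toml
[cerebrium.runtime.custom]
# Start uvicorn, loading main.py from the /cortex code mount
entrypoint = ["uvicorn", "main:app", "--app-dir", "/cortex", "--host", "0.0.0.0", "--port", "8080"]
port = 8080                       # must match --port above
healthcheck_endpoint = "/health"  # route implemented by the app (assumed)

[cerebrium.dependencies.pip]
fastapi = "latest"
uvicorn = "latest"
```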
Check out this example for a detailed implementation of a FastAPI server that uses a custom runtime.
Self-Contained Servers
Custom runtimes also support apps with built-in servers. For example, deploying a vLLM server requires no Python code.
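Such a deployment might look like the following sketch, which assumes vLLM's vllm serve command and its /health route. The model name is only an example, and the app would also need a GPU in its hardware section:

```toml
[cerebrium.runtime.custom]
# Launch vLLM's OpenAI-compatible server directly; no app code required
entrypoint = ["vllm", "serve", "Qwen/Qwen2.5-1.5B-Instruct", "--host", "0.0.0.0", "--port", "8000"]
port = 8000
healthcheck_endpoint = "/health"

[cerebrium.dependencies.pip]
vllm = "latest"
```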
Important Notes
- Code is mounted in /cortex; adjust paths accordingly.
- The port in your entrypoint must match the port parameter.
- Install any required server packages (uvicorn, gunicorn, etc.) via pip dependencies.
- All endpoints will be available at https://api.cortex.cerebrium.ai/v4/{project-id}/{app-name}/your/endpoint
Deploy as normal with cerebrium deploy -y; the system automatically detects and handles the custom runtime configuration.
Deployment Process
The build process follows a carefully orchestrated sequence that transforms source code into a production-ready container image. Let’s walk through each step:
Stage 1: App Upload
The process begins when code is uploaded to Cerebrium. This includes all source files, configuration, and any additional assets needed for the app.
Stage 2: Image Creation
The system then creates a container image through the following steps, each building upon the previous:
- Pre-build Commands Execute: First, any pre-build commands run. These set up the build environment and compile necessary assets before the main installation steps begin.
- APT Dependencies Install: System-level packages install next, establishing the foundation for all other dependencies.
- Conda Dependencies Install: After APT packages are in place, Conda packages install.
- Pip Dependencies Install: Python packages install after the APT and Conda layers, ensuring they have access to all necessary system libraries and binaries.
- Python Code Copy: The app’s source code is copied into the container and placed in the correct directory structure.
- Shell Commands Execute: Finally, any build-time shell commands run to complete the image setup.
Stage 3: Production Image
The result is a production-ready container image that contains everything needed to run the app. This image serves as a blueprint for creating individual containers when the app receives requests.