Overview

CUDA (Compute Unified Device Architecture) enables apps to use graphics cards (GPUs) to speed up calculations. Unlike standard processors (CPUs), which execute only a handful of tasks at a time, graphics cards can run thousands of operations in parallel.

How CUDA Works

CUDA acts as the bridge between an app and the graphics card: it splits a large workload into many small pieces that the GPU processes simultaneously. This makes operations like image processing and large matrix calculations much faster than running them on a standard processor alone.

Using CUDA with App Dependencies

Many Python packages include built-in CUDA support. PyTorch, for example, bundles the CUDA runtime libraries it needs, so no extra setup is required:
[cerebrium.dependencies.pip]
torch = "latest" # PyTorch with bundled CUDA runtime support
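To confirm that a deployed app can actually reach the GPU, a quick runtime check is useful. The sketch below uses PyTorch's standard `torch.cuda.is_available()` API; the `cuda_available` helper name is ours, and the function falls back to `False` when torch is not installed so it is safe to run anywhere:

```python
def cuda_available() -> bool:
    """Report whether PyTorch can see a CUDA-capable GPU.

    Returns False when torch is not installed, so the check is
    safe to run in any environment, GPU or not.
    """
    try:
        import torch  # ships with bundled CUDA runtime libraries
    except ImportError:
        return False
    return torch.cuda.is_available()

print("CUDA available:", cuda_available())
```

Calling this at startup makes a misconfigured deployment fail loudly instead of silently falling back to the CPU.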

Special Requirements

Some apps need direct access to CUDA system libraries and tools. The CUDA base image provides this complete CUDA toolkit environment. This is often necessary when apps:
  • Compile custom CUDA code
  • Access low-level CUDA features
  • Need specific CUDA driver versions
  • Require CUDA development tools
Set the base image in the cerebrium.toml file:
[cerebrium.deployment]
docker_base_image_url = "nvidia/cuda:12.1.1-runtime-ubuntu22.04"
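When an app needs direct access to CUDA system libraries, it helps to verify that the driver library is actually visible inside the container. This sketch probes the standard NVIDIA driver library names with `ctypes`; `find_cuda_driver` is an illustrative helper name, not a Cerebrium API:

```python
import ctypes

def find_cuda_driver() -> "str | None":
    """Try to load the NVIDIA driver library that CUDA apps link against.

    Returns the first name that loads, or None when no driver is
    present (e.g. on a machine without an NVIDIA GPU).
    """
    for name in ("libcuda.so.1", "libcuda.so", "nvcuda.dll"):
        try:
            ctypes.CDLL(name)
            return name
        except OSError:
            continue
    return None

print("CUDA driver library:", find_cuda_driver())
```

If this returns None inside a CUDA base image, the container is likely running without GPU access attached.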

Cold-Start Optimization

Image size and complexity directly impact cold-start performance — the time needed to initialize an app from an inactive state. Cerebrium uses a content-addressable file system that selectively pulls only required files, but larger images still affect startup times.

Image Size Considerations

Base image choices affect cold-start times in several ways:
  • Development images (like nvidia/cuda:*-devel) include additional tools, increasing size
  • Runtime images provide minimal dependencies for faster initialization
  • Each additional layer or installed package increases the final image size
The final image size appears in the dashboard after build completion, helping track size optimization efforts.

Balancing Tradeoffs

Cold-start optimization requires balancing competing needs:
# Minimal runtime image - faster cold-starts
docker_base_image_url = "debian:bookworm-slim"

# Full development image - slower cold-starts, more tools
docker_base_image_url = "nvidia/cuda:12.0.1-devel-ubuntu22.04"
Keeping instances warm avoids cold-starts entirely, at the cost of higher resource usage.
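As one illustration of keeping an instance warm, a minimum-replica setting prevents the app from scaling to zero. The section and key names below are assumptions based on common cerebrium.toml conventions and should be checked against the scaling documentation:

```toml
# Hypothetical sketch: keep one replica warm to avoid cold-starts.
# Section and key names are assumptions; verify against the scaling docs.
[cerebrium.scaling]
min_replicas = 1  # 0 would allow scale-to-zero: cheaper, but cold-starts
```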