Overview

CUDA (Compute Unified Device Architecture) enables apps to use graphics cards (GPUs) to speed up calculations. Unlike standard processors (CPUs), which execute only a handful of tasks at a time, graphics cards can run thousands of operations in parallel.

How CUDA Works

CUDA acts as the bridge between an app and the graphics card: it splits a large workload into many small pieces that the GPU processes simultaneously. This makes operations like image processing and large matrix calculations much faster than running them on a standard processor alone.

Using CUDA with App Dependencies

Many Python packages include built-in CUDA support. PyTorch, for example, bundles the CUDA runtime libraries it needs, so no extra setup is required:
[cerebrium.dependencies.pip]
torch = "latest" # PyTorch with bundled CUDA runtime support
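To confirm that a deployed app can actually reach the GPU, a quick runtime check is useful. The sketch below uses PyTorch's standard `torch.cuda.is_available()` API; the `cuda_available` helper name is ours, and the function falls back to `False` when torch is not installed so it is safe to run anywhere:

```python
def cuda_available() -> bool:
    """Report whether PyTorch can see a CUDA-capable GPU.

    Returns False when torch is not installed, so the check is
    safe to run in any environment, GPU or not.
    """
    try:
        import torch  # ships with bundled CUDA runtime libraries
    except ImportError:
        return False
    return torch.cuda.is_available()

print("CUDA available:", cuda_available())
```

Calling this at startup makes a misconfigured deployment fail loudly instead of silently falling back to the CPU.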

Special Requirements

Some apps need direct access to CUDA system libraries and tools. The CUDA base image provides this complete CUDA toolkit environment. This is often necessary when apps:
  • Compile custom CUDA code
  • Access low-level CUDA features
  • Need specific CUDA driver versions
  • Require CUDA development tools
Set the base image in the cerebrium.toml file:
[cerebrium.deployment]
docker_base_image_url = "nvidia/cuda:12.1.1-runtime-ubuntu22.04"
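When an app needs direct access to CUDA system libraries, it helps to verify that the driver library is actually visible inside the container. This sketch probes the standard NVIDIA driver library names with `ctypes`; `find_cuda_driver` is an illustrative helper name, not a Cerebrium API:

```python
import ctypes

def find_cuda_driver() -> "str | None":
    """Try to load the NVIDIA driver library that CUDA apps link against.

    Returns the first name that loads, or None when no driver is
    present (e.g. on a machine without an NVIDIA GPU).
    """
    for name in ("libcuda.so.1", "libcuda.so", "nvcuda.dll"):
        try:
            ctypes.CDLL(name)
            return name
        except OSError:
            continue
    return None

print("CUDA driver library:", find_cuda_driver())
```

If this returns None inside a CUDA base image, the container is likely running without GPU access attached.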

Cold-Start Optimization

Image size and complexity directly impact cold-start performance — the time needed to initialize an app from an inactive state. Cerebrium uses a content-addressable file system that selectively pulls only required files, but larger images still affect startup times.

Image Size Considerations

Base image choices affect cold-start times in several ways:
  • Development images (like nvidia/cuda:*-devel) include additional tools, increasing size
  • Runtime images provide minimal dependencies for faster initialization
  • Each additional layer or installed package increases the final image size
The final image size appears in the dashboard after build completion, helping track size optimization efforts.

Balancing Tradeoffs

Cold-start optimization requires balancing competing needs:
# Minimal runtime image - faster cold-starts
docker_base_image_url = "debian:bookworm-slim"

# Full development image - slower cold-starts, more tools
docker_base_image_url = "nvidia/cuda:12.0.1-devel-ubuntu22.04"
Keeping instances warm avoids cold-starts entirely, at the cost of higher resource usage.
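As one illustration of keeping an instance warm, a minimum-replica setting prevents the app from scaling to zero. The section and key names below are assumptions based on common cerebrium.toml conventions and should be checked against the scaling documentation:

```toml
# Hypothetical sketch: keep one replica warm to avoid cold-starts.
# Section and key names are assumptions; verify against the scaling docs.
[cerebrium.scaling]
min_replicas = 1  # 0 would allow scale-to-zero: cheaper, but cold-starts
```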