Overview

CUDA (Compute Unified Device Architecture) enables apps to use graphics cards (GPUs) to speed up calculations. While standard processors (CPUs) execute a handful of tasks at a time across a few cores, GPUs run thousands of lightweight threads in parallel, making them well suited to workloads that apply the same operation over large amounts of data.

How CUDA Works

CUDA connects apps to graphics cards by splitting large workloads into many small pieces that are processed at the same time. This makes operations like image processing and heavy numerical calculations much faster than running them on a CPU alone.
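As a sketch of this model, the snippet below (assuming PyTorch is installed with CUDA support; timed_matmul is an illustrative helper, not part of any library) runs the same large matrix multiplication on the CPU and on the GPU, where the work is spread across thousands of threads:

import time
import torch

def timed_matmul(device: str, size: int = 4096) -> float:
    """Multiply two size x size matrices on `device` and return elapsed seconds."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # finish setup before starting the clock
    start = time.perf_counter()
    _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # GPU kernels run asynchronously; wait for them
    return time.perf_counter() - start

print(f"CPU: {timed_matmul('cpu'):.3f}s")
if torch.cuda.is_available():
    print(f"GPU: {timed_matmul('cuda'):.3f}s")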

Using CUDA with App Dependencies

Many Python packages come with built-in CUDA support. For example, the popular machine learning package PyTorch bundles the CUDA libraries it needs in its installation:

[cerebrium.dependencies.pip]
torch = "latest" # PyTorch with graphics card support
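Once deployed, a quick check using PyTorch's public API confirms the bundled CUDA libraries can see the GPU:

import torch

print(torch.cuda.is_available())          # True when a GPU is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # model of the attached GPU
    print(torch.version.cuda)             # CUDA version the wheel was built with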

Special Requirements

Some apps need direct access to the CUDA system libraries and tools themselves. A CUDA base image provides this complete toolkit environment, which is often necessary when apps (a quick way to verify appears after this list):

  • Compile custom CUDA code
  • Access low-level CUDA features
  • Need specific CUDA driver versions
  • Require CUDA development tools
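A simple way to check whether the toolkit is actually present in the image (nvcc ships only with *-devel base images, not runtime ones):

import shutil
import subprocess

# nvcc is the CUDA compiler; runtime images omit it and will fail this check.
if shutil.which("nvcc"):
    print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
else:
    print("nvcc not found: use a -devel base image to compile CUDA code")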

The base image can be set in the cerebrium.toml file:

[cerebrium.deployment]
docker_base_image_url = "nvidia/cuda:12.1.1-runtime-ubuntu22.04"
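Note that the runtime tag above ships the CUDA shared libraries without the compiler toolchain; apps that invoke nvcc need the matching -devel tag instead. As a small sanity check (a sketch using only the standard library), the dynamic linker can be asked whether the CUDA runtime is visible inside the container:

import ctypes.util

# In nvidia/cuda images libcudart is registered with ldconfig, so a hit
# here suggests the base image is supplying the CUDA runtime libraries.
lib = ctypes.util.find_library("cudart")
print(f"CUDA runtime found: {lib}" if lib else "CUDA runtime not found")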

Cold Start Optimization

Image size and complexity significantly impact cold start performance: the time needed to initialize an app from a “cold” state. While Cerebrium uses a content-addressable file system that selectively pulls only required files, larger images still affect startup times.

Image Size Considerations

Base image choices affect cold start times in several ways:

  • Development images (like nvidia/cuda:*-devel) include compilers, headers, and debugging tools, which substantially increase size
  • Runtime images provide minimal dependencies for faster initialization
  • Each additional layer or installed package increases the final image size

The final image size appears in the dashboard after build completion, helping track size optimization efforts.

Balancing Tradeoffs

Cold start optimization often involves balancing competing needs:

# Minimal runtime image - faster cold starts
docker_base_image_url = "debian:bookworm-slim"

# Full development image - slower cold starts, more tools
docker_base_image_url = "nvidia/cuda:12.0.1-devel-ubuntu22.04"

Some apps might benefit from keeping instances “warm” to avoid cold starts entirely, though this increases idle resource usage and cost.

For a practical implementation, see the Stable Diffusion Example.