GPU selection in your `cerebrium.toml` file is handled through the `[cerebrium.hardware]` section, where you can specify both the type (using the `compute` parameter) and quantity (`gpu_count`) of GPUs for your app. We address additional deployment configurations and GPU scaling considerations in more detail in the sections below.
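For example, a minimal hardware block might look like the following sketch (the model identifier and count are illustrative; choose an identifier from the table below):

```toml
[cerebrium.hardware]
compute = "AMPERE_A10"  # GPU model identifier from the table below
gpu_count = 1           # number of GPUs to attach to each instance
```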
| GPU Model | Identifier | VRAM (GB) | Max GPUs | Plan Required | Provider |
|---|---|---|---|---|---|
| NVIDIA H100 | HOPPER_H100 | 80 | 8 | Enterprise | AWS |
| NVIDIA A100 | AMPERE_A100_80GB | 80 | 8 | Enterprise | AWS |
| NVIDIA A100 | AMPERE_A100_40GB | 40 | 8 | Enterprise | AWS |
| NVIDIA L40s | ADA_L40 | 48 | 8 | Hobby+ | AWS |
| NVIDIA L4 | ADA_L4 | 24 | 8 | Hobby+ | AWS |
| NVIDIA A10 | AMPERE_A10 | 24 | 8 | Hobby+ | AWS |
| NVIDIA T4 | TURING_T4 | 16 | 8 | Hobby+ | AWS |
| AWS Inferentia 2 | INF2 | 32 | 8 | Hobby+ | AWS |
| AWS Trainium | TRN1 | 32 | 8 | Hobby+ | AWS |
The identifier in the table above is the value you assign to the `compute` parameter in your `cerebrium.toml` file. It consists of the GPU model generation and model name to avoid ambiguity. You can also set the GPU type and count with the `--compute` and `--gpu-count` flags during application initialization. Here's an example `cerebrium.toml` file:
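(The snippet below is a minimal sketch; the `[cerebrium.deployment]` block and its `name` value are assumed placeholders, and a real project's `cerebrium.toml` typically contains additional deployment and scaling settings.)

```toml
[cerebrium.deployment]
name = "my-gpu-app"       # hypothetical app name

[cerebrium.hardware]
compute = "HOPPER_H100"   # GPU model identifier from the table above
gpu_count = 4             # request four H100s per instance
```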