Overview
CPU and memory resources are allocated per container and billed based on actual usage. Configure each app with specific CPU and memory requirements to optimize performance and cost.
Resource Configuration
CPU Configuration
CPU resources are specified as vCPU units in the cerebrium.toml file. The value is a float, so fractional allocations (for example, 0.5) are allowed.
Memory Configuration
Memory is specified in gigabytes as a floating-point number.
Memory and CPU are billed based on usage, which reduces costs for end users and doesn’t require overprovisioning an entire instance.
Resource Limits
Resource limits depend on the selected hardware configuration:

| Hardware Type | Max CPU Cores | Max Memory (GB) |
|---|---|---|
| CPU Only | 48 | 96 |
| ADA_L40 | 16 | 128 |
| AMPERE_A100 | 12 | 140 |
| AMPERE_A10 | 48 | 192 |
| ADA_L4 | 48 | 192 |
| TURING_T4 | 48 | 192 |
| HOPPER_H100 | 24 | 256 |
| HOPPER_H200 | 24 | 256 |
| BLACKWELL_B200 | 24 | 256 |
| BLACKWELL_B300 | 24 | 512 |
| TRN1 | 128 | 512 |
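
A request that exceeds these limits is easier to catch before deployment than after. The following is a minimal sketch of such a pre-deployment check. The `RESOURCE_LIMITS` dictionary and the `check_resources` helper are hypothetical names introduced for illustration and are not part of any platform SDK. The limits are copied from the table above, and treating non-positive values as invalid is an assumption.

```python
from typing import Dict, List, Tuple

# (max CPU cores, max memory in GB) per hardware type, copied from the table above.
RESOURCE_LIMITS: Dict[str, Tuple[int, int]] = {
    "CPU Only": (48, 96),
    "ADA_L40": (16, 128),
    "AMPERE_A100": (12, 140),
    "AMPERE_A10": (48, 192),
    "ADA_L4": (48, 192),
    "TURING_T4": (48, 192),
    "HOPPER_H100": (24, 256),
    "HOPPER_H200": (24, 256),
    "BLACKWELL_B200": (24, 256),
    "BLACKWELL_B300": (24, 512),
    "TRN1": (128, 512),
}


def check_resources(hardware: str, cpu: float, memory_gb: float) -> List[str]:
    """Return a list of problems; an empty list means the request fits."""
    if hardware not in RESOURCE_LIMITS:
        return [f"unknown hardware type: {hardware}"]

    max_cpu, max_memory = RESOURCE_LIMITS[hardware]
    problems: List[str] = []

    # Assumption: zero or negative allocations are never valid.
    if cpu <= 0:
        problems.append("cpu must be positive")
    elif cpu > max_cpu:
        problems.append(f"cpu {cpu} exceeds the {max_cpu}-core limit for {hardware}")

    if memory_gb <= 0:
        problems.append("memory must be positive")
    elif memory_gb > max_memory:
        problems.append(
            f"memory {memory_gb} GB exceeds the {max_memory} GB limit for {hardware}"
        )

    return problems
```

For example, `check_resources("AMPERE_A10", 4, 16.0)` returns an empty list, while requesting 16 cores on `AMPERE_A100` reports that the 12-core limit is exceeded.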
Memory Optimization
The Transformers library provides memory optimization through the low_cpu_mem_usage flag, which reduces the memory footprint of model loading at the cost of longer initialization times. For large datasets, implement lazy loading so that only the records currently being processed are held in memory. Monitor memory usage patterns through platform metrics to identify where allocations can be reduced.
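
The two techniques named above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: `iter_batches` and `load_model_low_mem` are hypothetical helper names, the model class (`AutoModelForCausalLM`) is only one of several that accept the flag, and `low_cpu_mem_usage=True` additionally requires the `accelerate` package to be installed.

```python
from typing import Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Lazily group lines into lists of at most `batch_size` items.

    Only one batch is held in memory at a time. Because this is a
    generator, the batch_size check runs on the first iteration.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for line in lines:  # pulls one item at a time from the source
        batch.append(line.rstrip("\n"))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def load_model_low_mem(model_id: str):
    """Load a causal language model with low_cpu_mem_usage enabled."""
    # Imported inside the function so this module can be imported
    # even when transformers is not installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True)
```

Passing an open file handle to `iter_batches` reads the file line by line instead of loading it whole, for example `for batch in iter_batches(open("data.txt", encoding="utf-8"), 1000): ...`, where `data.txt` is a placeholder path.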