After you’ve created your Cortex project, you may find that you need more control over your deployment than the default cerebrium deploy command provides. For example, you may want to specify the number of GPUs to use, the amount of memory to use or even the version of Python for your environment. These settings and more are all configurable using config files.

Your config file is a TOML file that you can use to specify the parameters of your cortex deployment. This file is used to specify the deployment parameters, build parameters, hardware parameters, scaling parameters and dependencies for your deployment.

Creating a config file

The fastest and simplest way to create a config file is to run the cerebrium init command and specify the directory in which you would like to create the config file. This command will create a cerebrium.toml file in your project root, which you can then edit to suit your needs. You can specify any field you wish to be prepopulated with a specific value.

cerebrium init my-project-dir --name=<your-app-name> --gpu=ADA_L4

Deployment Parameters

Deployment parameters govern the persistent environment in which your model is deployed. These parameters are specified under the cerebrium.deployment section of your config file.

The available deployment parameters are:

parameterdescriptiontypedefault
nameThe name of your deploymentstringmy-model
python_versionThe Python version available for your runtimefloat
includeLocal files to include in the deployment.string’[./*, main.py]’
excludeLocal Files to exclude from the deployment.string’[./.*]’
docker_base_image_urlThe docker base image you would like to runstring‘debian:bookworm-slim’
shell_commandsA list of commands to run an application entrypoint scriptlist[string][]

Hardware Parameters

The hardware parameters section is where you can define the specifications of the machine you would like to use for your deployment. This allows you to tailor your deployment to your specific needs, optimizing for cost or performance as you see fit.
These parameters are specified under the cerebrium.hardware section of your config file.

The available hardware parameters in your config are:

parameterdescriptiontypedefault
cpuThe number of CPU cores to use.int2
memoryThe amount of Memory to use in GB.float14
computeThe GPU you would like to use.stringCPU
gpu_countThe number of GPUs to specify.int1
providerThe provider you would like your deployment to be on. v4 only supports awsstringaws
regionThe region you would like your deployment to be on. v4 only supports us-east-1stringus-east-1

Available Hardware

The following is the hardware available on Cerebrium

NameProviderAPI Compatibility
CPU[aws][v4]
AMPERE_A10[aws][v4]
ADA_L4[aws][v4]
AMPERE_A100[aws][v4]
AMPERE_A100_40GB[aws][v4]
HOPPER_H100[aws][v4]
INF2[aws][v4]
TRN1[aws][v4]

Scaling Parameters

This section lets you configure how you would like your deployment to scale. You can use these parameters to control the minimum and maximum number of replicas to run, as well as the cooldown period between requests. For example, you could increase your cooldown time or even set a minimum number of replicas to run, increasing availability and avoiding cold starts.

These parameters are specified under the cerebrium.scaling section of your config file.

parameterdescriptiontypedefault
min_replicasThe minimum number of replicas to run at all times.int0
max_replicasThe maximum number of replicas to scale to.intplan limit
cooldownThe number of seconds to keep your model warm after each request. It resets after every request ends.int60

Adding Dependencies

The dependencies section of your config file is where you can specify any dependencies you would like to install in your deployment. We support pip, conda and apt dependencies and you can specify each of these in their relevant subsection of the dependencies section.

For each dependency type, you can specify the name of the package you would like to install and the version constraints. If you do not want to specify any version constraints, you can use the latest keyword to install the latest version of the package.

If you have an existing requirements.txt, pkglist.txt or conda_pkglist.txt, we’ll prompt you to automatically integrate these into your config file when you run cerebrium deploy.

pip

Your pip dependencies are specified under the cerebrium.dependencies.pip section of your config file. An example of a pip dependency is shown below:

[cerebrium.dependencies.pip]
torch = ">=2.0.0"
numpy = "latest"

conda

Similarly, your conda dependencies are specified under the cerebrium.dependencies.conda section of your config file. An example of a conda dependency is shown below:

[cerebrium.dependencies.conda]
cuda = ">=11.7"
cudatoolkit = "11.7"

apt

Finally, your apt dependencies are specified under the cerebrium.dependencies.apt section of your config file.
These are any package that you would install using apt-get install on a Linux machine. An example of an apt dependency is shown below:

[cerebrium.dependencies.apt]
"libgl1-mesa-glx" = "latest"
"libglib2.0-0" = "latest"

Integrate existing requirements files

If you have an existing requirements.txt, pkglist.txt or conda_pkglist.txt, files in your project, we’ll prompt you to automatically integrate these into your config file when you run cerebrium deploy.

This way, you can leverage external tools to manage your dependencies and have them automatically integrated into your deployment.
For example, you can use the following command to generate a requirements.txt file from your current environment:

Config File Example

That was a lot of information!
Let’s see an example of a config file in action.

Below is an example of a config file that takes advantage of all the features we’ve discussed so far.

[cerebrium.deployment]
name = "my-model"
python_version = "3.10"
include = "[./*, main.py]"
exclude = "[./.*, ./__*]"
docker_base_image_url = "debian:bookworm-slim"
shell_commands = []

[cerebrium.hardware]
compute = "AMPERE_A10"
cpu = 2
memory = 16.0
gpu_count = 1
provider = "aws"
region = "us-east-1"

[cerebrium.scaling]
min_replicas = 0
max_replicas = 2
cooldown = 60

[cerebrium.dependencies.pip]
torch = ">=2.0.0"

[cerebrium.dependencies.conda]
cuda = ">=11.7"
cudatoolkit = "11.7"

[cerebrium.dependencies.apt]
"libgl1-mesa-glx" = "latest"
"libglib2.0-0" = "latest"