After you’ve created your Cortex project, you may find that you need more control over your deployment than the default cerebrium deploy command provides. For example, you may want to specify the number of GPUs, the amount of memory, or even the Python version for your environment. These settings and more are all configurable through a config file.

Your config file is a TOML file that specifies the parameters of your Cortex deployment: the deployment, build, hardware and scaling parameters, as well as the dependencies to install.

Creating a config file

The fastest and simplest way to create a config file is to run the cerebrium init command, specifying the directory in which you would like the config file created. This command creates a cerebrium.toml file in that directory, which you can then edit to suit your needs.

cerebrium init my-project-dir

Deployment Parameters

Deployment parameters govern the persistent environment in which your model is deployed. These parameters are specified under the cerebrium.deployment section of your config file.

The available deployment parameters are:

| parameter | description | type | default |
| --- | --- | --- | --- |
| name | The name of your deployment. | string | my-model |
| python_version | The Python version available for your runtime. | string | 3.10 |
| include | Local files to include in the deployment. | string | '[./*, main.py]' |
| exclude | Local files to exclude from the deployment. | string | '[./.*, ./__*]' |
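
For example, a deployment section that sticks to these defaults looks like this:

[cerebrium.deployment]
name = "my-model"
python_version = "3.10"
include = "[./*, main.py]"
exclude = "[./.*, ./__*]"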

Build Parameters

Build parameters are used during the build phase of your deployment. They give you control over how your deployment is built and how it is tested. These parameters are specified under the cerebrium.build section of your config file.

| parameter | description | type | default |
| --- | --- | --- | --- |
| predict_data | The data used to test your predict function on build. If this fails, your build fails. | string | '{"prompt": "Here is some example predict data for your cerebrium.toml which will be used to test your predict function on build."}' |
| disable_predict | Disable running your predict function at the end of a build. | boolean | false |
| force_rebuild | Whether to force a rebuild of your deployment, clearing all caches and starting from scratch. | boolean | false |
| disable_animation | Whether to disable the animation in the build logs. | boolean | false |
| log_level | The log level for the build step of your deployment. | string | INFO |
| disable_confirmation | Whether to disable all CLI confirmations before deploying. Useful for CI/CD. | boolean | false |
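
For example, a build section suited to a CI/CD pipeline might look like the following. The predict_data payload here is only illustrative; it should match whatever input your predict function expects:

[cerebrium.build]
predict_data = "{\"prompt\": \"A short test prompt\"}"
disable_animation = true
disable_confirmation = true
log_level = "INFO"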

Hardware Parameters

The hardware parameters section is where you can define the specifications of the machine you would like to use for your deployment. This allows you to tailor your deployment to your specific needs, optimizing for cost or performance as you see fit.
These parameters are specified under the cerebrium.hardware section of your config file.

The available hardware parameters in your config are:

| parameter | description | type | default |
| --- | --- | --- | --- |
| gpu | The GPU you would like to use. | string | AMPERE_A5000 |
| cpu | The number of CPU cores to use. | int | 2 |
| memory | The amount of memory to use, in GB. | float | 14.5 |
| gpu_count | The number of GPUs to use. | int | 1 |
| provider | The provider you would like your deployment to run on. This is selected automatically from the GPU type if not specified. | string | aws |
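
For example, a hardware section requesting a single A5000 with two CPU cores and the default memory:

[cerebrium.hardware]
gpu = "AMPERE_A5000"
cpu = 2
memory = 14.5
gpu_count = 1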

Available Hardware

The following hardware is available on Cerebrium:

| name | provider |
| --- | --- |
| CPU | coreweave, aws |
| TURING_4000 | coreweave |
| TURING_5000 | coreweave |
| AMPERE_A4000 | coreweave |
| AMPERE_A5000 | coreweave |
| AMPERE_A6000 | coreweave |
| AMPERE_A10 | aws |
| ADA_L4 | aws |
| AMPERE_A100 | coreweave, aws |
| AMPERE_A100_40GB | coreweave, aws |
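
Since the provider can be inferred from the GPU type, you can omit it for GPUs that are only available on one provider. A minimal sketch, using TURING_4000 from the table above:

[cerebrium.hardware]
# TURING_4000 is only available on coreweave, so the provider
# can be omitted and will be selected automatically.
gpu = "TURING_4000"
cpu = 2
memory = 14.5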

Scaling Parameters

This section lets you configure how you would like your deployment to scale. You can use these parameters to control the minimum and maximum number of replicas to run, as well as the cooldown period between requests. For example, you could increase your cooldown time or even set a minimum number of replicas to run, increasing availability and avoiding cold starts.

These parameters are specified under the cerebrium.scaling section of your config file.

| parameter | description | type | default |
| --- | --- | --- | --- |
| min_replicas | The minimum number of replicas to run at all times. | int | 0 |
| max_replicas | The maximum number of replicas to scale to. | int | plan limit |
| cooldown | The number of seconds to keep your model warm after each request. The timer resets every time a request ends. | int | 60 |
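
For example, the following scaling section keeps one replica warm to avoid cold starts and extends the cooldown. The values are illustrative, so tune them against your own traffic:

[cerebrium.scaling]
min_replicas = 1   # always keep one replica running to avoid cold starts
max_replicas = 5   # illustrative; the default is your plan limit
cooldown = 120     # keep the model warm for two minutes after each request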

Adding Dependencies

The dependencies section of your config file is where you specify any dependencies you would like to install in your deployment. We support pip, conda and apt dependencies, and you can specify each of these in its own subsection of the dependencies section.

For each dependency type, you can specify the name of the package you would like to install and the version constraints. If you do not want to specify any version constraints, you can use the latest keyword to install the latest version of the package.


pip

Your pip dependencies are specified under the cerebrium.dependencies.pip section of your config file. An example of a pip dependency is shown below:

[cerebrium.dependencies.pip]
torch = ">=2.0.0"
numpy = "latest"

conda

Similarly, your conda dependencies are specified under the cerebrium.dependencies.conda section of your config file. An example of a conda dependency is shown below:

[cerebrium.dependencies.conda]
cuda = ">=11.7"
cudatoolkit = "11.7"

apt

Finally, your apt dependencies are specified under the cerebrium.dependencies.apt section of your config file.
These are any packages that you would install with apt-get install on a Linux machine. An example of an apt dependency is shown below:

[cerebrium.dependencies.apt]
"libgl1-mesa-glx" = "latest"
"libglib2.0-0" = "latest"

Integrate existing requirements files

If you have existing requirements.txt, pkglist.txt or conda_pkglist.txt files in your project, we’ll prompt you to integrate them into your config file automatically when you run cerebrium deploy.

This way, you can leverage external tools to manage your dependencies and have them automatically integrated into your deployment.
For example, you can use the following command to generate a requirements.txt file from your current environment:
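
pip freeze > requirements.txt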

Config File Example

That was a lot of information!
Let’s see an example of a config file in action.

Below is an example of a config file that brings together the features we’ve discussed so far.


[cerebrium.build]
predict_data = "{\"prompt\": \"Here is some example predict data for your cerebrium.toml which will be used to test your predict function on build.\"}"
force_rebuild = false
disable_animation = false
log_level = "INFO"
disable_confirmation = false

[cerebrium.deployment]
name = "my-model"
python_version = "3.10"
include = "[./*, main.py]"
exclude = "[./.*, ./__*]"

[cerebrium.hardware]
gpu = "AMPERE_A5000"
cpu = 2
memory = 16.0
gpu_count = 1
provider = "aws"

[cerebrium.scaling]
min_replicas = 0
cooldown = 60

[cerebrium.dependencies.pip]
torch = ">=2.0.0"

[cerebrium.dependencies.conda]
cuda = ">=11.7"
cudatoolkit = "11.7"

[cerebrium.dependencies.apt]
"libgl1-mesa-glx" = "latest"
"libglib2.0-0" = "latest"