Custom Python Code

Please note this feature is currently in Alpha and therefore is subject to change. To start we are only making it available to certain users so please contact support to get access. Current limitations:

  • Only available for Nvidia T4 and A10 instances
  • Memory limits of 16GB


Many of our more hardcore users wanted finer-grain control over their code and package dependencies which is why we opened up our underlying architecture to allow engineers and data scientists to deploy any custom Python code.

We have been focusing hard on the developer experience that if your code runs locally, you can deploy it to serverless CPUs/GPUs using one line. Initial deploys take about a minute (installing packages etc.) however, thereafter every change is < 10 seconds and of course inference requests have a cold start <1 second! You can get started with our simple tutorial below:

Project set up

Before building you need to set up a Cerebrium account. This is as simple as starting a new Project in Cerebrium and copying the API key. This will be used to authenticate all calls for this project.

Create a project

  1. Go to dashboard.cerebrium.ai
  2. Sign up or Login
  3. Navigate to the API Keys page
  4. You should see an API key with the source “Cerebrium”. Click the eye icon to display it. It will be in the format: c_api_key-xxx



Currently, our implementation has three components:

  • main.py - This is where your python code lives
  • requirements.txt - This is where you define your Python packages. Deployment will be quicker if you specify specific versions
  • pkglist.txt - This is where you can define linux packages you would like to install. We run the apt-install command for items in here

However, you can contain images and other Python files in the same directory, and they will get packaged and uploaded when you deploy.

Every main.py you deploy needs the following mandatory layout:

from pydantic import BaseModel

class Item(BaseModel):
    parameter: value

def predict(item, run_id, logger):
    item = Item(**item)

    ##Do something with parameters from item

    return {"key": "value}

The Item class is where you define the parameters your model receives as well as their type. Item needs to inherit from BaseModel. You need to define a function with the name predict which receives 3 params, item, run_id, logger.

Both the requirements.txt and pkglist.txt follow standard layouts in that each package should be on a new line.

Develop model

Now let us see this in action! Navigate to where your model code is stored - ie: your main.py.

Install the cerebrium-deploy framework script - this is not part of the official Cerebrium package as this functionality is currently in Alpha. You need to contact our team in order to get access to this script.

Copy and paste our code below. This uses a model from HuggingFace that allows us to classify age based on an image sent in from the user. If no image is used we use a default.

from io import BytesIO

import requests
from PIL import Image
from pydantic import BaseModel
from transformers import ViTFeatureExtractor, ViTForImageClassification

class Item(BaseModel):
    image: str

# Init model, transforms
model = ViTForImageClassification.from_pretrained('nateraw/vit-age-classifier')
transforms = ViTFeatureExtractor.from_pretrained('nateraw/vit-age-classifier')

def download_image(image):
    if image:
        r = requests.get(image)
        r = requests.get(

    return Image.open(BytesIO(r.content))

def predict(item, run_id, logger):
    item = Item(**item)
    # Get example image from official fairface repo + read it in as an image

    image = download_image(item.image)

    # Transform our image and pass it through the model
    inputs = transforms(image, return_tensors='pt')
    output = model(**inputs)

    # Predicted Class probabilities
    proba = output.logits.softmax(1)

    # Predicted Classes
    pred = proba.argmax(1).item()

    labels = {0:"0-2", 1: "3-9" , 2: "10-19", 3: "20-29", 4: "30-39", 5: "40-49", 6: "50-59", 7:"60-69",8:"more than 70"}
    predicted_class = labels[pred]

    return {"prediction": f"This person is in the {predicted_class} age category"}

You will see that we defined our model and tokenizer outside our predict function. The reason for this is that it will download and cache as soon as your code is deployed. If you define it within the predict function then it will only load on the first predict call, thereafter it will be cached.

We don’t require any linux packages, so we can keep our pkglist.txt empty but let’s make sure we install the required Python packages in our requirements.txt


Great! Now let us deploy our model

cerebrium-deploy -n <MODEL_NAME> -h <HARDWARE> -k <API_KEY>

For the parameters above:

  • <MODEL_NAME>*: Give your model a name for you to recognize on your Cerebrium dashboard
  • <HARDWARE>*: This is either cpu or gpu
  • <API_KEY>: This is the API key from your profile

Deployed Model

Your model is now deployed! Congrats! Usually your first deployment will be the longest because it will install the linux and Python packages you defined.

Navigate to your Cerebrium dashboard and on the Models page you will see your model.

You can run inference against your REST endpoint and pass it the required data

curl --location --request POST '<ENDPOINT>' \
--header 'Content-Type: application/json' \
--data '{"image":""}'

Custom Model Response

Navigate back to the dashboard and click on the name of the model you just deployed. Navigate to the tab that says ‘Runs and Logs’ in order to see the request you just made and the required logs.

With just a few lines of code, your custom python code was deployed to serverless CPUs/GPUs without setting up any infrastructure. After the first deployment, every subsequent deployment should be less than 10s, keep the development lifecycle as quick as possible. Try deploying your own model now and let us know what you are building!