Test and Deploy

Start by importing Cerebrium, our Python framework, which abstracts away the complexity of provisioning infrastructure, versioning, and much more!

from cerebrium import deploy, model_type

# Some model training logic...

endpoint = deploy(('<MODEL_TYPE>', '<MODEL_FILE>'), '<MODEL_NAME>', '<API_KEY>')

In the deploy function there are the following parameters:

  • A tuple of the model type and the model file:
    • MODEL_TYPE: This parameter specifies the type of model you are supplying to Cerebrium and must be a model_type. This ensures that Cerebrium knows how to handle your model. The currently supported model types are:
      • model_type.SKLEARN: Expects a .pkl file (the model is not required to be a regressor or a classifier).
      • model_type.SKLEARN_CLASSIFIER: Expects a .pkl file. The model must be a classifier; it returns a class probability distribution instead of a single class prediction.
      • model_type.SKLEARN_PREPROCESSOR: Expects a .pkl file. This is a special model type used to preprocess data with the .transform method before it is sent to the model, such as a scaler or a one-hot encoder.
      • model_type.TORCH: Expects a .pkl file serialized with cloudpickle or a JIT-scripted TorchScript .pt file.
      • model_type.XGBOOST_REGRESSOR: Expects a serialized .pkl file or an XGB .json file.
      • model_type.XGBOOST_CLASSIFIER: Expects a serialized .pkl file or an XGB .json file.
      • model_type.ONNX: Expects a serialized .onnx file.
      • model_type.SPACY: Expects the folder path of your spaCy model.
      • model_type.HUGGINGFACE_PIPELINE: Expects the task identifier and model identifier of the Hugging Face model.
    • MODEL_FILE: This is the string path to the model object you have obtained as a result of training locally or in the cloud. This is usually some model object or model file that you have exported.
  • MODEL_NAME: The name you would like to give your model (alphanumeric characters and hyphens only, fewer than 20 characters). This is a unique identifier for your model and will be used to call your model in the future.
  • API_KEY: This is the API key found on your profile in your Cerebrium dashboard.
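
Since a rejected model name only shows up at deploy time, it can help to check the MODEL_NAME rule locally first. The snippet below is a minimal sketch of that rule as stated above (alphanumeric characters and hyphens, fewer than 20 characters). The helper is_valid_model_name is hypothetical and not part of Cerebrium, and Cerebrium's own validation may differ in details.

```python
import re

# Hypothetical local check, not part of the Cerebrium package.
# Assumption: "alphanumeric, with hyphens" means ASCII letters, digits and "-".
_MODEL_NAME_RE = re.compile(r"[A-Za-z0-9-]+")


def is_valid_model_name(name: str) -> bool:
    """Return True if name uses only letters, digits and hyphens
    and is fewer than 20 characters long."""
    return len(name) < 20 and _MODEL_NAME_RE.fullmatch(name) is not None


print(is_valid_model_name("churn-xgb-v2"))  # True
print(is_valid_model_name("my model"))      # False (contains a space)
```

Running a check like this before calling deploy catches naming mistakes without a round trip to the platform.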

Every unique model name will create a separate deployment with a separate endpoint. It is important to keep track of the model names you have used so that you can call the correct model in the future. If you deploy a model with the same name as a previous model, the previous model will be archived and the new model will be deployed automatically. This is useful for versioning your models.

Once you’ve run the deploy function, give it a minute, and it should be deployed - easy-peasy! If your deployment is successful, you will see the following output:

✅ Authenticated with Cerebrium!
⬆️  Uploading conduit artifacts...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 179k/179k [00:04<00:00, 42.7kB/s]
✅ Conduit artifacts uploaded successfully.
✅ Conduit deployed!
🌍 Endpoint: https://run.cerebrium.ai/v1/YOUR-PROJECT-ID/YOUR-MODEL-NAME/predict

Our deploy function will also return the endpoint of your model directly. This is the URL that you will use to call your model in the future.
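
If you keep several endpoints around, it can be handy to recover the project ID and model name from the URL. The snippet below is a rough sketch that assumes every endpoint follows the exact pattern shown in the output above; split_endpoint is a hypothetical helper, not a Cerebrium function.

```python
from urllib.parse import urlparse


def split_endpoint(endpoint: str) -> dict:
    """Split an endpoint URL into version, project ID and model name.

    Assumes the path looks like /<version>/<project-id>/<model-name>/predict,
    as in the sample deploy output.
    """
    parts = urlparse(endpoint).path.strip("/").split("/")
    if len(parts) != 4 or parts[3] != "predict":
        raise ValueError(f"unexpected endpoint format: {endpoint}")
    version, project_id, model_name, _ = parts
    return {"version": version, "project_id": project_id, "model_name": model_name}


info = split_endpoint(
    "https://run.cerebrium.ai/v1/YOUR-PROJECT-ID/YOUR-MODEL-NAME/predict"
)
print(info["model_name"])  # YOUR-MODEL-NAME
```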

Run model locally

To run your model locally and check that it works as intended before deploying, you can use the load and run methods. The load method loads your pipeline into memory from a base path, using the paths you specified in deploy, while run sequentially executes your loaded pipeline.

from cerebrium import deploy, model_type

conduit = deploy(('<MODEL_TYPE>', '<MODEL_FILE>'), '<MODEL_NAME>', '<API_KEY>')
conduit.load('<BASE_PATH>')
conduit.run(data)

Here, data is the input you pass to run, that is, the data you would send to your model. This is usually a numerical 2D or 3D array for typical models, or a list of strings for a language model. You may pass an ndarray or Tensor directly. However, if you are using a custom data pipeline that expects another type, you may need to convert your data into the appropriate format for your model.

You can also define a Conduit object directly by using the Conduit class. You can then call the run method on the Conduit object to test the model locally, or the deploy method to deploy the Conduit’s model flow to Cerebrium. When you use the Conduit object directly, you can specify which hardware your model runs on with the hardware parameter in the Conduit constructor. The hardware parameter is an enum that can be one of the following:

  • hardware.CPU: This will run your model on a CPU. This is the default option for SKLearn, XGBoost, and SpaCy models.
  • hardware.GPU: This will run your model on a T4 GPU. This is the default option for Torch, ONNX, and HuggingFace models.
  • hardware.A10: This will run your model on an A10 GPU, which provides 24GB of VRAM. You should use this option if your model is too large to fit in the 16GB of VRAM that a T4 GPU provides. This includes most large HuggingFace models.

from cerebrium import Conduit, model_type, hardware

conduit = Conduit(
  [('<MODEL_TYPE>', '<MODEL_FILE>')],

Additionally, defining a Conduit object directly allows you to add more models to your flow dynamically using the add_model method, and to include monitoring functionality with Censius or Arize using add_logger.

conduit.add_model('<MODEL_TYPE>', '<MODEL_FILE>', {<PROCESSING_FUNCTIONS>})

API Specification and Helper methods

Below is an example of the request and response objects for calls made to your models. Calling the endpoint should feel much like calling your model locally in your own Python environment.

Request Parameters

  curl --location --request POST '<ENDPOINT>' \
      --header 'Authorization: <API_KEY>' \
      --header 'Content-Type: application/json' \
      --data-raw '[<DATA_INPUT>]'

  • Authorization: This is the Cerebrium API key used to authenticate your request. You can get it from your Cerebrium dashboard.
  • Content-Type: The content type of your request. Must be application/json, or multipart/form-data if sending files.
  • DATA_INPUT: A list of data points you would like to send to your model, e.g. for 1 data point of 3 features: [[1,2,3]].
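
The same request can be put together in Python. The snippet below is a minimal sketch using only the standard library; it mirrors the curl call above (POST, Authorization and Content-Type headers, JSON list as the body). The helper name build_predict_request is hypothetical, and the endpoint and key are placeholders. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_predict_request(endpoint: str, api_key: str, data: list) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example (hypothetical helper)."""
    body = json.dumps(data).encode("utf-8")  # JSON list of data points
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_predict_request(
    "https://run.cerebrium.ai/v1/YOUR-PROJECT-ID/YOUR-MODEL-NAME/predict",
    "<API_KEY>",
    [[1, 2, 3]],  # 1 data point with 3 features
)
print(req.get_method(), req.data)  # POST b'[[1, 2, 3]]'

# To actually send it (requires a real endpoint and API key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```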

Response Parameters

  {
    "result": [<MODEL_PREDICTION>],
    "run_id": "<RUN_ID>",
    "run_time_ms": <RUN_TIME_MS>,
    "prediction_ids": ["<PREDICTION_ID>"]
  }

  • result: The result of your model prediction.
  • run_id: The run ID associated with your model predictions.
  • run_time_ms: The time it took your model to run, in milliseconds. This is the value your usage is billed on.
  • prediction_ids: The prediction IDs associated with each of your model predictions. Used to track your model predictions with monitoring tools.
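
To work with a response in Python, you can parse the JSON and pair each prediction ID with its result. This is a small sketch: the sample values are made up for illustration, pair_predictions is a hypothetical helper, and it assumes the API returns exactly one prediction ID per entry in result.

```python
import json

# Illustrative response body; the values are invented, only the keys
# follow the response format described above.
sample_body = """{
  "result": [0.87],
  "run_id": "run-123",
  "run_time_ms": 42,
  "prediction_ids": ["pred-abc"]
}"""


def pair_predictions(response: dict) -> list:
    """Pair each prediction ID with its result (hypothetical helper).

    Assumes one prediction ID per result, in the same order.
    """
    ids, results = response["prediction_ids"], response["result"]
    if len(ids) != len(results):
        raise ValueError("prediction_ids and result differ in length")
    return list(zip(ids, results))


response = json.loads(sample_body)
pairs = pair_predictions(response)
print(pairs)                    # [('pred-abc', 0.87)]
print(response["run_time_ms"])  # 42
```

Keeping the ID next to each prediction makes it easier to match results against what your monitoring tool records.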

You can test out your model endpoint quickly with our utility function supplied in Cerebrium, model_api_request.

from cerebrium import model_api_request
model_api_request(endpoint, data, '<API_KEY>')

The function takes in the following parameters:

  • endpoint: The endpoint of your model that was returned by the deploy function.
  • data: The data you would like to send to your model. You may feed an ndarray or Tensor directly into this function.
  • api_key: This is the Cerebrium API key used to authenticate your request.

If you want to see concrete examples of various model deployments, head over to the Quickstarts section.