Test and Deploy
Start by importing our Python framework, Cerebrium, which allows us to abstract away the complexity of provisioning infrastructure, versioning, and much more!
from cerebrium import deploy, model_type

# ...some model training logic...
endpoint = deploy(('<MODEL_TYPE>', '<MODEL_FILE>'), '<MODEL_NAME>', '<API_KEY>')
In the deploy function there are the following parameters (a concrete example follows this list):
- A tuple of the model type and the model file:
  - MODEL_TYPE: This parameter specifies the type of model you are supplying Cerebrium and must be a model_type. This is to ensure that Cerebrium knows how to handle your model. The currently supported model types are:
    - model_type.SKLEARN: Expects a .pkl file (there is no requirement for the model to be a regressor or classifier).
    - model_type.SKLEARN_CLASSIFIER: Expects a .pkl file (the model must be a classifier; it returns a class probability distribution instead of a single class prediction).
    - model_type.SKLEARN_PREPROCESSOR: Expects a .pkl file. This is a special model type that is used to preprocess data with the .transform method before it is sent to the model, such as a scaler or a one-hot encoder.
    - model_type.TORCH: Expects a .pkl file serialized with cloudpickle or a JIT-scripted TorchScript .pt file.
    - model_type.XGBOOST_REGRESSOR: Expects a serialized .pkl file or an XGB .json file.
    - model_type.XGBOOST_CLASSIFIER: Expects a serialized .pkl file or an XGB .json file.
    - model_type.ONNX: Expects a serialized .onnx file.
    - model_type.SPACY: Expects a folder path to your spaCy model.
    - model_type.HUGGINGFACE_PIPELINE: Expects the task identifier and model identifier of the Hugging Face model.
  - MODEL_FILE: This is the string path to the model object you have obtained as a result of training locally or in the cloud. This is usually some model object or model file that you have exported.
- MODEL_NAME: The name you would like to give your model (alphanumeric, with hyphens, and fewer than 20 characters). This is a unique identifier for your model and will be used to call your model in the future.
- API_KEY: This is the API key that can be found on your profile in your Cerebrium dashboard.
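For example, a scikit-learn classifier that you have pickled locally could be deployed as below. This is a minimal sketch: the file name and model name are placeholders, not values from your project.
from cerebrium import deploy, model_type

# Hypothetical example: a pickled scikit-learn classifier trained locally
endpoint = deploy(
    (model_type.SKLEARN_CLASSIFIER, 'iris-classifier.pkl'),  # model type and path to your exported model
    'iris-classifier',  # model name: alphanumeric with hyphens, under 20 characters
    '<API_KEY>'  # your Cerebrium API key
)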
Every unique model name will create a separate deployment with a separate endpoint. It is important to keep track of the model names you have used so that you can call the correct model in the future. If you deploy a model with the same name as a previous model, the previous model will be archived and the new model will be deployed automatically. This is useful for versioning your models.
Once you’ve run the deploy function, give it a minute, and it should be deployed - easy-peasy! If your deployment is successful, you will see the following output:
✅ Authenticated with Cerebrium!
⬆️ Uploading conduit artifacts...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 179k/179k [00:04<00:00, 42.7kB/s]
✅ Conduit artifacts uploaded successfully.
✅ Conduit deployed!
🌍 Endpoint: https://run.cerebrium.ai/v1/YOUR-PROJECT-ID/YOUR-MODEL-NAME/predict
Our deploy function will also return the endpoint of your model directly. This is the URL that you will use to call your model in the future.
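As a quick sanity check, you can call that endpoint over HTTP. The snippet below is a minimal sketch using the requests library, assuming the header and body format described in the API Specification section further down; the payload is a made-up single data point.
import requests

# Hypothetical call to the endpoint returned by deploy().
# Header names and body format follow the API Specification section below.
response = requests.post(
    endpoint,
    headers={'Authorization': '<API_KEY>', 'Content-Type': 'application/json'},
    json=[[1, 2, 3]],  # one data point with three features
)
print(response.json())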
Run model locally
In order to run your model locally and ensure it is working as intended before deploying, you can use the load and run commands. The load command will load your pipeline into memory using the paths you specified in deploy from a base path, while run will sequentially execute your loaded pipeline.
from cerebrium import deploy, model_type
conduit = deploy(('<MODEL_TYPE>', '<MODEL_FILE>'), '<MODEL_NAME>', '<API_KEY>')
conduit.load('./')  # load the pipeline into memory from the base path
conduit.run(data)   # execute the loaded pipeline on your input data
Where data is the data you would send to your model. This would usually be some numerical 2D/3D array for typical models, or a list of strings for a language model. You may feed an ndarray or Tensor directly into this function. However, if you are using a custom data pipeline that is expecting another type, you may need to convert your data into the appropriate format for your model.
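For instance, for a tabular scikit-learn model the input might be a 2D NumPy array; the values below are made up purely for illustration.
import numpy as np

# Hypothetical input: two rows of three features each
data = np.array([[5.1, 3.5, 1.4],
                 [6.2, 2.9, 4.3]])
result = conduit.run(data)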
You can also define a Conduit object directly by using the Conduit class. You can then call the run method on the Conduit object to test the model locally, or the deploy method to deploy the Conduit’s model flow to Cerebrium.
When you use the Conduit object directly, you can specify what hardware you wish your model to run on by using the hardware parameter in the Conduit constructor.
The hardware parameter is an enum that can be one of the following:
- hardware.CPU: This will run your model on a CPU. This is the default option for SKLearn, XGBoost, and SpaCy models.
- hardware.GPU: This will run your model on a T4 GPU. This is the default option for Torch, ONNX, and HuggingFace models.
- hardware.A10: This will run your model on an A10 GPU, which provides 16GB of VRAM. You should use this option if you are using a model that is too large to fit on the 12GB of VRAM that a T4 GPU provides. This will include most large HuggingFace models.
from cerebrium import Conduit, model_type, hardware
conduit = Conduit(
    '<MODEL_NAME>',
    '<API_KEY>',
    [('<MODEL_TYPE>', '<MODEL_FILE>')],
    hardware=hardware.<HARDWARE_TYPE>
)
conduit.load('./')
conduit.run(data)
conduit.deploy()
Additionally, defining a Conduit object directly allows you to add more models to your flow dynamically using the add_model method, as well as to include monitoring functionality with Censius or Arize using the add_logger method.
conduit.add_model('<MODEL_TYPE>', '<MODEL_FILE>', {<PROCESSING_FUNCTIONS>})
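As a rough illustration of a multi-model flow, the sketch below chains a scikit-learn preprocessor into a classifier. The file names are placeholders, and passing an empty dict for the processing functions is an assumption made here for brevity, not documented behaviour.
from cerebrium import Conduit, model_type

# Hypothetical flow: a pickled scaler runs first, then the classifier consumes its output
conduit = Conduit(
    'my-flow',  # placeholder model name
    '<API_KEY>',
    [(model_type.SKLEARN_PREPROCESSOR, 'scaler.pkl')],
)
conduit.add_model(model_type.SKLEARN_CLASSIFIER, 'classifier.pkl', {})  # empty processing functions: assumption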
API Specification and Helper methods
You can see an example of the request and response objects for calls made to your models below. It should resemble what it is like to call your model locally in your own Python environment.
Request Parameters
curl --location --request POST '<ENDPOINT>' \
--header 'Authorization: <API_KEY>' \
--header 'Content-Type: application/json' \
--data-raw '[<DATA_INPUT>]'
- Authorization: This is the Cerebrium API key used to authenticate your request. You can get it from your Cerebrium dashboard.
- Content-Type: The content type of your request. Must be application/json, or multipart/form-data if sending files.
- Data: A list of data points you would like to send to your model, e.g. for 1 data point of 3 features: [[1,2,3]].
Response Parameters
{
    "result": [<MODEL_PREDICTION>],
    "run_id": "<RUN_ID>",
    "run_time_ms": <RUN_TIME_MS>,
    "prediction_ids": ["<PREDICTION_ID>"]
}
- result: The result of your model prediction.
- run_id: The run ID associated with your model predictions.
- run_time_ms: The time it took your model to run, in milliseconds. This is what we charge you based on.
- prediction_ids: The prediction IDs associated with each of your model predictions. Used to track your model predictions with monitoring tools.
You can test out your model endpoint quickly with model_api_request, a utility function supplied in Cerebrium.
from cerebrium import model_api_request
model_api_request(endpoint, data, '<API_KEY>')
The function takes in the following parameters (a usage example follows this list):
- endpoint: The endpoint of your model that was returned by the deploy function.
- data: The data you would like to send to your model. You may feed an ndarray or Tensor directly into this function.
- api_key: This is the Cerebrium API key used to authenticate your request.
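For example, using the endpoint returned by deploy earlier and a small NumPy payload. The values are illustrative, and printing the return value assumes the helper gives back a printable response, which is not specified here.
import numpy as np
from cerebrium import model_api_request

# Hypothetical request: one data point with three features
response = model_api_request(endpoint, np.array([[1, 2, 3]]), '<API_KEY>')
print(response)  # assumption: the helper returns a printable response object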
If you want to see concrete examples of various model deployments head on over to the Quickstarts section.