By the end of this guide you’ll have an API endpoint that scales with your traffic by running inference on serverless CPUs/GPUs.
Project setup
Before building, you need to set up a Cerebrium account. This is as simple as starting a new project in Cerebrium and copying the API key, which is used to authenticate all calls for this project.
Create a project
- Go to dashboard.cerebrium.ai
- Sign up or log in
- Navigate to the Settings page
- Navigate to the API Keys tab
- You should see an API key with the source “Cerebrium”. Click the eye icon to display it. It will be in the format: c_api_key-xxx
Now navigate to where your model code is stored. This could be in a notebook or in a plain Python file.
To start, install the Cerebrium framework by running the following command in your notebook or terminal:
pip install cerebrium
Copy and paste our code below. It trains a simple XGBoost classifier on the Iris dataset and saves it to disk. This code could be replaced by any XGBoost model. Make sure you have the required libraries (scikit-learn and xgboost) installed.
from sklearn.datasets import load_iris
from xgboost import XGBClassifier

# Load the Iris dataset: 4 features per sample, 3 classes
iris = load_iris()
X, y = iris.data, iris.target

# Train the classifier with default hyperparameters
xgb = XGBClassifier()
xgb.fit(X, y)

# Save to XGB JSON
xgb.save_model("iris.json")
The last line of code saves the model. Here it is written to iris.json in XGBoost's JSON format using save_model. This saved file is all you need to deploy your model to Cerebrium! You can then import the deploy function from the Cerebrium framework and pass it the model type together with the path to the saved file, a name for the model, and your API key.
from cerebrium import deploy, model_type

# Arguments: (model type, path to the saved model), model name, API key
endpoint = deploy(
    (model_type.XGBOOST_CLASSIFIER, "iris.json"),
    "xgb-test-model",
    "<API_KEY>",
)
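The deploy call above takes the API key as a plain string. If you would rather not paste the key into a notebook or script, one option is to read it from an environment variable and check that it matches the c_api_key- format shown in the dashboard. The following is a minimal sketch of that idea, not part of the Cerebrium framework: the variable name CEREBRIUM_API_KEY and the helper load_api_key are assumptions made for illustration.

```python
import os

# Prefix taken from the key format shown in the dashboard (c_api_key-xxx).
API_KEY_PREFIX = "c_api_key-"
# Hypothetical environment variable name; choose whatever suits your setup.
ENV_VAR = "CEREBRIUM_API_KEY"


def load_api_key(env=os.environ):
    """Return the API key stored in `env`, or raise if it is missing or malformed."""
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {ENV_VAR} environment variable first")
    if not key.startswith(API_KEY_PREFIX):
        raise ValueError(f"Expected a key starting with {API_KEY_PREFIX!r}")
    return key
```

The returned value can then be passed to deploy in place of "<API_KEY>".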
Your model is now deployed and ready for inference, all in under 10 seconds! Navigate to the dashboard and you will see your model on the Models page.
You can run inference using curl. Replace <ENDPOINT> with the value returned by deploy and <API_KEY> with your API key:
curl --location --request POST '<ENDPOINT>' \
  --header 'Authorization: <API_KEY>' \
  --header 'Content-Type: text/plain' \
  --data-raw '[[5.1, 3.5, 1.4, 0.2]]'
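If you prefer to call the endpoint from Python, the same request can be sketched with the standard library alone. This mirrors the curl command above (POST, the Authorization and Content-Type headers, and a JSON-encoded list of feature rows as the body). The helper names build_inference_request and predict are illustrative, and because the response format is not described here, the sketch returns the raw response text without parsing it.

```python
import json
import urllib.request


def build_inference_request(endpoint, api_key, rows):
    """Build the POST request that the curl command sends."""
    # rows is a list of feature rows, e.g. [[5.1, 3.5, 1.4, 0.2]]
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Authorization": api_key, "Content-Type": "text/plain"},
    )


def predict(endpoint, api_key, rows, timeout=30):
    """Send the request and return the raw response body as text."""
    request = build_inference_request(endpoint, api_key, rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, predict(endpoint, "<API_KEY>", [[5.1, 3.5, 1.4, 0.2]]) sends one Iris sample, where endpoint is the value returned by deploy.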
The response will contain the model's prediction for the row you sent.
Navigate back to the dashboard and click on the name of the model you just deployed. You will see an API call was made and the inference time. From your dashboard you can monitor your model, roll back to previous versions and see traffic.
With one line of code, your model was deployed in seconds with automatic versioning, monitoring and the ability to scale based on traffic spikes. Try deploying your own model now or check out our other frameworks.