Basic Setup
Developing models with Cerebrium is similar to developing on a virtual machine or Google Colab, making conversion straightforward. Make sure you have the Cerebrium package installed and are logged in. If not, check our docs here. First, create your project:
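The project name below is a placeholder; the Cerebrium CLI scaffolds a new project with its init command, roughly like this:

```bash
# Scaffold a new Cerebrium project (the name "whisper-transcription" is a placeholder)
cerebrium init whisper-transcription
```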
Next, add the Python packages you need to the [cerebrium.dependencies.pip] section of your cerebrium.toml file:
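The original dependency list is not reproduced here; a plausible set for a Whisper transcription app might look like the following sketch (the package choices are assumptions - match them to the libraries you actually import):

```toml
[cerebrium.dependencies.pip]
# Illustrative packages only - adjust to your own imports
openai-whisper = "latest"
pydantic = "latest"
requests = "latest"
```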
Next, create a util.py file for our utility functions - downloading a file from a URL or converting a base64 string to a file:
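The original util.py is not shown; a minimal sketch of the two helpers it describes (the function names are assumptions) could look like this:

```python
# util.py - helpers for getting the audio onto local disk
import base64

import requests


def download_file_from_url(url: str, filename: str) -> str:
    """Download a file from a public URL and save it locally."""
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    with open(filename, "wb") as f:
        f.write(response.content)
    return filename


def save_base64_string_to_file(encoded_audio: str, filename: str) -> str:
    """Decode a base64-encoded audio string and write it to a local file."""
    with open(filename, "wb") as f:
        f.write(base64.b64decode(encoded_audio))
    return filename
```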
Then create a main.py file with our main Python code. Users can send either a base64-encoded string or a public URL of the audio file. We’ll pass this file to our model and return the output. First, let’s define our request object:
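A minimal sketch of such a request object, assuming Pydantic (the class name Item is an assumption; the field names follow the ones discussed below, and the validator enforces that at least one audio source is present):

```python
# main.py - request schema for the endpoint
from typing import Optional

from pydantic import BaseModel, model_validator


class Item(BaseModel):
    # Base64-encoded audio file (optional)
    audio: Optional[str] = None
    # Public URL pointing to the audio file (optional)
    file_url: Optional[str] = None
    # Added automatically by Cerebrium; used for long-running requests
    webhook_endpoint: Optional[str] = None

    @model_validator(mode="after")
    def check_audio_source(self):
        if not self.audio and not self.file_url:
            raise ValueError("Provide either 'audio' or 'file_url'.")
        return self
```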
Since audio and file_url are optional parameters, we ensure at least one is provided. The webhook_endpoint parameter, automatically included by Cerebrium in every request, is useful for long-running requests.
Note: Cerebrium has a 3-minute timeout for each inference request. For long audio files (2+ hours) that take several minutes to process, use a webhook_endpoint - a URL where we’ll send a POST request with your function’s results.
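For illustration, a request to the deployed endpoint might look like this once the app is live (the URL, token, and webhook address are placeholders, not values from this example):

```python
import requests

# Placeholder endpoint and token - use the values Cerebrium gives you after deployment
response = requests.post(
    "https://your-cerebrium-endpoint/predict",
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    json={
        "file_url": "https://example.com/podcast-episode.mp3",
        # Optional: receive the result via POST once a long-running job finishes
        "webhook_endpoint": "https://your-server.com/whisper-results",
    },
    timeout=180,
)
print(response.json())
```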
Model Setup and Inference
Below, we import the required packages and load our Whisper model. While the model downloads during initial deployment, it’s automatically cached in persistent storage for subsequent use. We load the model outside our predict function since this code should only run on cold start (startup). For warm containers, only the predict function executes for inference.
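The specific Whisper variant used in the original is not shown; here is a minimal sketch assuming the open-source openai-whisper package, with the load done at module level so it only runs on cold start:

```python
# main.py (continued) - runs once per cold start, not on every request
import whisper

# Model size is an assumption; larger models trade speed for accuracy.
# The downloaded weights are cached in persistent storage after the first deployment.
model = whisper.load_model("medium")
```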
Our predict function, which runs only on inference requests, creates an audio file from either the download URL or base64 string, transcribes it, and returns the output.
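A sketch of that predict function, reusing the helper names and Item model assumed above (the handler signature shown here is illustrative, not the exact one from the original):

```python
# main.py (continued) - executed on every inference request
from util import download_file_from_url, save_base64_string_to_file


def predict(item, run_id, logger):
    item = Item(**item)

    # Materialise the audio on disk from whichever source was provided
    audio_path = f"{run_id}.mp3"
    if item.file_url:
        download_file_from_url(item.file_url, audio_path)
    else:
        save_base64_string_to_file(item.audio, audio_path)

    # Transcribe and return the text
    result = model.transcribe(audio_path)
    return {"transcription": result["text"]}
```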
Deploy
Configure your compute and environment settings in cerebrium.toml:
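The exact values are not reproduced here; a rough sketch of the deployment and hardware sections (the key names and values are assumptions based on a typical Cerebrium config - adjust them to your model and workload):

```toml
[cerebrium.deployment]
name = "whisper-transcription"   # placeholder app name
python_version = "3.11"

[cerebrium.hardware]
# Pick a GPU with enough memory for your chosen Whisper variant
compute = "AMPERE_A10"
cpu = 2
memory = 16.0
gpu_count = 1
```

With the configuration in place, deploying is typically a single CLI call (cerebrium deploy) from the project directory.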
Every response includes a run_id - a unique identifier to correlate the result with the initial workload.
The endpoint returns results in this format:
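The exact payload is not shown here; an illustrative shape (field names other than run_id are assumptions) might be:

```json
{
  "run_id": "a1b2c3d4-...",
  "result": {
    "transcription": "..."
  },
  "run_time_ms": 12345.6
}
```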