Transcribe a 1-hour podcast
Using Distil-Whisper to transcribe an audio file
In this tutorial, we will transcribe an hour-long audio file using Distil-Whisper - an optimized version of Whisper-large-v2 that is 60% faster while staying within 1% of its error rate. We will accept either a base64-encoded string of the audio file or a URL from which we can download the audio file.
To see the final implementation, you can view it here
Basic Setup
It is important to note that developing models with Cerebrium should be identical to developing on a virtual machine or Google Colab - so converting this should be very easy! Please make sure you have the Cerebrium package installed and have logged in. If not, please take a look at our docs here
First we create our project:
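Assuming you are using the Cerebrium CLI, something like the following should work (the project name is just an example):

```bash
cerebrium init distil-whisper-transcription
```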
Let us add the following packages to the [cerebrium.dependencies.pip] section of our cerebrium.toml file:
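For example, to run Distil-Whisper via the Hugging Face transformers library, the section might look something like the following (the package list and "latest" version pins are illustrative assumptions):

```toml
[cerebrium.dependencies.pip]
torch = "latest"
transformers = "latest"
accelerate = "latest"
pydantic = "latest"
requests = "latest"
```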
To start let us create a util.py file for our utility functions - downloading a file from a URL or converting a base64 string to a file. Our util.py would look something like below:
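A minimal sketch of what util.py could contain; the function names, file extension, and temporary paths are assumptions:

```python
# util.py - utility functions for fetching the audio file.
# Function names, the .mp3 extension, and the /tmp paths are illustrative assumptions.
import base64
import uuid

import requests


def download_file_from_url(url: str) -> str:
    """Download an audio file from a public URL and return the local file path."""
    response = requests.get(url, timeout=60)
    response.raise_for_status()

    file_path = f"/tmp/{uuid.uuid4()}.mp3"
    with open(file_path, "wb") as f:
        f.write(response.content)
    return file_path


def save_base64_string_to_file(audio: str) -> str:
    """Decode a base64-encoded audio string and return the local file path."""
    file_path = f"/tmp/{uuid.uuid4()}.mp3"
    with open(file_path, "wb") as f:
        f.write(base64.b64decode(audio))
    return file_path
```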
Now that our utility functions are complete, go to the main.py file which will contain our main Python code. We would like the user to send us either a base64 encoded string of the file or a public url from which we can download the file. We would then pass this file to our model and return the output to the user. So let us define our request object.
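A sketch of the request object, using the parameters described above; the class name and Pydantic style are assumptions:

```python
# main.py - request object sketch; the class name "Item" is illustrative.
from typing import Optional

from pydantic import BaseModel


class Item(BaseModel):
    audio: Optional[str] = None             # base64-encoded audio file
    file_url: Optional[str] = None          # public URL to download the audio file from
    webhook_endpoint: Optional[str] = None  # URL we POST the results to for long-running requests
```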
Above, we use Pydantic as our data validation library. Because of the way we have defined the BaseModel, audio and file_url are optional parameters, but we must check that we are given one or the other. The webhook_endpoint parameter is something Cerebrium automatically includes in every request and can be used for long-running requests. Currently, Cerebrium has a maximum timeout of 3 minutes for each inference request. For long audio files (around 2 hours), which take a couple of minutes to process, it is best to use a webhook_endpoint - a URL to which we will make a POST request with the results of your function.
Setup Model and Inference
Below, we import the required packages and load our Whisper model. The model is downloaded during your deployment; on subsequent deploys and inference requests it is automatically cached in your persistent storage for reuse. You can read more about persistent storage here. We do this outside our predict function since we only want this code to run on a cold start (i.e. on startup). If the container is already warm, only the predict function is executed for inference.
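A sketch of the model setup, assuming the distil-whisper/distil-large-v2 checkpoint and the standard transformers pipeline; the chunk length and batch size are illustrative:

```python
# main.py - model setup (runs once on cold start)
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-large-v2"  # assumed checkpoint

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

# Chunked long-form transcription so an hour-long file can be processed in batches
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=15,
    batch_size=16,
    torch_dtype=torch_dtype,
    device=device,
)
```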
In our predict function, which only runs on inference requests, we simply create an audio file from the download URL or base64 string given to us in the request. We then transcribe the file and return the output to the user.
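A sketch of what the predict function might look like, assuming the Item model, pipeline, and utility functions above; the function signature follows Cerebrium's examples but should be treated as an assumption:

```python
# main.py - predict function sketch; the (item, run_id, logger) signature is assumed.
from util import download_file_from_url, save_base64_string_to_file


def predict(item, run_id, logger):
    item = Item(**item)

    # Require exactly one of the two input options
    if item.audio is None and item.file_url is None:
        return {"error": "Please supply either an 'audio' (base64) or 'file_url' parameter."}

    if item.file_url is not None:
        file_path = download_file_from_url(item.file_url)
    else:
        file_path = save_base64_string_to_file(item.audio)

    logger.info(f"Transcribing file: {file_path}")
    result = pipe(file_path)

    return {"transcription": result["text"]}
```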
Deploy
Your cerebrium.toml file is where you configure your compute and environment. It should look something like this:
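An illustrative sketch of the config; section and key names other than [cerebrium.dependencies.pip], as well as the hardware values, are assumptions about Cerebrium's config format:

```toml
# cerebrium.toml - illustrative sketch; adjust names and values to your project.
[cerebrium.deployment]
name = "distil-whisper-transcription"
python_version = "3.10"

[cerebrium.hardware]
gpu = "AMPERE_A10"
cpu = 2
memory = 16.0

[cerebrium.dependencies.pip]
torch = "latest"
transformers = "latest"
accelerate = "latest"
pydantic = "latest"
requests = "latest"
```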
To deploy the app, use the following command:
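```bash
cerebrium deploy
```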
Once deployed, make the following request:
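An illustrative request; replace the placeholder endpoint URL and API key with the values shown in your Cerebrium dashboard:

```bash
curl -X POST "<YOUR-CEREBRIUM-ENDPOINT-URL>" \
  -H "Authorization: <YOUR-API-KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "file_url": "https://example.com/podcast-episode.mp3",
    "webhook_endpoint": "https://your-server.com/webhook"
  }'
```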
You will notice that you get an immediate response with a 202 status code and a run_id. This run_id is a unique identifier that lets you correlate the result with the initial request.
Our endpoint will then get the following results:
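A purely illustrative shape of the webhook payload - the field names and values below are assumptions, since the actual response schema is not shown here:

```json
{
  "run_id": "<run-id-from-the-initial-response>",
  "result": {
    "transcription": "Welcome to the show..."
  }
}
```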