Whisper
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation.
The Whisper model can process audio of any length and takes roughly 7-10 minutes to process one hour of audio. This example therefore also demonstrates how to supply a webhook endpoint: once the model has finished running, we will send a POST request to that endpoint containing your run_id and the results.
We currently have the following Whisper model available; if you would like any others, contact support and we can quickly add them for you. To deploy it, use the identifier below:
- Whisper Large:
whisper-large
Once you’ve deployed a Whisper model, you can supply the endpoint with a base64-encoded audio file or a file URL. For large files, we recommend using the file URL and webhook endpoint together; we will notify your endpoint with the results when the model has finished processing the audio. Here’s an example of how to call the deployed endpoint:
Request Parameters
curl --location --request POST 'https://run.cerebrium.ai/whisper-large-webhook/predict' \
--header 'Authorization: <API_KEY>' \
--header 'Content-Type: application/json' \
--data-raw '{
"audio": "<BASE_64_STRING>",
"webhook_endpoint": "<WEBHOOK_ENDPOINT>",
"file_url": "<FILE_URL>"
}'
- Authorization (header): The Cerebrium API key used to authenticate your request. You can get it from your Cerebrium dashboard.
- audio: A base64-encoded string of the audio file you would like to transcribe/translate.
- language: Defaults to auto-detect; alternatively, you can specify the language explicitly, e.g. English (en) or French (fr).
- webhook_endpoint: An HTTP endpoint that we will notify with the results once the model has finished processing.
- file_url: A publicly accessible URL from which we can fetch the file for transcription. It does not have to be an S3 URL; any URL works.
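When you supply a webhook_endpoint, your server receives a POST once the run completes. The exact payload schema is an assumption here: the docs above say the request contains your run_id and the results, so this sketch assumes top-level "run_id" and "result" keys, with the transcription under "text".

```python
import json

def handle_webhook(raw_body: bytes) -> tuple[str, str]:
    """Parse the webhook POST body sent when a Whisper run completes.

    Assumed payload shape (not confirmed by the docs):
        {"run_id": "...", "result": {"text": "...", "segments": [...]}}
    """
    payload = json.loads(raw_body)
    run_id = payload["run_id"]
    text = payload["result"]["text"]
    return run_id, text
```

You would call this from whatever web framework handles your endpoint, passing it the raw request body.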
{
"text": "<TEXT>",
"segments": []
}
Response Parameters
- text: The text that has been transcribed or translated.
- segments: Detailed information about the transcription/translation.
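A small helper can turn the response into a readable transcript. This assumes each entry in "segments" follows Whisper's usual shape, with "start", "end", and "text" fields; that shape is not spelled out in the response above, so treat it as an assumption.

```python
import json

def summarize_response(raw: str) -> str:
    """Render a Whisper response as a timestamped transcript.

    Assumes each segment has "start", "end", and "text" keys
    (Whisper's usual segment shape); falls back to the top-level
    "text" field when no segments are present.
    """
    resp = json.loads(raw)
    lines = [
        f'[{seg["start"]:.2f}-{seg["end"]:.2f}] {seg["text"].strip()}'
        for seg in resp.get("segments", [])
    ]
    return "\n".join(lines) if lines else resp.get("text", "")
```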