Language Models
Whisper

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. The Whisper model can process audio of any length and takes about 7-10 minutes to process 1 hour of audio. Therefore, this example also demonstrates how you can supply a webhook endpoint: once the model has finished running, we will send a POST request to it containing your run_id and results.
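
If you want to try the webhook flow end to end, the sketch below shows one way to receive that POST request. It is a minimal illustration, assuming a Python environment with Flask installed; the route path /whisper-webhook is an example, and the exact payload shape beyond the run_id and results keys may differ.

  # Minimal webhook receiver sketch (assumes Flask; route path is illustrative).
  from flask import Flask, request

  app = Flask(__name__)

  @app.route("/whisper-webhook", methods=["POST"])
  def whisper_webhook():
      payload = request.get_json(force=True)
      run_id = payload.get("run_id")      # identifier of the finished run
      results = payload.get("results")    # transcription/translation output
      print(f"Run {run_id} finished: {results}")
      # Persist or process the results here.
      return "", 200

  if __name__ == "__main__":
      app.run(port=8000)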

We currently have the following Whisper model available; if you would like any others, contact support and we can quickly add them for you.
To deploy it, use the identifier below:

  • Whisper Large: whisper-large

Once you’ve deployed a Whisper model, you can supply the endpoint with a base64-encoded audio file or a file URL. For large files, we recommend using the file URL together with a webhook endpoint; we will notify your endpoint with the results when the model has finished processing the audio. Here’s an example of how to call the deployed endpoint:

Request Parameters

  curl --location --request POST 'https://run.cerebrium.ai/whisper-large-webhook/predict' \
      --header 'Authorization: <API_KEY>' \
      --header 'Content-Type: application/json' \
      --data-raw '{
        "audio": "<BASE_64_STRING>",
        "webhook_endpoint": "<WEBHOOK_ENDPOINT>",
        "file_url": "<FILE_URL>"
      }'
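
If you prefer to make the same request from Python rather than curl, a minimal sketch using the requests library might look like the following; the file name audio.mp3 is a placeholder.

  # Python sketch of the same request (assumes the `requests` library is installed).
  import base64
  import requests

  url = "https://run.cerebrium.ai/whisper-large-webhook/predict"
  headers = {"Authorization": "<API_KEY>", "Content-Type": "application/json"}

  # Small files: send the audio inline as a base64-encoded string.
  with open("audio.mp3", "rb") as f:  # placeholder file name
      audio_b64 = base64.b64encode(f.read()).decode("utf-8")
  payload = {"audio": audio_b64}

  # Large files: supply a public file URL and a webhook endpoint instead.
  # payload = {"file_url": "<FILE_URL>", "webhook_endpoint": "<WEBHOOK_ENDPOINT>"}

  response = requests.post(url, headers=headers, json=payload)
  print(response.status_code, response.text)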
Authorization (string, required)

This is the Cerebrium API key used to authenticate your request. You can get it from your Cerebrium dashboard.

audio (string, optional)

A base64 encoded string of the audio file you would like to transcribe/translate.

language (string, optional)

The default is to auto-detect the language; however, you can specify it explicitly, for example English (en) or French (fr).

webhook_endpoint (string, optional)

An HTTP endpoint that we can notify with the results of the model once it has finished processing.

file_url (string, required)

A publicly accessible URL from which we can fetch the file for transcription. This does not have to be an S3 URL; any URL will work.

Response Parameters

{
  "text": "<TEXT>",
  "segments": []
}

text (string, required)

The text that has been transcribed or translated.

segments (array, required)

Detailed information about the transcription/translation.
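
As an illustrative sketch, the two response fields can be read as follows once the body has been parsed into a dictionary; the structure of the individual segment entries is not shown here.

  # Sketch of reading the response fields, assuming `response` is the
  # requests.Response returned by the call shown earlier (or that `result`
  # holds the results delivered to your webhook).
  result = response.json()

  print(result["text"])               # the full transcription/translation
  for segment in result["segments"]:  # detailed per-segment information
      print(segment)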