Tortoise
Tortoise is a text-to-speech model that prioritizes strong multi-voice capabilities and highly realistic prosody and intonation. It is not the fastest model, since it runs both an autoregressive decoder and a diffusion decoder. Inference typically takes about a minute for two sentences, so for longer text we recommend using the webhook functionality. You can read more in the GitHub repo here.
- Tortoise:
tortoise
Once you’ve deployed a Tortoise model, you can simply supply a prompt, and a random voice will be used to generate the response:
Request Parameters
curl --location --request POST 'https://run.cerebrium.ai/tortise-webhook/predict' \
--header 'Authorization: <API_KEY>' \
--header 'Content-Type: application/json' \
--data-raw '{
"prompt": "Is this what my voice sounds like",
"webhook_endpoint": "<WEBHOOK_ENDPOINT>",
"file_url": "<FILE_URL>"
}'
Authorization: The Cerebrium API key used to authenticate your request. You can get it from your Cerebrium dashboard.
prompt: The text that you would like turned into a voice media file.
webhook_endpoint: An HTTP endpoint that we notify with the results once the model has finished processing.
file_url: A publicly accessible URL from which we fetch a custom voice to use. You need to specify ‘custom’ for the voice parameter.
voice: A pre-loaded voice you wish to use for your text. By default, a random voice is selected.
How quickly you would like the text to be read. The default is fast.
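For readers calling the endpoint from Python rather than curl, the request above can be sketched roughly as follows. This is a minimal illustration, not an official client: `build_request` is a hypothetical helper, the URL is copied verbatim from the curl example, and the JSON key name `voice` is an assumption based on the parameter description.

```python
import json
import urllib.request

# URL copied verbatim from the curl example.
ENDPOINT = "https://run.cerebrium.ai/tortise-webhook/predict"


def build_request(api_key, prompt, webhook_endpoint=None, file_url=None, voice=None):
    """Build (but do not send) the POST request for a Tortoise prediction."""
    payload = {"prompt": prompt}
    if webhook_endpoint:
        payload["webhook_endpoint"] = webhook_endpoint
    if file_url:
        payload["file_url"] = file_url
        # The parameter description says a custom voice file requires
        # the voice to be 'custom'. The key name "voice" is an assumption.
        voice = "custom"
    if voice:
        payload["voice"] = voice
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Sending is left commented out so the sketch has no network side effects:
# with urllib.request.urlopen(build_request("<API_KEY>", "Hello")) as resp:
#     result = json.load(resp)
```

Building the body with `json.dumps` avoids hand-written JSON mistakes such as a missing comma between fields.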
{
"run_id": "cfd6d71d-c71d-48fa-b685-2bb",
"run_time_ms": 32305.0,
"message": "Successfully generated media file",
"result": {
"audio": "<BASE_64_STRING>"
}
}
Response Parameters
run_id: A unique identifier for the run that you can use to associate prompts with webhook endpoints.
run_time_ms: The amount of time, in milliseconds, it took to run your function. This is what you will be billed for.
message: Whether or not the response was successful.
audio: A base64-encoded string of your media file.
To convert your base64-encoded string to a media file, you can run the following code snippet:
import base64

# `audio` is the string taken from result["audio"]; the base64 data follows the first comma.
decoded_bytes = base64.decodebytes(audio.split(',')[1].encode("ascii"))
with open("output.mp3", "wb") as mp3_file:
    mp3_file.write(decoded_bytes)