The Rime Partner Service is available from CLI version 1.39.0 and later.
Cerebrium’s partnership with Rime enables low-latency text-to-speech (TTS) deployment, with region selection for data-privacy compliance.
## Setup
1. Create a Rime account and get an API key. Add the key as a secret in Cerebrium with the name `RIME_API_KEY`.
2. Create a Cerebrium app with the CLI. Rime services use a simplified TOML configuration with the `[cerebrium.runtime.rime]` section. Create a `cerebrium.toml` file with the following:

   ```toml
   [cerebrium.deployment]
   name = "rime"
   disable_auth = true

   [cerebrium.runtime.rime]
   port = 8001
   # model_name = "arcana" # Optional: specify a Rime model (e.g. "arcana", "mist", "mistv2")
   # language = "en"       # Optional: specify a language code (e.g. "en", "es")

   [cerebrium.hardware]
   cpu = 4
   memory = 30
   compute = "AMPERE_A10"
   gpu_count = 1
   region = "us-east-1"

   [cerebrium.scaling]
   min_replicas = 1
   max_replicas = 2
   cooldown = 120
   replica_concurrency = 50
   ```
   Authentication is disabled (`disable_auth = true`) because the Rime API key sent in the request header handles authentication; the Rime server validates the key directly.
3. Run `cerebrium deploy` to deploy the Rime service. The output should include your app's dashboard URL:

   ```
   App Dashboard: https://dashboard.cerebrium.ai/projects/p-xxxxxxxx/apps/p-xxxxxxxx-rime
   ```
4. Send requests to the Rime HTTP service using the deployment URL from the output:

   ```shell
   curl --location 'https://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/rime' \
     --header 'Authorization: Bearer <RIME_API_KEY>' \
     --header 'Content-Type: application/json' \
     --header 'Accept: audio/pcm' \
     --data '{
       "text": "I would love to have a conversation with you.",
       "speaker": "joy",
       "modelId": "mist"
     }'
   ```
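The same request can be assembled from Python. This is a minimal sketch: the deployment URL and API key are placeholders you must replace, and the network call itself is shown commented out since it requires a live deployment and a valid Rime key:

```python
import json

# Placeholders -- substitute your deployment URL and Rime API key.
URL = "https://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/rime"
API_KEY = "<RIME_API_KEY>"

def build_tts_request(text: str, speaker: str = "joy", model_id: str = "mist"):
    """Build the headers and JSON body for a Rime TTS request."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "Accept": "audio/pcm",  # ask for raw PCM audio in the response body
    }
    body = json.dumps({"text": text, "speaker": speaker, "modelId": model_id})
    return headers, body

headers, body = build_tts_request("I would love to have a conversation with you.")

# To actually send the request (needs network access and a valid key):
#   import urllib.request
#   req = urllib.request.Request(URL, data=body.encode(), headers=headers)
#   with urllib.request.urlopen(req) as resp:
#       open("out.pcm", "wb").write(resp.read())
```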
   For WebSockets, connect to the `ws2` endpoint with the same `Authorization: Bearer <RIME_API_KEY>` header:

   ```
   wss://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxx/rime/ws2?audioFormat=mp3&speaker=cove&modelId=mistv2&phonemizeBetweenBrackets=true
   ```

   Then send a sequence of JSON messages, one per frame:

   ```json
   {"text": "This "}
   {"text": "is "}
   {"text": "a "}
   {"text": "test against the "}
   {"text": "websockets endpoint of the "}
   {"text": "api image. "}
   {"operation": "flush"}
   {"text": "This "}
   {"text": "is "}
   {"text": "an "}
   {"text": "incomplete "}
   {"text": "phrase "}
   {"operation": "eos"}
   ```
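A message stream like the one above can be generated programmatically. This sketch assumes (as the example suggests) that each frame is a standalone JSON object, that `"flush"` forces synthesis of the buffered text between phrases, and that `"eos"` ends the stream; the `ws_frames` helper is illustrative, not part of any SDK:

```python
import json

def ws_frames(phrases):
    """Yield JSON-encoded frames for the ws2 endpoint: each phrase is
    streamed word by word; "flush" is sent between phrases and "eos"
    closes the stream."""
    for i, phrase in enumerate(phrases):
        for word in phrase.split():
            yield json.dumps({"text": word + " "})
        last = i == len(phrases) - 1
        yield json.dumps({"operation": "eos" if last else "flush"})

frames = list(ws_frames([
    "This is a test against the websockets endpoint of the api image.",
    "This is an incomplete phrase",
]))
# Each frame would then be sent over the open WebSocket connection.
```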
## Runtime Configuration
The [cerebrium.runtime.rime] section supports the following parameters:
| Option | Type | Default | Description |
|---|---|---|---|
| `port` | integer | required | Port the Rime server listens on. Typically `8001`. |
| `model_name` | string | — | Rime model to load (e.g. `"arcana"`, `"mist"`, `"mistv2"`). Defaults to Rime’s server default if not set. |
| `language` | string | — | Language code for the model (e.g. `"en"`, `"es"`). Defaults to Rime’s server default if not set. |
Example with optional parameters:

```toml
[cerebrium.runtime.rime]
port = 8001
model_name = "arcana"
language = "en"
```
## Scaling and Concurrency
Rime services support independent scaling configurations:
- `min_replicas`: Minimum instances to maintain (`0` for scale-to-zero). Recommended: `1`.
- `max_replicas`: Maximum instances during high load.
- `replica_concurrency`: Concurrent requests per instance. Recommended: `3`.
- `cooldown`: Time window (in seconds) that must pass at reduced concurrency before scaling down. Recommended: `50`.
- `compute`: Instance type. Recommended: `AMPERE_A10`.
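As a sketch, a `[cerebrium.scaling]` section using the recommended values above might look like the following (the `max_replicas` value is taken from the earlier example and should be sized to your expected peak load):

```toml
[cerebrium.scaling]
min_replicas = 1        # keep one warm instance to avoid cold starts
max_replicas = 2        # raise based on expected peak traffic
replica_concurrency = 3
cooldown = 50
```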
Adjust these parameters based on your traffic patterns and latency requirements, and consult the Rime team for guidance on concurrency and scaling.

For further detail on Rime itself, see the Rime documentation.