The Rime Partner Service is available from CLI version 1.39.0 and later.
Cerebrium’s partnership with Rime helps teams deliver text-to-speech (TTS) services with efficient deployment, minimized latency, and region selection for data privacy compliance needs.
Create a Rime account and get an API key.
To use Rime on Cerebrium, create a Rime account and get an API key. Then create a secret in Cerebrium named exactly “RIME_API_KEY”.
Create a simple Cerebrium app with the CLI:
cerebrium init rime
Rime services use a simplified TOML configuration with the [cerebrium.runtime.rime] section. Create a cerebrium.toml file with the following:
Disable Cerebrium auth in this configuration, because requests must carry your Rime API key in the Authorization header. API authentication is handled by the Rime server using that key.
Run cerebrium deploy to deploy the Rime service. The output should appear as follows:
Use the deployment URL from the output to send requests to the HTTP Rime service with curl:
curl --location 'https://api.cortex.cerebrium.ai/v4/p-xxxxxxxx/rime' \
  --header 'Authorization: Bearer <RIME_API_KEY>' \
  --header 'Content-Type: application/json' \
  --header 'Accept: audio/pcm' \
  --data '{ "text": "I would love to have a conversation with you.", "speaker": "joy", "modelId": "mist"}'
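The same HTTP request can be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: the function names build_rime_request and synthesize are hypothetical, and the URL, headers, and JSON fields are copied from the curl example.

```python
import json
import urllib.request


def build_rime_request(deployment_url, api_key, text, speaker="joy", model_id="mist"):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {"text": text, "speaker": speaker, "modelId": model_id}
    return urllib.request.Request(
        deployment_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The Rime API key goes in the header; Cerebrium auth is disabled.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "audio/pcm",
        },
        method="POST",
    )


def synthesize(request, out_path):
    """Send the request and write the raw audio bytes to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

To try it, pass your own deployment URL and key to build_rime_request, then call synthesize(request, "output.pcm"). Keeping request construction separate from sending makes the headers and body easy to inspect before any network call.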
For WebSockets, connect to the following URL with your Rime API key in the Authorization header:
wss://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxx/rime/ws2?audioFormat=mp3&speaker=cove&modelId=mistv2&phonemizeBetweenBrackets=true
Authorization: Bearer <RIME_API_KEY>

# With a message like:
{"text": "This "},
{"text": "is "},
{"text": "a "},
{"text": "test against the "},
{"text": "websockets endpoint of the "},
{"text": "api image. "},
{"operation": "flush"},
{"text": "This "},
{"text": "is "},
{"text": "an "},
{"text": "incomplete "},
{"text": "phrase "},
{"operation": "eos"}
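As a rough sketch of how a client might prepare this traffic, the Python below assembles the connection URL and the JSON frames from the example. The helper names build_ws_url and build_messages are hypothetical, and the query parameters and the "flush" and "eos" operations are taken from the sample above; refer to the Rime documentation for their exact behavior. Sending the frames requires a WebSocket client library, which is not shown here.

```python
import json
from urllib.parse import urlencode


def build_ws_url(base_url, **params):
    """Append query parameters (e.g. audioFormat, speaker, modelId) to the URL."""
    return f"{base_url}?{urlencode(params)}"


def build_messages(chunks, final_operation="eos"):
    """Turn text chunks into JSON frames, followed by one operation frame."""
    messages = [json.dumps({"text": chunk}) for chunk in chunks]
    messages.append(json.dumps({"operation": final_operation}))
    return messages


# Example values copied from the sample request above.
url = build_ws_url(
    "wss://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxx/rime/ws2",
    audioFormat="mp3",
    speaker="cove",
    modelId="mistv2",
    phonemizeBetweenBrackets="true",
)
frames = build_messages(["This ", "is ", "a ", "test. "], final_operation="flush")
```

Each string in frames would be sent as a separate text message over the open connection, with the Authorization header supplied during the handshake.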
Rime services support independent scaling configurations:
min_replicas: Minimum instances to maintain (0 for scale-to-zero). Recommended: 1.
max_replicas: Maximum instances during high load.
replica_concurrency: Concurrent requests per instance. Recommended: 3.
cooldown: Seconds an instance remains active after last request. Recommended: 50.
compute: Instance type. Recommended: AMPERE_A10.
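To reason about these settings, it can help to estimate capacity with simple arithmetic. The sketch below assumes that peak concurrent capacity is max_replicas multiplied by replica_concurrency, which follows from the definitions above but ignores cold starts and request duration. The function names and the example values of 4 replicas and 10 concurrent requests are hypothetical.

```python
import math


def peak_concurrency(max_replicas, replica_concurrency):
    """Upper bound on requests served at once when fully scaled out."""
    return max_replicas * replica_concurrency


def replicas_needed(expected_concurrent_requests, replica_concurrency):
    """Smallest number of replicas that covers the expected concurrency."""
    return math.ceil(expected_concurrent_requests / replica_concurrency)


# With the recommended replica_concurrency of 3 and a hypothetical max_replicas of 4:
capacity = peak_concurrency(max_replicas=4, replica_concurrency=3)
# For a hypothetical load of 10 simultaneous requests:
needed = replicas_needed(expected_concurrent_requests=10, replica_concurrency=3)
```

Here capacity evaluates to 12 and needed to 4, so a max_replicas value below 4 would leave some of those 10 requests waiting.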
Adjust these parameters based on traffic patterns and latency requirements. For guidance on concurrency and scalability, consult the Rime team.
For further documentation on Rime, see the Rime documentation.