Rime Partner Service is available from CLI version 1.39.0 and greater

Cerebrium’s partnership with Rime helps teams deliver text-to-speech (TTS) services with efficient deployment, minimized latency, and region selection for data privacy compliance needs.

Setup

  1. Create a Rime account and get an API key. In order to use Rime on Cerebrium, you will need to create a Rime account and get an API key. You must then create a secret in Cerebrium with the specific name “RIME_API_KEY”.

  2. Create a simple cerebrium app with the CLI:

cerebrium init rime
  1. Rime services use a simplified TOML configuration with the [cerebrium.runtime.rime] section. Create a cerebrium.toml file with the following:
[cerebrium.deployment]
name = "rime"
disable_auth = true

[cerebrium.runtime.rime]
port = 8001

[cerebrium.hardware]
cpu = 4
memory = 30
compute = "AMPERE_A10"
gpu_count = 1
region = "us-east-1"

[cerebrium.scaling]
min_replicas = 1
max_replicas = 2
cooldown = 120
replica_concurrency = 50

You need to disable auth in the above since you need to use your Rime API key in the header. API authentication is handle by the Rime Server using your API key

  1. Run cerebrium deploy to deploy the Rime service - the output of which should appear as follows:
App Dashboard: https://dashboard.cerebrium.ai/projects/p-xxxxxxxx/apps/p-xxxxxxxx-rime
  1. Use the Deployment url from the output to send requests to the HTTP Rime service via curl request:
curl --location 'https://api.cortex.cerebrium.ai/v4/p-xxxxxxxx/rime' \
--header 'Authorization: Bearer <RIME_API_KEY>' \
--header 'Content-Type: application/json' \
--header 'Accept: audio/pcm' \
--data '{
  "text": "I would love to have a conversation with you.",
  "speaker": "joy",
  "modelId": "mist"
}'

For Websockets, send the following

wss://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxx/rime/ws2?audioFormat=mp3&speaker=cove&modelId=mistv2&phonemizeBetweenBrackets=true
Authorization Bearer <RIME_API_KEY>

#With a message like:
{"text": "This "},
{"text": "is "},
{"text": "a "},
{"text": "test against the "},
{"text": "websockets endpoint of the "},
{"text": "api image. "},
{"operation": "flush"},
{"text": "This "},
{"text": "is "},
{"text": "an "},
{"text": "incomplete "},
{"text": "phrase "},
{"operation": "eos"}

Scaling and Concurrency

Rime services support independent scaling configurations:

  • min_replicas: Minimum instances to maintain (0 for scale-to-zero). Recommended: 1.
  • max_replicas: Maximum instances during high load.
  • replica_concurrency: Concurrent requests per instance. Recommended: 3.
  • cooldown: Seconds an instance remains active after last request. Recommended: 50.
  • compute: Instance type. Recommended: AMPERE_A10.

Adjust these parameters based on traffic patterns and latency requirements. Best would be to consult the Rime team about concurrency and scalability

For further documentation on Rime, see the Rime documentation.