Rime Partner Service is available from CLI version 1.39.0 and later.

Cerebrium’s partnership with Rime helps teams deliver text-to-speech (TTS) services with efficient deployment, low latency, and region selection to meet data-privacy compliance requirements.

Setup

  1. Create a new Cerebrium app with the CLI:
cerebrium init rime
  2. Rime services use a simplified TOML configuration with the [cerebrium.runtime.rime] section. Create a cerebrium.toml file with the following:
[cerebrium.deployment]
name = "rime"

[cerebrium.runtime.rime]

[cerebrium.hardware]
cpu = 4
memory = 32
compute = "AMPERE_A10"
gpu_count = 1

[cerebrium.scaling]
min_replicas = 0
max_replicas = 2
cooldown = 120
replica_concurrency = 3
  3. Run cerebrium deploy to deploy the Rime service. The output should appear as follows:
App Dashboard: https://dev-dashboard.cerebrium.ai/projects/p-xxxxxxxx/apps/p-xxxxxxxx-rime
  4. Use the deployment URL from the output to send requests to the Rime service with curl:
curl --location 'https://api.cortex.cerebrium.ai/v4/p-xxxxxxxx/rime' \
--header 'Authorization: Bearer <RIME_API_KEY>' \
--header 'Content-Type: application/json' \
--header 'Accept: audio/pcm' \
--data '{
  "text": "I would love to have a conversation with you.",
  "speaker": "joy",
  "modelId": "mist"
}'
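The same request can be sent from Python, and the raw audio/pcm response wrapped in a WAV header for playback. A minimal sketch, assuming the third-party requests library and 16-bit mono PCM output; the endpoint URL, API key, and the 22050 Hz sample rate are placeholders/assumptions — check the Rime documentation for the actual audio format of your chosen model:

```python
import io
import wave

# Hypothetical placeholders -- substitute your own project ID and Rime key.
ENDPOINT = "https://api.cortex.cerebrium.ai/v4/p-xxxxxxxx/rime"
RIME_API_KEY = "<RIME_API_KEY>"


def synthesize(text: str, speaker: str = "joy", model_id: str = "mist") -> bytes:
    """Request raw PCM audio for `text` from the deployed Rime service."""
    import requests  # assumed HTTP client; pip install requests

    resp = requests.post(
        ENDPOINT,
        headers={
            "Authorization": f"Bearer {RIME_API_KEY}",
            "Content-Type": "application/json",
            "Accept": "audio/pcm",
        },
        json={"text": text, "speaker": speaker, "modelId": model_id},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.content


def pcm_to_wav(pcm: bytes, sample_rate: int = 22050) -> bytes:
    """Wrap headerless PCM bytes in a WAV container.

    Assumes 16-bit mono samples at `sample_rate`; adjust to match
    the format Rime actually returns for your model.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono (assumption)
        wav.setsampwidth(2)   # 16-bit samples (assumption)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    pcm = synthesize("I would love to have a conversation with you.")
    with open("output.wav", "wb") as f:
        f.write(pcm_to_wav(pcm))
```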

The RIME_API_KEY is available in the Rime dashboard.

Scaling and Concurrency

Rime services support independent scaling configurations:

  • min_replicas: Minimum instances to maintain (0 for scale-to-zero). Recommended: 1.
  • max_replicas: Maximum instances during high load.
  • replica_concurrency: Concurrent requests per instance. Recommended: 3.
  • cooldown: Seconds an instance remains active after last request. Recommended: 120.
  • compute: Instance type. Recommended: AMPERE_A10.

Adjust these parameters based on traffic patterns and latency requirements.
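As a back-of-envelope check when tuning, the upper bound on simultaneous requests is max_replicas × replica_concurrency; once that is exceeded, new requests queue (or trigger scale-up if below max_replicas). A small sketch using the values from the example configuration above:

```python
def concurrent_capacity(max_replicas: int, replica_concurrency: int) -> int:
    """Upper bound on requests served simultaneously across all replicas."""
    return max_replicas * replica_concurrency


# With the example config (max_replicas = 2, replica_concurrency = 3),
# at most 6 requests can be in flight at once.
print(concurrent_capacity(2, 3))  # → 6
```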

For further details on available speakers, models, and request options, see the Rime documentation.