Rime Partner Service is available from CLI version 1.39.0 and later.

Cerebrium’s partnership with Rime helps teams deliver text-to-speech (TTS) services with efficient deployment, low latency, and region selection to meet data-privacy compliance requirements.

Setup

  1. Create a new Cerebrium app with the CLI:
cerebrium init rime
  2. Rime services use a simplified TOML configuration with the [cerebrium.runtime.rime] section. Create a cerebrium.toml file with the following:
[cerebrium.deployment]
name = "rime"

[cerebrium.runtime.rime]

[cerebrium.hardware]
cpu = 4
memory = 32
compute = "AMPERE_A10"
gpu_count = 1

[cerebrium.scaling]
min_replicas = 0
max_replicas = 2
cooldown = 120
replica_concurrency = 3
  3. Run cerebrium deploy to deploy the Rime service. The output should appear as follows:
App Dashboard: https://dev-dashboard.cerebrium.ai/projects/p-xxxxxxxx/apps/p-xxxxxxxx-rime
  4. Use the deployment URL from the output to send requests to the Rime service with curl:
curl --location 'https://api.cortex.cerebrium.ai/v4/p-xxxxxxxx/rime' \
--header 'Authorization: Bearer <RIME_API_KEY>' \
--header 'Content-Type: application/json' \
--header 'Accept: audio/pcm' \
--data '{
  "text": "I would love to have a conversation with you.",
  "speaker": "joy",
  "modelId": "mist"
}'
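The same request can be sent from Python, and the raw audio/pcm response wrapped in a WAV header for playback. A minimal sketch, assuming the third-party requests library and 16-bit mono PCM output; the endpoint URL, API key, and the 22050 Hz sample rate are placeholders/assumptions — check the Rime documentation for the actual audio format of your chosen model:

```python
import io
import wave

# Hypothetical placeholders -- substitute your own project ID and Rime key.
ENDPOINT = "https://api.cortex.cerebrium.ai/v4/p-xxxxxxxx/rime"
RIME_API_KEY = "<RIME_API_KEY>"


def synthesize(text: str, speaker: str = "joy", model_id: str = "mist") -> bytes:
    """Request raw PCM audio for `text` from the deployed Rime service."""
    import requests  # assumed HTTP client; pip install requests

    resp = requests.post(
        ENDPOINT,
        headers={
            "Authorization": f"Bearer {RIME_API_KEY}",
            "Content-Type": "application/json",
            "Accept": "audio/pcm",
        },
        json={"text": text, "speaker": speaker, "modelId": model_id},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.content


def pcm_to_wav(pcm: bytes, sample_rate: int = 22050) -> bytes:
    """Wrap headerless PCM bytes in a WAV container.

    Assumes 16-bit mono samples at `sample_rate`; adjust to match
    the format Rime actually returns for your model.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono (assumption)
        wav.setsampwidth(2)   # 16-bit samples (assumption)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    pcm = synthesize("I would love to have a conversation with you.")
    with open("output.wav", "wb") as f:
        f.write(pcm_to_wav(pcm))
```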

The RIME_API_KEY is available in the Rime dashboard.

Scaling and Concurrency

Rime services support independent scaling configurations:

  • min_replicas: Minimum instances to maintain (0 for scale-to-zero). Recommended: 1.
  • max_replicas: Maximum instances during high load.
  • replica_concurrency: Concurrent requests per instance. Recommended: 3.
  • cooldown: Seconds an instance remains active after last request. Recommended: 120.
  • compute: Instance type. Recommended: AMPERE_A10.

Adjust these parameters based on traffic patterns and latency requirements.
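As a back-of-envelope check when tuning, the upper bound on simultaneous requests is max_replicas × replica_concurrency; once that is exceeded, new requests queue (or trigger scale-up if below max_replicas). A small sketch using the values from the example configuration above:

```python
def concurrent_capacity(max_replicas: int, replica_concurrency: int) -> int:
    """Upper bound on requests served simultaneously across all replicas."""
    return max_replicas * replica_concurrency


# With the example config (max_replicas = 2, replica_concurrency = 3),
# at most 6 requests can be in flight at once.
print(concurrent_capacity(2, 3))  # → 6
```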

For further details on available speakers, models, and request options, see the Rime documentation.