Deepgram
Deploy Deepgram speech-to-text services on Cerebrium
Cerebrium’s partnership with Deepgram lets you deploy Deepgram speech-to-text (STT) services with minimal configuration and scale them independently of the rest of your app.
Using Deepgram services requires an Enterprise Deepgram account and an API key for self-hosted models. Contact Deepgram support to access this feature.
Links to the Deepgram model files referenced below (file extension .dg) should be obtained from your Deepgram Account Representative.
This example uses the nova-2-general model.
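Before uploading anything, it can help to confirm which model files you actually have on disk. The sketch below lists the files with a .dg extension in a local folder. The function name find_model_files and the idea of keeping the files in a single local folder are assumptions for illustration only, not part of Cerebrium's or Deepgram's tooling.

```python
from pathlib import Path
from typing import List


def find_model_files(folder: str) -> List[str]:
    """Return the sorted names of Deepgram model files (*.dg) directly inside `folder`.

    Hypothetical helper: only checks file extensions, not file contents.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"model folder not found: {root}")
    # Only regular top-level files with a .dg suffix (case-insensitive) count.
    return sorted(
        p.name for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".dg"
    )
```

For example, find_model_files("deepgram-models") would return the .dg file names in a local deepgram-models folder (a hypothetical path), so you can check that nothing is missing before you upload.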
The Deepgram Partner Service requires Cerebrium CLI version 1.39.0 or later.
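A quick way to check the version requirement from Python is sketched below. It assumes the CLI is installed as the pip package named cerebrium, so that importlib.metadata can report its version; the helper names are hypothetical. The comparison is purely numeric and ignores pre-release suffixes such as rc1.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

MINIMUM_CLI_VERSION = "1.39.0"  # minimum stated in this guide


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.39.0' (or '1.39.0rc1') into (1, 39, 0), keeping only the numeric prefix."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if not match:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MINIMUM_CLI_VERSION) -> bool:
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '1.39' compares equal to '1.39.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_cli_version(package: str = "cerebrium") -> Optional[str]:
    """Return the installed version of `package` (assumed to be the CLI), or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

Usage: call installed_cli_version(), and if it returns a string, pass it to meets_minimum to decide whether you need to upgrade first.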
Setup
- Create a simple Cerebrium app with the CLI:
- Create a self-hosted API key from the Deepgram dashboard. Navigate to the Secrets tab in the Cerebrium dashboard and add the API key with the name DEEPGRAM_API_KEY. This secret automatically becomes available as an environment variable in the deployment.
- Download the model files from the self-hosted section of the Deepgram dashboard using the guide available here, and select the ‘license proxy’ deployment type. Upload the model files (with the .dg extension) obtained from the links provided by your Account Representative to persistent storage in the /deepgram-models folder. This folder is automatically attached to the engine container. Use this command to upload the files:
- Update the cerebrium.toml file with the following configuration to set hardware requirements, scaling parameters, region, and other settings:
- Run ‘cerebrium deploy’ to deploy the app. After deployment, an endpoint for the Deepgram service is shown in the terminal output. The endpoint URL can also be found on the app’s overview page in the dashboard.
- Download an example audio file for use with the Deepgram service:
- Access the Deepgram service by calling the endpoint with appropriate parameters, such as:
Parameters accepted by the Deepgram service can be found in the speech-to-text API reference.
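The final setup step calls the deployed endpoint with an audio file. The sketch below builds that HTTP request with Python's standard library. Several details are assumptions rather than confirmed behaviour: the endpoint URL must be copied from your deploy output, the service is assumed to accept Deepgram-style pre-recorded requests (raw audio in the body, options such as model in the query string), and authentication is assumed to use a bearer token, such as your Cerebrium API key. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Mapping, Optional


def build_transcription_request(
    endpoint: str,
    auth_token: str,
    audio: bytes,
    params: Optional[Mapping[str, str]] = None,
    content_type: str = "audio/wav",
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes.

    Assumes a Deepgram-style API: audio in the body, options in the query
    string, and a bearer token in the Authorization header.
    """
    query = urllib.parse.urlencode(dict(params or {}))
    separator = "&" if "?" in endpoint else "?"
    url = f"{endpoint}{separator}{query}" if query else endpoint
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={"Authorization": f"Bearer {auth_token}", "Content-Type": content_type},
    )


def transcribe(endpoint: str, auth_token: str, audio_path: str,
               params: Optional[Mapping[str, str]] = None, timeout: float = 60.0) -> dict:
    """Send the audio file and decode the JSON response (performs network I/O)."""
    request = build_transcription_request(
        endpoint, auth_token, Path(audio_path).read_bytes(), params
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, transcribe(your_endpoint_url, your_token, "example.wav", {"model": "nova-2-general"}) would send the downloaded audio file using the model from this guide. Keeping request construction separate from sending makes it easy to inspect the exact URL and headers before making a network call.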
API Key Configuration
To use Deepgram services:
- Sign up at deepgram.com
- Create an API key in the Deepgram dashboard
- Add the API key to Cerebrium:
  - Navigate to the Secrets tab in the Cerebrium dashboard
  - Add the Deepgram API key as an app-specific or project-wide secret named DEEPGRAM_API_KEY
  - This secret automatically becomes available as an environment variable in the deployment
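If your own code running in the deployment also needs the key, it can read it from the environment. The minimal sketch below fails early with an actionable message when the secret is missing; the function name is hypothetical, and the optional env argument exists only to make the helper easy to test.

```python
import os
from typing import Mapping, Optional

SECRET_NAME = "DEEPGRAM_API_KEY"  # secret name used in this guide


def get_deepgram_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the Deepgram key that Cerebrium exposes as an environment variable."""
    source = os.environ if env is None else env
    key = source.get(SECRET_NAME, "").strip()
    if not key:
        # Fail early with a clear message instead of a later authentication error.
        raise RuntimeError(
            f"{SECRET_NAME} is not set; add it in the Secrets tab of the Cerebrium dashboard"
        )
    return key
```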
Scaling and Concurrency
Deepgram services scale independently, using the following parameters:
- min_replicas: Minimum number of instances to maintain (0 for scale-to-zero)
- max_replicas: Maximum number of instances that can be created during high load
- replica_concurrency: Number of concurrent requests each instance can handle
- cooldown: Time in seconds that an instance remains active after processing its last request
Adjust these parameters based on expected traffic patterns and latency requirements.
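Because these four values interact (for example, max_replicas below min_replicas makes no sense), a small pre-deploy check can catch mistakes. The sketch below validates a scaling table given as a dictionary. It assumes the settings live under a [cerebrium.scaling] table in cerebrium.toml, which on Python 3.11+ you could load with tomllib.loads; the function name and the specific rules are illustrative assumptions, not Cerebrium's own validation.

```python
from typing import Any, List, Mapping

SCALING_KEYS = ("min_replicas", "max_replicas", "replica_concurrency", "cooldown")


def check_scaling(config: Mapping[str, Any]) -> List[str]:
    """Return problems found in config['cerebrium']['scaling'] (empty list if none).

    Keys that are absent are skipped, on the assumption that defaults apply.
    """
    scaling = config.get("cerebrium", {}).get("scaling", {})
    problems = []
    for key in SCALING_KEYS:
        if key in scaling:
            value = scaling[key]
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                problems.append(f"{key} should be a non-negative integer, got {value!r}")
    if problems:
        return problems
    low, high = scaling.get("min_replicas"), scaling.get("max_replicas")
    if low is not None and high is not None and high < low:
        problems.append("max_replicas should be >= min_replicas")
    if scaling.get("replica_concurrency") == 0:
        problems.append("replica_concurrency should be at least 1")
    return problems
```

A min_replicas of 0 passes the check because it is the documented way to enable scale-to-zero.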
Usage Examples
Cerebrium runs Deepgram STT models, your application, and LiveKit workers on the same network, which reduces latency by approximately 400 ms. This is a significant advantage for voice agent applications.
For a complete implementation reference, see the LiveKit Outbound Agent example.