Deploy a real-time AI voice agent
In this tutorial, we’ll create a real-time voice agent that responds to queries via speech in ~500ms. This flexible implementation lets you swap in any Large Language Model (LLM) or Text-to-Speech (TTS) model. It’s ideal for voice-based use cases like customer support bots and receptionists.
To create this app, we use PipeCat, a framework that handles component integration, user interruptions, and audio data processing. We’ll demonstrate this by joining a meeting room with our voice agent using Daily (PipeCat’s creators) and deploy the app on Cerebrium for seamless deployment and scaling.
Essentially our application will have 3/4 parts:
The reason we achieve such low latency is that each service is hosted within Cerebrium and so we have no network latency for the requests we make - communication across containers is less than 10ms.
You can find the final version of the code here
If you don’t have a Cerebrium account, you can create one by signing up here and following the documentation here to get set up.
For the sake of conciseness, look at our Partner Services page to see how you can deploy a Deepgram service on Cerebrium. The link is here
You need a Deepgram Enterprise License to do deploy Deegram on Cerebrium else you must use their API endpoint below.
For our LLM we deploy a OpenAI compatible Llama-3 endpoint using the vLLM framework - in order to have a low TTFT we deploy a quantized version (RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8”).
Run cerebrium init llama-llm
and add the following code to your cerebrium.toml:
Add the following code to your main.py - this uses the vLLM framework and makes it openAI compatible:
Make sure to add your HuggingFace token to your Secrets on Cerebrium as HF_TOKEN
.
The run cerebrium deploy
to make it live - you should see it live in your Cerebrium dashboard. We will use your deployment url in the next step.
Based on your GPU hardware and replica-concurrency in your cerebrium.toml, you can set how many concurrent calls the LLM can take.
In your IDE, run the following command to create our pipecat-agent: cerebrium init pipecat-agent
. We will be using the Pipecat framework to orchestrate our services to create a voice agent
Add the following pip packages to your cerebrium.toml
to create your deployment environment:
You can then add the following code to your main.py:
There is a lot happening above but below we will give a summary:
The Daily Python SDK provides event webhooks to trigger functionality based on events like users joining or leaving calls. Add this event handling code to the main()
function:
This code handles these events:
Based on your CPU hardware and replica-concurrency in your cerebrium.toml, you can set how many concurrent calls this Pipecat agent can take.
Create a .env file within your pipecat-agent folder with the following set:
Get your Daily developer token from your profile. If you don’t have an account, sign up here (they offer a generous free tier). Navigate to the “developers” tab to get your API key and add it to your Cerebrium Secrets.
To test your voice bot locally, you uncomment that main code at the bottom and then run python main.py
. Your code should then work
That’s it! You now have a fully functioning AI bot that interacts with users through speech in ~500ms. Imagine the possibilities!
Now, let’s create a user interface for the bot.
Deploy the app to Cerebrium by running this command in your terminal: cerebrium deploy
We’ll add these endpoints to our frontend interface.
We created a public fork of the PipeCat frontend to show you a nice demo of this application. You can clone the repo here.
Follow the instructions in the README.md and then populate the following variables in your .env.development.local
You can now run yarn dev and go to the URL: http://localhost:5173/ to test your application!
This tutorial provides a foundation for implementing voice in your app and extending into image and vision capabilities. PipeCat is an extensible, open-source framework for building voice-enabled apps, while Cerebrium provides seamless deployment and autoscaling with pay-as-you-go compute.
Tag us as @cerebriumai to showcase your work and join our Slack or Discord communities for questions and feedback.
Deploy a real-time AI voice agent
In this tutorial, we’ll create a real-time voice agent that responds to queries via speech in ~500ms. This flexible implementation lets you swap in any Large Language Model (LLM) or Text-to-Speech (TTS) model. It’s ideal for voice-based use cases like customer support bots and receptionists.
To create this app, we use PipeCat, a framework that handles component integration, user interruptions, and audio data processing. We’ll demonstrate this by joining a meeting room with our voice agent using Daily (PipeCat’s creators) and deploy the app on Cerebrium for seamless deployment and scaling.
Essentially our application will have 3/4 parts:
The reason we achieve such low latency is that each service is hosted within Cerebrium and so we have no network latency for the requests we make - communication across containers is less than 10ms.
You can find the final version of the code here
If you don’t have a Cerebrium account, you can create one by signing up here and following the documentation here to get set up.
For the sake of conciseness, look at our Partner Services page to see how you can deploy a Deepgram service on Cerebrium. The link is here
You need a Deepgram Enterprise License to do deploy Deegram on Cerebrium else you must use their API endpoint below.
For our LLM we deploy a OpenAI compatible Llama-3 endpoint using the vLLM framework - in order to have a low TTFT we deploy a quantized version (RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8”).
Run cerebrium init llama-llm
and add the following code to your cerebrium.toml:
Add the following code to your main.py - this uses the vLLM framework and makes it openAI compatible:
Make sure to add your HuggingFace token to your Secrets on Cerebrium as HF_TOKEN
.
The run cerebrium deploy
to make it live - you should see it live in your Cerebrium dashboard. We will use your deployment url in the next step.
Based on your GPU hardware and replica-concurrency in your cerebrium.toml, you can set how many concurrent calls the LLM can take.
In your IDE, run the following command to create our pipecat-agent: cerebrium init pipecat-agent
. We will be using the Pipecat framework to orchestrate our services to create a voice agent
Add the following pip packages to your cerebrium.toml
to create your deployment environment:
You can then add the following code to your main.py:
There is a lot happening above but below we will give a summary:
The Daily Python SDK provides event webhooks to trigger functionality based on events like users joining or leaving calls. Add this event handling code to the main()
function:
This code handles these events:
Based on your CPU hardware and replica-concurrency in your cerebrium.toml, you can set how many concurrent calls this Pipecat agent can take.
Create a .env file within your pipecat-agent folder with the following set:
Get your Daily developer token from your profile. If you don’t have an account, sign up here (they offer a generous free tier). Navigate to the “developers” tab to get your API key and add it to your Cerebrium Secrets.
To test your voice bot locally, you uncomment that main code at the bottom and then run python main.py
. Your code should then work
That’s it! You now have a fully functioning AI bot that interacts with users through speech in ~500ms. Imagine the possibilities!
Now, let’s create a user interface for the bot.
Deploy the app to Cerebrium by running this command in your terminal: cerebrium deploy
We’ll add these endpoints to our frontend interface.
We created a public fork of the PipeCat frontend to show you a nice demo of this application. You can clone the repo here.
Follow the instructions in the README.md and then populate the following variables in your .env.development.local
You can now run yarn dev and go to the URL: http://localhost:5173/ to test your application!
This tutorial provides a foundation for implementing voice in your app and extending into image and vision capabilities. PipeCat is an extensible, open-source framework for building voice-enabled apps, while Cerebrium provides seamless deployment and autoscaling with pay-as-you-go compute.
Tag us as @cerebriumai to showcase your work and join our Slack or Discord communities for questions and feedback.