Twilio Voice Agent with PipeCat
Integrate a real-time AI voice agent with Twilio
This tutorial demonstrates creating a real-time voice agent that responds to phone calls via Twilio. The flexible implementation supports any LLM or Text-to-Speech (TTS) model, making it ideal for voice applications like customer support bots and receptionists.
We’ll use PipeCat to handle component integration, user interruptions, and audio data processing.
You can find the final version of the code here
Cerebrium setup
Set up Cerebrium:
- Sign up here
- Follow installation docs here
- Create a starter project:
This creates:
- main.py: Entrypoint file
- cerebrium.toml: Build and environment configuration
Add these pip packages to your cerebrium.toml:
Set up a FastAPI server to handle Twilio calls and upgrade them to WebSocket connections for real-time communication. Add this code to main.py:
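Whatever the exact contents of main.py, its WebSocket endpoint has to speak Twilio's Media Streams protocol. Twilio first sends a "connected" event, then a "start" event carrying the streamSid, then a series of "media" events containing base64-encoded 8 kHz mono μ-law audio. The stdlib-only sketch below illustrates that message handling. It is not the tutorial's main.py, and the helper names (extract_stream_sid, decode_media, encode_media) and sample SIDs are hypothetical.

```python
import base64
import json
from typing import Iterable, Optional


def extract_stream_sid(messages: Iterable[str]) -> Optional[str]:
    """Return the streamSid from the first Twilio "start" event, or None.

    Twilio sends a "connected" event first; the "start" event that follows
    carries the streamSid the server needs to address audio back to the call.
    """
    for raw in messages:
        event = json.loads(raw)
        if event.get("event") == "start":
            return event["start"]["streamSid"]
    return None


def decode_media(raw: str) -> bytes:
    """Return the raw audio bytes (8 kHz mono mu-law) from a "media" event."""
    event = json.loads(raw)
    if event.get("event") != "media":
        raise ValueError(f"expected a media event, got {event.get('event')!r}")
    return base64.b64decode(event["media"]["payload"])


def encode_media(stream_sid: str, audio: bytes) -> str:
    """Build an outbound "media" message that plays mu-law audio to the caller."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(audio).decode("ascii")},
    })


if __name__ == "__main__":
    # Hypothetical handshake; real SIDs are issued by Twilio.
    handshake = [
        json.dumps({"event": "connected", "protocol": "Call", "version": "1.0.0"}),
        json.dumps({"event": "start", "start": {"streamSid": "MZ123", "callSid": "CA456"}}),
    ]
    print(extract_stream_sid(handshake))  # MZ123
```

In a FastAPI WebSocket handler, helpers like these would typically be applied to the first few messages received after accepting the connection, before the audio is handed to the bot.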
Create a templates folder with stream.xml inside. This XML response tells Twilio to upgrade to a WebSocket connection:
Replace the stream URL with your deployment’s base endpoint, filling in your project ID.
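For reference, the TwiML in stream.xml follows Twilio's Connect/Stream pattern, which points Twilio at a wss:// URL. The sketch below builds that document with the standard library. The example endpoint, the project ID p-123, the app name, and the /ws path are placeholders. Substitute the values from your own deployment and the route your FastAPI WebSocket handler actually uses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit


def to_websocket_url(base_endpoint: str, path: str = "/ws") -> str:
    """Convert an https:// deployment endpoint into the wss:// URL for <Stream>.

    The "/ws" default is an assumption about where the WebSocket route lives.
    """
    parts = urlsplit(base_endpoint)
    if parts.scheme != "https":
        raise ValueError("expected an https:// endpoint so the stream URL can use wss://")
    return urlunsplit(("wss", parts.netloc, parts.path.rstrip("/") + path, "", ""))


def build_stream_twiml(ws_url: str) -> str:
    """Return TwiML that connects the incoming call to a bidirectional Media Stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=ws_url)
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder endpoint: hypothetical project ID and app name.
    url = to_websocket_url("https://api.example.com/v4/p-123/twilio-agent")
    print(build_stream_twiml(url))
```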
Configure Cerebrium to run the FastAPI server by adding this to cerebrium.toml:
You can read more about running custom web servers here.
Twilio setup
Twilio provides cloud communications APIs for messaging, voice, video, and authentication. While we use Twilio for this demo, other providers work too. Sign up for a free account here.
Purchase a local number (not toll-free) from the phone numbers page. Then set up a webhook to connect calls to your agent.
Save your changes, then move on to setting up the AI agent.
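When a call reaches the number, Twilio sends an HTTP POST to the webhook URL with a form-encoded body describing the call. The handler in main.py answers it with the stream.xml TwiML. If you want to log or route on those fields, the sketch below shows how they decode with the standard library. CallSid, From, To, and CallStatus are standard Twilio voice webhook parameters. The function name and sample values are hypothetical.

```python
from urllib.parse import parse_qs


def parse_twilio_webhook(body: bytes) -> dict:
    """Decode the application/x-www-form-urlencoded body of a Twilio voice webhook."""
    fields = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    # parse_qs maps each key to a list; keep the first value of each parameter.
    return {key: values[0] for key, values in fields.items()}


if __name__ == "__main__":
    # Hypothetical request body; the "+" in phone numbers arrives percent-encoded.
    sample = b"CallSid=CA123&From=%2B15551230000&To=%2B15559870000&CallStatus=ringing"
    call = parse_twilio_webhook(sample)
    print(call["CallSid"], call["From"], call["CallStatus"])  # CA123 +15551230000 ringing
```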
AI Agent Setup
Create bot.py to set up the AI agent, using PipeCat for component integration, interruption handling, and audio processing:
The code:
- Connects to WebSocket transport for audio I/O
- Sets up services:
- Uses Secrets for authentication
- Creates a customizable PipelineTask supporting:
  - Image and Vision use cases (docs)
  - Built-in interruption handling
  - Easy model swapping
- Handles call events (join/leave) via webhooks
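PipeCat's Twilio integration handles audio conversion and interruptions for you, but it helps to know what happens on the wire. Twilio streams 8 kHz mono μ-law audio, whereas many speech-to-text services expect linear 16-bit PCM. When the caller interrupts, any audio the bot has already queued must be discarded by sending Twilio a "clear" message. The stdlib-only sketch below shows both steps. It illustrates the protocol and is not PipeCat's implementation; the function names are hypothetical.

```python
import json
import struct


def ulaw_to_pcm16(data: bytes) -> bytes:
    """Decode G.711 mu-law bytes into little-endian signed 16-bit PCM."""
    samples = []
    for byte in data:
        u = ~byte & 0xFF                  # mu-law codewords are transmitted bit-inverted
        sign = u & 0x80
        exponent = (u >> 4) & 0x07        # segment number
        mantissa = u & 0x0F               # position within the segment
        magnitude = ((mantissa << 3) + 0x84) << exponent
        magnitude -= 0x84                 # remove the encoder's bias of 132
        samples.append(-magnitude if sign else magnitude)
    return struct.pack(f"<{len(samples)}h", *samples)


def clear_message(stream_sid: str) -> str:
    """Build the Twilio "clear" message that drops audio still buffered for playback."""
    return json.dumps({"event": "clear", "streamSid": stream_sid})


if __name__ == "__main__":
    # 0xFF is mu-law silence; 0x00 and 0x80 are the loudest negative/positive codes.
    pcm = ulaw_to_pcm16(bytes([0xFF, 0x00, 0x80]))
    print(struct.unpack("<3h", pcm))  # (0, -32124, 32124)
```

The pure-Python loop is deliberate. The standard library's audioop.ulaw2lin did this in C, but audioop was deprecated in Python 3.11 and removed in 3.13. For high call volumes, a 256-entry lookup table precomputed from this function would be faster.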
For lower latency (~500 ms end-to-end), run parts or all of the pipeline locally instead of calling external APIs. Learn more in our voice agents guide and RAG voice agent blog post.
Deploy to Cerebrium
To deploy this app to Cerebrium, run the command cerebrium deploy in your terminal.
If it deployed successfully, you should see something like this:
Test the app by calling your Twilio number - the agent will respond automatically.
Scaling Pipecat
For scaling PipeCat on Cerebrium:
- Use large CPU instances (10 CPUs, 8 GB memory) to meet Twilio’s sub-second response requirement
- Run multiple PipeCat processes concurrently:
  - Each process uses ~0.5 CPUs
  - A 10-CPU instance therefore handles about 20 concurrent calls
  - Adjust based on your traffic
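The sizing rule above is simple division. With each PipeCat process using about 0.5 CPUs, an instance's CPU count divided by that figure gives its call capacity. Peak traffic divided by that capacity, rounded up, gives the number of instances. The helper below encodes that arithmetic. The 0.5-CPU default comes from the figure above, and the 45-call peak in the example is hypothetical.

```python
import math


def calls_per_instance(instance_cpus: float, cpus_per_call: float = 0.5) -> int:
    """Concurrent calls (one PipeCat process each) that fit on a single instance."""
    if cpus_per_call <= 0:
        raise ValueError("cpus_per_call must be positive")
    return math.floor(instance_cpus / cpus_per_call)


def instances_needed(peak_calls: int, instance_cpus: float, cpus_per_call: float = 0.5) -> int:
    """Instances required to serve peak_calls simultaneous calls."""
    capacity = calls_per_instance(instance_cpus, cpus_per_call)
    if capacity == 0:
        raise ValueError("instance is too small for even one call")
    return math.ceil(peak_calls / capacity)


if __name__ == "__main__":
    print(calls_per_instance(10))    # 20
    print(instances_needed(45, 10))  # 3
```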
For scaling criteria, use Cerebrium’s replica_concurrency setting to spawn new containers based on utilization, preventing cold starts for subsequent calls.
To apply both changes, update your cerebrium.toml to include the following:
Conclusion
This tutorial provides a foundation for implementing voice features and expanding into image and vision capabilities. PipeCat offers an extensible, open-source framework for building voice-enabled apps, while Cerebrium provides seamless deployment and autoscaling with pay-as-you-go compute.
Tag us as @cerebriumai to showcase your work and join our Slack or Discord communities for questions and feedback.