Using FastAPI, Gradio and Cerebrium to deploy an LLM chat interface
First, install the latest version of Cerebrium:

pip install --upgrade cerebrium
Deployment settings live in the `.cerebrium.toml` file. Our FastAPI application (`main.py`) will also expose a `/health` endpoint for checking app availability. To start, let's create our FastAPI application:
With the `/health` endpoint in place, let's add the Gradio server logic to `main.py`. The code defines:

- a `GradioServer` class that handles the communication with the Llama model endpoint
- a `chat_with_llama` method that sends a message to the Llama model and returns the response
- a `run_server` method that creates a Gradio chat interface
- a `start` method that starts the Gradio server in a separate process
- a `stop` method that stops the Gradio server
- `on_event` startup and shutdown handlers that start and stop the Gradio server respectively

The `main.py` file should look like this: