Stream outputs live from Falcon 7B using SSE
Run `pip install --upgrade cerebrium` to upgrade it to the latest version. Then add your dependencies to the `[cerebrium.dependencies.pip]` section of your `cerebrium.toml` file:
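A minimal dependency section might look like the following; the exact package list is an assumption for a Falcon 7B deployment, so adjust it to your project:

```toml
# cerebrium.toml (sketch) -- packages assumed for running Falcon 7B
[cerebrium.dependencies.pip]
transformers = "latest"
torch = "latest"
accelerate = "latest"
```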
Create a `main.py` file for our Python code. This simple implementation can be done in a single file. First, let's define our request object:
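A sketch of what this request object might look like, using Pydantic; the optional field names and their defaults are assumptions for illustration:

```python
from pydantic import BaseModel

class Item(BaseModel):
    prompt: str                # required: no default value
    temperature: float = 0.7   # optional, assumed default
    max_new_tokens: int = 256  # optional, assumed default
```

Because `prompt` has no default, Pydantic rejects any request that omits it and produces a descriptive validation error.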
The `prompt` parameter is required, while the others are optional with default values. If `prompt` is missing from the request, users receive an automatic error message.
We initialize the model outside the `predict` function. This ensures model weights load only once at startup, not with every request.
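The pattern can be sketched without the heavy download; the stand-in `_load_model` below takes the place of the real Hugging Face `from_pretrained` call:

```python
# main.py (sketch) -- module-level initialization runs once per container start.
LOAD_COUNT = 0

def _load_model():
    # Stand-in for the expensive load, e.g.
    # AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b-instruct")
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": "falcon-7b"}

model = _load_model()  # executed at import time, i.e. once at startup

def predict(prompt: str) -> str:
    # Each request reuses `model`; no per-request load happens here.
    return f"{model['name']}: {prompt}"
```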
We then define a `stream` function to handle streaming results from our endpoint:
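The pattern can be sketched without any model dependencies. `TextIteratorStreamer` is essentially a queue that `model.generate` pushes text chunks into from a background thread while the caller drains it; `fake_generate` below stands in for the blocking generate call:

```python
from queue import Queue
from threading import Thread

def fake_generate(prompt: str, streamer: Queue) -> None:
    # Stand-in for model.generate(..., streamer=streamer): pushes text
    # chunks into the queue as they are produced.
    for word in prompt.split():
        streamer.put(word + " ")
    streamer.put(None)  # sentinel: generation finished

def stream(prompt: str):
    streamer: Queue = Queue()
    # generate() blocks until completion, so it runs in a background
    # thread while we drain the queue in the foreground.
    Thread(target=fake_generate, args=(prompt, streamer)).start()
    while (chunk := streamer.get()) is not None:
        yield chunk  # each chunk is flushed to the client as it arrives
```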
We use `TextIteratorStreamer` to stream the model's output. The `yield` keyword returns output as it's generated.
Finally, configure the deployment in your `cerebrium.toml`:
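An illustrative hardware section is shown below; the section and field names are assumptions, so check the Cerebrium documentation for the exact schema your CLI version expects:

```toml
# cerebrium.toml (sketch) -- values are assumptions, not a tested config
[cerebrium.hardware]
gpu = "AMPERE_A10"  # Falcon 7B needs a GPU with roughly 16 GB+ of VRAM
```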
We call the endpoint at `/stream`, since that's our function name.
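A call to the deployed endpoint might look like the following; the base URL and auth header are placeholders, and `-N` disables curl's output buffering so each chunk prints as it arrives:

```bash
curl -N -X POST "https://<your-endpoint-url>/stream" \
  -H "Authorization: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Tell me about streaming."}'
```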