OpenAI-Compatible Endpoints
By default, every function deployed on Cerebrium is exposed as a REST API, accessible through an authenticated POST request. We have made all of these endpoints OpenAI-compatible, whether they use `/chat/completions` or `/embedding`. Below we show a very basic implementation of a streaming OpenAI-compatible endpoint.
We recommend you check out a full example of how to deploy an OpenAI-compatible endpoint using vLLM here.
To create a streaming-compatible endpoint, we need to make sure our Cerebrium function:
- Specifies all the parameters that OpenAI sends in the function signature.
- Uses `yield data`, where `yield` signifies that we are streaming and `data` is the JSON-serializable object returned to the user.
Here’s a small snippet from the example listed above:
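As a rough sketch (not the exact code from the vLLM example), such a function might look like the following. The function name `run`, the model name, and the hard-coded token list are all illustrative; a real app would generate tokens with its model.

```python
import time
import uuid


def run(messages: list, model: str = "my-model", temperature: float = 0.7,
        max_tokens: int = 256, stream: bool = True, **kwargs):
    # The signature mirrors the parameters the OpenAI client sends, so the
    # JSON body of the POST request maps cleanly onto keyword arguments.
    completion_id = f"chatcmpl-{uuid.uuid4().hex}"

    # A real deployment would stream tokens from a model (e.g. vLLM); the
    # hard-coded tokens below just make the chunk format visible.
    for token in ["Hello", " from", " Cerebrium", "!"]:
        # Each yielded dict is a JSON-serializable chunk in the
        # OpenAI chat.completion.chunk shape.
        yield {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": model,
            "choices": [
                {"index": 0, "delta": {"content": token}, "finish_reason": None}
            ],
        }
```

Because the function is a generator, Cerebrium streams each yielded chunk back to the client as it is produced.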
Once deployed, we set the client's base URL to the function we wish to call and use our Cerebrium JWT (accessible on your dashboard) as the API key.
Our client code will then look something like this:
The output then looks like this:
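With the placeholders replaced by real values, the tokens print incrementally as they stream back; a hypothetical run might produce something like:

```
Hello! How can I help you today?
```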