OpenAI Compatible Endpoints
By default, all functions deployed on Cerebrium are a REST API that are accessible through an authenticated POST request. We have made all these endpoints OpenAI compatible whether it be /chat/completions or /embedding. Below we show you a very basic implementation of implementing a streaming OpenAI compatible endpoint.
We recommend you checkout a full example of how to deploy a OpenAI compatible endpoint using vLLM here
To create a streaming compatible endpoint, we need to make sure our cerebrium function:
- Specifies all the parameters that OpenAI sends in the function signature
- We return
yield data
. Where yield signifies we are streaming and data is the json serializable object which we are returning to our user.
Here’s a small snippet from the example listed above:
Once deployed, we can set the base URL to the desired function we wish to call and use our Cerebrium JWT (accessible on your dashboard) as the API key.
Our client code will then look something like this:
The output then looks like this