Cerebrium setup
If you don’t have a Cerebrium account, you can create one by signing up here and following the documentation here to get set up. In your IDE, run the following command to create our Cerebrium starter project:cerebrium init 1-openai-compatible-endpoint
. This creates two files:
main.py
: Our entrypoint file where our code livescerebrium.toml
: A configuration file that contains all our build and environment settings
cerebrium.toml
to create your deployment environment:
main.py
:
- Takes parameters through its signature, with optional and default values available
- Automatically receives a unique
run_id
for each request - Processes the entire prompt through the model
- Streams results when
stream=True
using async functionality - Returns the complete result at the end if streaming is disabled
Deploy & Inference
To deploy the model use the following command:/run
). While OpenAI-compatible endpoints typically end with /chat/completions
, we’ve made all endpoints OpenAI-compatible. Here’s how to call the endpoint:
/run
). Use your JWT token from either the curl command or your Cerebrium dashboard’s API Keys section.
Voilà! You now have an OpenAI-compatible endpoint that you can customize to your needs!