Create an OpenAI-compatible endpoint using the vLLM framework

To get started, initialize a new project:

```bash
cerebrium init 1-openai-compatible-endpoint
```

This creates two files:
- `main.py`: Our entrypoint file where our code lives
- `cerebrium.toml`: A configuration file that contains all our build and environment settings

Add the following to your `cerebrium.toml` to create your deployment environment:
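The configuration itself isn’t included in this excerpt. As a rough sketch, a vLLM deployment’s `cerebrium.toml` might look like the following; the section names follow Cerebrium’s config format, but the hardware, version, and dependency values here are assumptions to adapt:

```toml
[cerebrium.deployment]
name = "1-openai-compatible-endpoint"
python_version = "3.11"

[cerebrium.hardware]
# Assumed GPU class and sizing; pick what your model actually needs.
compute = "AMPERE_A10"
cpu = 2
memory = 16.0

[cerebrium.dependencies.pip]
# vLLM serves the model; pydantic validates request payloads.
vllm = "latest"
pydantic = "latest"
```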
Next, add the inference code to `main.py`. The handler generates a unique `run_id` for each request and supports `stream=True` using async functionality, and all requests are served at the `/run` endpoint.
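The full listing isn’t reproduced here. As a minimal sketch, assuming vLLM’s `AsyncLLMEngine` and a placeholder model name, the handler could look like this (a production version would also wrap each chunk in an OpenAI-style `chat.completion.chunk` payload so standard clients can parse it):

```python
import uuid

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# Placeholder model; substitute any chat model vLLM supports.
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="meta-llama/Meta-Llama-3-8B-Instruct")
)

async def run(messages: list, temperature: float = 0.8,
              max_tokens: int = 256, stream: bool = True):
    # A fresh run_id ties every streamed chunk back to this request.
    run_id = str(uuid.uuid4())

    # Naive prompt assembly for illustration; a real handler would apply
    # the model's chat template instead.
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    params = SamplingParams(temperature=temperature, max_tokens=max_tokens)

    sent = ""
    # engine.generate is an async generator of RequestOutput objects whose
    # text is cumulative, so we diff against what was already sent.
    async for output in engine.generate(prompt, params, run_id):
        text = output.outputs[0].text
        if stream:
            yield text[len(sent):]
        sent = text

    if not stream:
        yield sent
```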
While OpenAI-compatible endpoints typically end with `/chat/completions`, we’ve made all endpoints OpenAI-compatible. Here’s how to call the endpoint:
Point the OpenAI client at your endpoint URL, which ends with `/run`. Use your JWT token from either the curl command or your Cerebrium dashboard’s API Keys section.
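As a sketch, assuming Cerebrium’s usual URL scheme (the project ID, model name, and token below are placeholders; copy the exact base URL from your dashboard):

```python
from openai import OpenAI

# Placeholder base URL and token: substitute your own project ID and JWT.
client = OpenAI(
    base_url="https://api.cortex.cerebrium.ai/v4/<PROJECT_ID>/1-openai-compatible-endpoint/run",
    api_key="<YOUR_JWT_TOKEN>",
)

stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Tell me about yourself."}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental delta of the response text.
    print(chunk.choices[0].delta.content or "", end="")
```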
Voilà! You now have an OpenAI-compatible endpoint that you can customize to your needs!