Deploy Mistral 7B with vLLM
Run `pip install --upgrade cerebrium` to upgrade it to the latest version. Any Python packages the app needs go in the `[cerebrium.dependencies.pip]` section of your `cerebrium.toml` file.
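That dependency section might look like the following sketch; the package list and version strings are illustrative, not copied from this guide:

```toml
# Illustrative pip dependencies for a vLLM deployment; pin versions
# as appropriate for your project.
[cerebrium.dependencies.pip]
vllm = "latest"
pydantic = "latest"
```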
Next, create a `main.py` file for our Python code. This simple implementation fits in a single file. First, let's define our request object.
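A minimal sketch of such a request object, assuming the Pydantic `BaseModel` pattern commonly used in Cerebrium apps; the optional field names and default values here are illustrative:

```python
from pydantic import BaseModel


class Item(BaseModel):
    # prompt has no default, so Pydantic rejects any request that omits it
    prompt: str
    # Illustrative optional sampling parameters with defaults
    temperature: float = 0.8
    top_p: float = 0.95
    max_tokens: int = 256
```

Constructing `Item(...)` without a `prompt` raises a `ValidationError`, which is what surfaces as the automatic error message described below.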
The `prompt` parameter is required, while the others are optional and fall back to default values. If `prompt` is missing from the request, the user automatically receives a validation error message.
We load the model outside the `predict` function, since it only needs to be loaded once at startup rather than on every request. The `predict` function simply passes the input parameters from the request to the model and returns the generated outputs.
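Put together, `main.py` might look like the following sketch. The `LLM` and `SamplingParams` classes follow vLLM's public interface, but the model id, `predict` signature, and return shape are assumptions rather than the exact code from this guide:

```python
from vllm import LLM, SamplingParams

# Module level: this runs once at container startup, so the model
# weights are not reloaded on every request.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")


def predict(prompt: str, temperature: float = 0.8, top_p: float = 0.95) -> dict:
    # Pass the request parameters straight through to the model.
    sampling_params = SamplingParams(temperature=temperature, top_p=top_p)
    outputs = llm.generate([prompt], sampling_params)
    # vLLM returns one RequestOutput per prompt; take its first completion.
    return {"result": outputs[0].outputs[0].text}
```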
Finally, deploy the app with the hardware settings specified in your `cerebrium.toml`.
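The hardware section might look like this sketch; the key names and GPU identifier are assumptions based on Cerebrium's config format, so verify them against the current documentation:

```toml
# Illustrative hardware settings for a 7B model; key names and
# values are assumptions, check Cerebrium's docs before using.
[cerebrium.hardware]
gpu = "AMPERE_A10"
gpu_count = 1
cpu = 2
memory = 16.0
```

With the config in place, running `cerebrium deploy` from the project directory builds and deploys the app.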