Deploy a Model from Hugging Face on Cerebrium
Feature | Hugging Face | Cerebrium |
---|---|---|
Pricing | $0.000278 per second | $0.0004676 per second |
Minimum cooldown period | 15m | 1s |
First build time | 9m25s | 49s |
Subsequent build times | 1m50s - 2m15s | 58s - 1m5s |
Response time (From cold) | 1m45s - 1m48s | 8s - 17s |
Response time (From warm) | 6s | 2s |
Co-locating your models | Requires a separate repository for each inference endpoint and model | Co-locate multiple models from various sources in a single app |
Response handling (From cold) | Throws an error | Waits for infrastructure to become available and returns a response |
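To put the pricing and cooldown rows side by side, the arithmetic can be sketched as follows. The prices and cooldown periods come from the table; the billing model (that the cooldown period is billed at the same per-second rate) and the 2-second busy time are assumptions made only for illustration.

```python
# Per-second prices and minimum cooldown periods taken from the table above.
PRICE_PER_SECOND = {"Hugging Face": 0.000278, "Cerebrium": 0.0004676}
MIN_COOLDOWN_SECONDS = {"Hugging Face": 15 * 60, "Cerebrium": 1}


def hourly_cost(platform: str) -> float:
    """Cost of one hour of continuous compute at the listed per-second price."""
    return PRICE_PER_SECOND[platform] * 3600


def request_cost(platform: str, busy_seconds: float) -> float:
    """Cost of one isolated request, ASSUMING the cooldown period is billed."""
    billed_seconds = busy_seconds + MIN_COOLDOWN_SECONDS[platform]
    return PRICE_PER_SECOND[platform] * billed_seconds


if __name__ == "__main__":
    for platform in PRICE_PER_SECOND:
        print(
            f"{platform}: ${hourly_cost(platform):.4f} per hour, "
            f"${request_cost(platform, busy_seconds=2):.6f} per isolated 2 s request"
        )
```

Under these assumptions, the lower per-second price does not necessarily give the lower bill for sporadic traffic, because the minimum cooldown dominates the billed time.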
Run `cerebrium init [PROJECT_NAME]` to initialize your project. During initialization, a `cerebrium.toml` file is created. This file configures the deployment, hardware, scaling, and dependencies for your Cerebrium project. Update your `cerebrium.toml` file to reflect the following:
Each section of the file has a distinct role:

- `cerebrium.deployment`: Specifies the project name, Python version, base Docker image, and which files to include or exclude as project files.
- `cerebrium.hardware`: Defines the CPU, memory, and GPU requirements for your deployment.
- `cerebrium.scaling`: Configures auto-scaling behavior, including minimum and maximum replicas and the cooldown period.
- `cerebrium.dependencies.pip`: Lists the Python packages required for your project.

The `main.py` file is where you’ll define your model loading and inference logic.
`main.py` contains:

- An `Item` class to structure and validate (using Pydantic) the input parameters.
- A `run` function that generates text based on the provided prompt and parameters.

The deploy step uses your `cerebrium.toml` to set up and deploy your model.
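The `Item` and `run` structure described above can be sketched as follows. This is not the tutorial's `main.py`: a standard-library dataclass stands in for the Pydantic model so that the sketch runs without third-party packages, the parameter names other than `prompt` are hypothetical, and `echo_generator` is a placeholder for a real Hugging Face model call. In an actual `main.py`, `Item` would subclass `pydantic.BaseModel` instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Item:
    """Input parameters; a stdlib stand-in for the Pydantic model."""

    prompt: str
    max_new_tokens: int = 128  # hypothetical parameter
    temperature: float = 0.7   # hypothetical parameter

    def __post_init__(self) -> None:
        # Manual checks that Pydantic would otherwise perform.
        if not isinstance(self.prompt, str) or not self.prompt.strip():
            raise ValueError("prompt must be a non-empty string")
        if self.max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be positive")
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0 and 2")


def echo_generator(prompt: str, max_new_tokens: int, temperature: float) -> str:
    """Placeholder for a real Hugging Face model call."""
    return f"[generated up to {max_new_tokens} tokens for: {prompt}]"


# In a real main.py the model would be loaded once here, at import time,
# so that warm requests reuse it instead of reloading the weights.
generate: Callable[[str, int, float], str] = echo_generator


def run(prompt: str, max_new_tokens: int = 128, temperature: float = 0.7) -> dict:
    """Validate the inputs, generate text, and return a JSON-serializable result."""
    item = Item(prompt=prompt, max_new_tokens=max_new_tokens, temperature=temperature)
    text = generate(item.prompt, item.max_new_tokens, item.temperature)
    return {"result": text}
```

Keeping validation in `Item` and generation in `run` means invalid requests fail before any model time is spent.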
Replace `[CEREBRIUM_API_KEY]` with your Inference API key, which can be found in your dashboard under API keys. The client code sends a POST request to your deployed model’s endpoint with a prompt and prints the model’s response.
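A minimal sketch of such a request, using only the Python standard library. The endpoint URL is a hypothetical placeholder, and the `Bearer` authorization scheme and the `{"prompt": ...}` body shape are assumptions; use the URL and format shown in your own dashboard.

```python
import json
import urllib.request

# Hypothetical placeholder; copy the real endpoint URL from your dashboard.
ENDPOINT_URL = "https://example.invalid/your-app/run"


def build_request(api_key: str, prompt: str, url: str = ENDPOINT_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the prompt."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_model(api_key: str, prompt: str, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response."""
    request = build_request(api_key, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a real endpoint URL in place, `print(call_model("[CEREBRIUM_API_KEY]", "Hello"))` would send the prompt and print the decoded response. The generous default timeout leaves room for a cold start.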
- Set an `HF_AUTH_TOKEN` secret in Cerebrium for authenticating with Hugging Face.
- The `cerebrium.toml` file specifies the hardware requirements. Adjust these based on your specific model and performance needs.
- Review the dependencies in `cerebrium.toml` to ensure you’re using the latest compatible versions.
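For the `HF_AUTH_TOKEN` note above, one way to read the token inside the app is sketched below. It assumes the secret is exposed to your code as an environment variable of the same name, which you should confirm for your Cerebrium version; the helper name `get_hf_token` is hypothetical.

```python
import os
from typing import Mapping


def get_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the Hugging Face token, failing early with a clear message if unset."""
    # Assumption: the HF_AUTH_TOKEN secret appears as an environment variable.
    token = env.get("HF_AUTH_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_AUTH_TOKEN is not set; add it as a secret before deploying."
        )
    return token
```

Failing at startup rather than at the first model download makes a missing secret easier to spot in the build or startup logs.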