Persistent Storage for Caching
Cerebrium caches your models in a specific directory on the server. This cache persists across your project, meaning that if you deploy a new model with the same name, or use the same HuggingFace model, it will be loaded from the cache rather than downloaded from the cloud. This allows us to scale your model deployments to handle more requests quickly, and keeps your deployment container images small. Currently, the cache can be accessed through /persistent-storage in your container instance, should you wish to access it directly and store other artefacts. While you have full access to this drive, we recommend that you only store files outside of /persistent-storage/cache, as that directory and its subdirectories are used by Cerebrium to store your models.
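For example, if you want to persist your own artefacts between deployments, write them to a directory outside of the cache. Below is a minimal sketch; the directory and file names are just illustrations:

import os
import json

# Any directory under /persistent-storage other than
# /persistent-storage/cache is safe to use for your own artefacts.
artifact_dir = "/persistent-storage/my-artifacts"
os.makedirs(artifact_dir, exist_ok=True)

# Write an artefact once; it persists across deployments of the project.
with open(os.path.join(artifact_dir, "labels.json"), "w") as f:
    json.dump({"0": "cat", "1": "dog"}, f)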
In the future we will charge per GB of persistent storage used, but while we are in active development it is free. We also plan to add a way to clear the cache, as well as to pre-upload models to the cache before deployment.
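Until then, if you want to estimate how much persistent storage you are using, you can walk the drive and sum file sizes; a minimal sketch:

import os

# Sum the size of every file under the persistent drive.
total_bytes = 0
for root, _dirs, files in os.walk("/persistent-storage"):
    for name in files:
        path = os.path.join(root, name)
        if os.path.isfile(path):
            total_bytes += os.path.getsize(path)

print(f"Persistent storage used: {total_bytes / 1e9:.2f} GB")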
As a simple example, suppose you have an external SAM model that you want to use in your custom deployment. You can download it to the cache as follows:
import os
import requests
import torch

file_path = "/persistent-storage/segment-anything/model.pt"

# Check if the file already exists; if not, download it
if not os.path.exists(file_path):
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    response = requests.get("https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth")
    response.raise_for_status()
    with open(file_path, "wb") as f:
        f.write(response.content)

# Load the checkpoint (the SAM weights are a plain state dict,
# so torch.load is used rather than torch.jit.load)
checkpoint = torch.load(file_path)
...  # Continue with your initialization
Now, in subsequent deployments, the model will be loaded from the cache rather than downloaded again.
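For large checkpoints like SAM's (a few GB), you may prefer to stream the download to disk rather than buffering the whole file in memory. Below is a sketch of the same pattern using requests' streaming API; the helper name is hypothetical:

import os
import requests

def download_to_persistent_storage(url: str, file_path: str) -> str:
    # Hypothetical helper: stream a file to the persistent drive
    # if it is not already there, then return its path.
    if not os.path.exists(file_path):
        os.makedirs(os.path.dirname(file_path), exist_ok=True)
        with requests.get(url, stream=True) as response:
            response.raise_for_status()
            with open(file_path, "wb") as f:
                # Write in 1 MB chunks to keep memory use constant.
                for chunk in response.iter_content(chunk_size=1 << 20):
                    f.write(chunk)
    return file_path

checkpoint_path = download_to_persistent_storage(
    "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
    "/persistent-storage/segment-anything/model.pt",
)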