Migrating from Replicate
Deploy a Model from Replicate on Cerebrium
Introduction
In this tutorial, I will show you how you can migrate your workloads from Replicate to Cerebrium in less than 5 minutes!
As an example, we will be migrating the SDXL-Lightning-4step model from ByteDance. You can find the link to it on Replicate here:
It is best to look at the code in the GitHub repo and follow along as we migrate it.
To start, let us create our Cerebrium project.
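If you don't have one yet, you can install the Cerebrium CLI, log in, and scaffold a project. A minimal sketch, assuming a pip-based install and using sdxl-lightning as our project name:

```bash
# Install the Cerebrium CLI, authenticate, and create a new project
pip install --upgrade cerebrium
cerebrium login
cerebrium init sdxl-lightning
cd sdxl-lightning
```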
Cerebrium and Replicate have a similar setup in that they both use a configuration file: cog.yaml for Replicate and cerebrium.toml for Cerebrium.
Looking at the cog.yaml, we need to add/change the following in our cerebrium.toml:
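Below is a sketch of the relevant cerebrium.toml sections. The section and key names follow Cerebrium's config format as we understand it, but the specific values (base image tag, hardware tier, package versions) are assumptions - copy the real ones from your cog.yaml and the Cerebrium docs:

```toml
[cerebrium.deployment]
name = "sdxl-lightning"
python_version = "3.11"
# Nvidia CUDA 12 base image so the CUDA libraries are available
docker_base_image_url = "nvidia/cuda:12.1.1-runtime-ubuntu22.04"
shell_commands = [
    # copy the pget install command from the `run:` section of cog.yaml, e.g.
    "curl -o /usr/local/bin/pget -L <pget release URL from cog.yaml> && chmod +x /usr/local/bin/pget",
]

[cerebrium.hardware]
cpu = 4
memory = 16.0
compute = "AMPERE_A10"

[cerebrium.dependencies.pip]
# mirror the python_packages list from cog.yaml (versions here are placeholders)
torch = "latest"
diffusers = "latest"
transformers = "latest"
accelerate = "latest"

[cerebrium.dependencies.apt]
curl = "latest"
```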
From the above we do the following:
- Since we need a GPU, we need to use one of the Nvidia base images that have the CUDA libraries installed. We use the CUDA 12 image. You can see other images here
- Depending on the type of CPU/GPU you need, you can update the hardware settings to run your app. You can see the full list available here
- We copy across the pip packages we need to install
- Replicate uses pget to download model weights, so we need to install it ourselves. We do this by installing curl via apt and then adding the pget install commands as shell commands in our cerebrium.toml.
Great, now our setup matches in terms of hardware and environment.
The cog.yaml also indicates the file that the endpoint calls - in this case, predict.py - so let us inspect that file.
Cerebrium has a similar notion: on our side, the main file that gets called is main.py.
To start, I copy across all import statements and constant variables that have nothing to do with Replicate/Cog. In this case:
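As a sketch (the exact imports and constants come from the original predict.py, so copy yours verbatim), the top of main.py might start like this:

```python
import os
import subprocess

import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler
from transformers import CLIPImageProcessor

# Constant names/values below are illustrative - keep whatever the original predict.py defines
MODEL_CACHE = "model-cache"
MODEL_URL = "<weights URL from the original predict.py>"
FEATURE_EXTRACTOR = "./feature-extractor"
```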
Replicate makes use of classes in its syntax, which we shy away from - we run whatever Python code you give us and make each function an endpoint. Therefore, when you see a reference to self., remove it throughout the code, as in the example below.
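For illustration only (the names here are ours, and pipe refers to the module-level pipeline we define in the setup section further down), a method on the cog Predictor class becomes a plain module-level function:

```python
# Replicate/cog style - state lives on the Predictor class:
#     def predict(self, prompt: str = Input(description="Prompt")) -> Path:
#         return self.pipe(prompt).images[0]

# Cerebrium style - a plain function using module-level objects, no `self.`:
def predict(prompt: str):
    return pipe(prompt).images[0]
```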
There is a folder in the repo called “feature-extractor” which we need to have in our repository. We could git clone the repo; however, it's quite small, so I would just copy the contents of the folder into your Cerebrium project, i.e.:
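Assuming you copy the folder contents across as-is (the file shown inside feature-extractor is what the Replicate repo's folder typically contains - copy whatever files are actually there), the project might look roughly like this:

```
sdxl-lightning/
├── cerebrium.toml
├── main.py
└── feature-extractor/
    └── preprocessor_config.json
```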
The setup function on Replicate runs on each cold start (i.e. each new instantiation of the app), so we just define it as normal code that gets run at the top of our file. I put it right below my import statements above.
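A sketch of that top-level setup, assuming the original predict.py downloads the weights with pget and then builds the pipeline (the URL, paths, and dtype settings are placeholders to be copied from the original):

```python
def download_weights(url: str, dest: str):
    # pget was installed via shell_commands in cerebrium.toml; -x extracts the archive
    if not os.path.exists(dest):
        subprocess.check_call(["pget", "-x", url, dest])

# This runs at import time, i.e. once per cold start of the app
download_weights(MODEL_URL, MODEL_CACHE)

pipe = StableDiffusionXLPipeline.from_pretrained(
    MODEL_CACHE, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
# SDXL-Lightning uses a trailing-timestep Euler scheduler
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
feature_extractor = CLIPImageProcessor.from_pretrained(FEATURE_EXTRACTOR)
```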
The code above downloads the model weights if they don't exist and then instantiates the models. To persist files/data on Cerebrium, you need to store them under the path /persistent-storage. So we can update the following paths above:
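For example (the constant name is ours; the important part is the /persistent-storage prefix, which should be applied to any other cache paths the original defines too):

```python
# Store weights on the persistent volume so they survive across cold starts
MODEL_CACHE = "/persistent-storage/model-cache"
```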
We can then copy across the two other functions, run_safety_checker() and predict(). In Cerebrium, the parameters of a function are the JSON data it expects when you make a request to it. We can then define it as follows:
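A sketch of the migrated predict() - the parameter names and defaults are assumptions modelled on the original predictor, and each keyword argument maps to a field in the request's JSON body:

```python
def predict(
    prompt: str,
    negative_prompt: str = "",
    width: int = 1024,
    height: int = 1024,
    num_inference_steps: int = 4,
    guidance_scale: float = 0.0,
    seed: int | None = None,
):
    # Each parameter corresponds to a key in the JSON payload of the request
    if seed is None:
        seed = int.from_bytes(os.urandom(2), "big")
    generator = torch.Generator("cuda").manual_seed(seed)

    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        width=width,
        height=height,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images[0]

    output_path = "/tmp/output.png"
    image.save(output_path)
    return {"seed": seed, "image_path": output_path}
```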
The above returns a path to the generated image, but we would like to return a base64-encoded image instead so that users can render it instantly. You are welcome to upload the images to a storage bucket and reference them directly - it's up to you.
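One way to do that - a sketch using Pillow and the standard library (the helper name image_to_base64 is ours):

```python
# (move these imports to the top of main.py)
import base64
from io import BytesIO

def image_to_base64(image) -> str:
    # Encode a PIL image as a base64 PNG string the client can render directly
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")

# ...then have predict() return the encoded image instead of a file path:
# return {"seed": seed, "image": image_to_base64(image)}
```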
Now we can run cerebrium deploy. You should see your app build in under 90 seconds.
It should output the curl statement to run your app:
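It will look something like the sketch below - the base URL, project ID, and token shown here are placeholders, so use the exact values printed by your deploy output:

```bash
curl -X POST https://api.cortex.cerebrium.ai/v4/<project-id>/sdxl-lightning/<function> \
  -H "Authorization: Bearer <your-api-token>" \
  -H "Content-Type: application/json" \
  --data '{...}'
```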
Make sure to replace the end of the URL with /predict (since that is the function we are calling) and send it the required JSON data. This is our result
You should be all ready to go!
You can read further about some of the functionality Cerebrium has to offer.