Introduction
In this tutorial, I will show you how to migrate your workloads from Replicate to Cerebrium in less than 5 minutes. As an example, we will migrate the SDXL-Lightning-4step model from ByteDance. You can find the link to it on Replicate here. It is best to look at the code in the GitHub repo and follow along as we migrate it.

To start, let us create our Cerebrium project.

- Since we need a GPU, we need to use one of Nvidia's base images that has the CUDA libraries installed. We use the CUDA 12 image. You can see other images here.
- Depending on the type of CPU/GPU you need, you can update the hardware settings to run your app. You can see the full list available here.
- We copy across the pip packages we need to install.
- Replicate uses pget to download model weights, so we need to install pget before the code can use it. We do this by installing curl and then adding the shell commands to our cerebrium.toml.
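To illustrate why pget has to be on the PATH inside the container, here is a minimal sketch of how migrated Python code might shell out to it. The helper names, URL, and destination path are hypothetical, and the `-x` (extract archive) flag is an assumption that should be checked against the pget version you install.

```python
import shutil
import subprocess
from pathlib import Path


def build_pget_command(url: str, dest: str, extract: bool = False) -> list[str]:
    """Return the argv list for a pget download (hypothetical helper)."""
    cmd = ["pget"]
    if extract:
        # Assumption: -x asks pget to unpack a tar archive after downloading.
        cmd.append("-x")
    cmd += [url, dest]
    return cmd


def download_weights(url: str, dest: str, extract: bool = False) -> None:
    """Download weights with pget unless they are already on disk."""
    if Path(dest).exists():
        return  # weights already present, nothing to do
    if shutil.which("pget") is None:
        # This is the failure you would hit if the shell commands that
        # install pget were left out of the build configuration.
        raise RuntimeError("pget not found on PATH; install it during the build")
    subprocess.check_call(build_pget_command(url, dest, extract))
```

Keeping the command construction separate from the download makes it easy to check what would run without touching the network.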
On the Replicate side, the main file is predict.py, so let us inspect that file.
Cerebrium has a similar notion in that the main file that is called on our side is main.py.
To start, I copy across all import statements and constant variables that have nothing to do with Replicate/Cog. In this case:
Wherever the copied code uses the self. prefix, remove it throughout the code.
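As a rough illustration of that change, the sketch below contrasts a Cog-style predictor class with a function-based main.py. It is not the repo's actual code: `load_pipeline` is a hypothetical stand-in for the real model loader, and the single `prompt` parameter is assumed for brevity.

```python
# Hypothetical stand-in for the real model loader, so the sketch runs anywhere.
def load_pipeline():
    return lambda prompt: f"image for: {prompt}"


# Before: Cog-style predictor, where state hangs off the instance via self.
# (Real Cog predictors subclass cog.BasePredictor; omitted here so the
# example runs without Cog installed.)
class Predictor:
    def setup(self):
        self.pipe = load_pipeline()

    def predict(self, prompt: str) -> str:
        return self.pipe(prompt)


# After: main.py-style module. The same state becomes a module-level
# variable and predict becomes a plain function, so every "self." is dropped.
pipe = load_pipeline()


def predict(prompt: str) -> str:
    return pipe(prompt)
```

Both versions return the same result for the same prompt; the only difference is where the loaded pipeline lives.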
There is a folder in the repo called “feature-extractor” which we need to have in our repository. We could git clone the repo; however, it’s quite small, so I would just copy the contents of the folder and put it in your Cerebrium project, i.e.:

To deploy the app, run cerebrium deploy. You should see your app build in under 90 seconds.
It should output the curl statement to run your app:
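As a minimal sketch of what an equivalent request could look like in Python, the snippet below builds (but does not send) a POST to the app's /predict route. The base URL, the API key, the Bearer-token header, and the `prompt` field are all placeholders or assumptions; use the values printed by your own deploy.

```python
import json
import urllib.request


def build_predict_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the /predict route."""
    url = base_url.rstrip("/") + "/predict"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: substitute the URL and key from your own deployment.
request = build_predict_request(
    "https://example.invalid/my-app",
    "YOUR_API_KEY",
    {"prompt": "a photo of a cat"},  # assumed input field name
)
# To actually send it: urllib.request.urlopen(request).read()
```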

Make sure the URL ends with /predict (since that is the function we are calling) and send it the required JSON data. This is our result: