In this tutorial, we will demonstrate how to get Stable Diffusion 2.1 up and running in 5 minutes using custom Python. You could use the pre-built version of this model from your Cerebrium dashboard - we just wanted to show a different way of doing this.

To see the final implementation, you can view it here

Create Cerebrium Account

Before building, you need to set up a Cerebrium account. This is as simple as starting a new Project in Cerebrium and copying the API key. This will be used to authenticate all calls for this project.

Create a project

  1. Go to
  2. Sign up or Login
  3. Navigate to the API Keys page
  4. You will need your private API key for deployments. Click the copy button to copy it to your clipboard


Basic Setup

It is important to think of the way you develop models using Cerebrium should be identical to developing on a virtual machine or Google Colab.

To start, we need to create a file which will contain our main Python code. This is a relatively simple implementation, so we can do everything in 1 file.

I then go straight to the Hugging face documentation and see that to run Stable Diffusion 2.1 I need to install a few dependencies. To do this, I need to create a requirements.txt file with the following dependencies:


I then define the following in my file

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
import torch

model_id = "stabilityai/stable-diffusion-2-1"

# Use the DPMSolverMultistepScheduler (DPM-Solver++) scheduler here instead
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe ="cuda")

We define this at the top of the file for two reasons:

  1. Code loaded in the top of the file is automatically run when deployed, meaning the model can be downloaded and loaded into memory on the first deployment. The first deployment is the longest, but thereafter it should be pretty quick.
  2. If this was in a function, it would not be called on deploy but on the first request. Also, it would be loaded from cache on every request. We do not want either of these two things.

I then would like the user to send in a few parameters for the model via API, and so I need to define what the request object would look like:

from pydantic import BaseModel

class Item(BaseModel):
    prompt: str
    height: Optional[int]
    width: Optional[int]
    num_inference_steps: Optional[int]
    num_images_per_prompt: Optional[int]

Pydantic is a data validation library and BaseModel is where Cerebrium keeps some default parameters like “webhook_url” that allows a user to send in a webhook url, and we will call it when the job has finished processing — this is useful for long-running tasks. Do not worry about that functionality for this tutorial.

We would then like to create a function, predict that we would like to be called every time our user submits a request. The predict function received 3 parameters:

  1. The request object you defined
  2. The run_id which is a unique identifier for each inference request
  3. A logger object which you should use to log any contents. You should use…) or logger.error(…)

The request will then enter the user prompt into the model and from there return the image(s) to the user in base64 format. Our code looks as follows:

import io
import base64

def predict(item, run_id, logger):

    images = []
    images = pipe(
        height=getattr(item, "height", 512),
        width=getattr(item, "width", 512),
        num_inference_steps= getattr(item,"num_inference_steps", 25)

    finished_images = []
    for image in images:
        buffered = io.BytesIO(), format="PNG")

    return finished_images

In the above code we do a few things:

  1. We run our Stable Diffusion model with parameters the user sent through the request. If the user didn’t send any then we set default values.
  2. We convert the generated images to base64 to return it back to the user

To deploy this function to an AMPERE_A5000 instance, we run the following code:

cerebrium deploy stable-diffusion --hardware AMPERE\_A5000 --api-key private-XXXXXXXXXXXXX

We can then make the following request:

curl --location --request POST '' \
--header 'Authorization: public-XXXXXXXXXXXX' \
--header 'Content-Type: application/json' \
--data-raw '{
    "prompt": "A knight riding a horse on the moon"

This gives us the following results:

  "run_id": "65da615b-829a-49d8-abf6-64f2c887e546",
  "message": "success",
  "result": [
  "run_time_ms": 2803.5125732421875