In this tutorial, we will use ControlNet Canny and SDXL to restyle an image of the Hugging Face logo and make it more appealing to the end user. SDXL is the Stable Diffusion model released by Stability AI for high-resolution image generation. ControlNet lets you provide an input image and replace its contents while keeping the original outlines of that image.

To see the final implementation, you can view it here

Basic Setup

The way you develop models using Cerebrium should be identical to developing on a virtual machine or in Google Colab, so converting existing code should be very easy! Please make sure you have the Cerebrium package installed and have logged in. If not, please take a look at our docs here.

First, we create our project:

cerebrium init controlnet-logo
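
This scaffolds a small project for you. The exact files may differ slightly between CLI versions, but you should end up with something like the following layout (assumed here), with main.py for your code and cerebrium.toml for your configuration:

controlnet-logo/
├── main.py          # Python code, including the predict function
└── cerebrium.toml   # build, hardware, scaling and dependency configuration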

Let us add the following packages to the [cerebrium.dependencies.pip] section of our cerebrium.toml file:

[cerebrium.dependencies.pip]
accelerate = "latest"
transformers = ">=4.35.0"
safetensors = "latest"
opencv-python = "latest"
diffusers = "latest"

To start, we need to create a main.py file which will contain our main Python code. This is a relatively simple implementation, so we can do everything in one file. We would like a user to send in a prompt along with a link to an image (the Hugging Face logo in this case) and to get back the generated image(s), so let us define our request object.

from typing import Optional
from pydantic import BaseModel
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers.utils import load_image
from PIL import Image
import torch
import numpy as np
import cv2
import io
import base64

class Item(BaseModel):
    prompt: str
    image_url: str
    negative_prompt: Optional[str] = None
    conditioning_scale: Optional[float] = 0.5
    height: Optional[int] = 512
    width: Optional[int] = 512
    num_inference_steps: Optional[int] = 20
    guidance_scale: Optional[float] = 7.5
    num_images_per_prompt: Optional[int] = 1

Above, we import the various Python libraries we require and use Pydantic as our data validation library. Because of the way we have defined the BaseModel, “prompt” and “image_url” are required parameters, so if either is missing from the request the user will automatically receive an error message. Everything else is optional.
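
As a quick illustration of this validation behaviour (not part of the deployed code), here is roughly what happens when a request is parsed; the example values are made up:

from pydantic import ValidationError

# A valid request: "prompt" and "image_url" are present, everything else gets its default.
item = Item(prompt="a neon logo", image_url="https://example.com/logo.png")
print(item.num_inference_steps)  # 20

# An invalid request: "image_url" is missing, so Pydantic raises a ValidationError,
# which surfaces to the caller as an error message.
try:
    Item(prompt="a neon logo")
except ValidationError as e:
    print(e)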

Instantiate model

Below, we load our ControlNet and SDXL models. The weights are downloaded during your first deployment; on subsequent deploys or inference requests they are automatically cached in your persistent storage for reuse. You can read more about persistent storage here. We do this outside our predict function since we only want this code to run on a cold start (i.e. on startup). If the container is already warm, only the predict function is executed for inference.

# ControlNet checkpoint trained on Canny edges for SDXL
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)
# fp16-friendly VAE to avoid numerical issues when running SDXL in half precision
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
)
# Offloads model components to the CPU when they are not in use and moves them to the
# GPU as needed, so there is no need to call pipe.to("cuda") explicitly.
pipe.enable_model_cpu_offload()
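
If you want to be explicit about where the downloaded weights end up, you can point the Hugging Face cache at your persistent storage by setting HF_HOME at the top of main.py, before the models above are loaded. The mount path below is an assumption; check the persistent storage docs for the exact path in your project:

import os

# Assumed persistent storage mount point; adjust to match your Cerebrium project.
os.environ["HF_HOME"] = "/persistent-storage/.cache/huggingface"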

Predict Function

Below we simply get the parameters from our request and pass them to the ControlNet pipeline to generate the image(s). You will notice we convert the images to base64 so we can return them directly instead of writing the files to an S3 bucket; the return value of the predict function needs to be JSON serializable.

def predict(item, run_id, logger):
    item = Item(**item)

    # Download the input image and extract its Canny edges; the edge map is what
    # ControlNet uses to preserve the original outlines of the logo.
    init_image = load_image(item.image_url)
    image = np.array(init_image)
    image = cv2.Canny(image, 100, 200)
    # Stack the single-channel edge map into a 3-channel image for the pipeline
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    image = Image.fromarray(image)

    images = pipe(
        item.prompt,
        negative_prompt=item.negative_prompt,
        image=image,
        controlnet_conditioning_scale=item.conditioning_scale,
        height=item.height,
        width=item.width,
        num_inference_steps=item.num_inference_steps,
        guidance_scale=item.guidance_scale,
        num_images_per_prompt=item.num_images_per_prompt
    ).images

    # Encode each generated image as base64 so the response stays JSON serializable
    finished_images = []
    for image in images:
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")
        finished_images.append(base64.b64encode(buffered.getvalue()).decode("utf-8"))

    return {"images": finished_images}

Deploy

Your cerebrium.toml file is where you configure your compute and environment. Please make sure that the GPU you specify is an AMPERE_A5000 and that you have enough memory (RAM) on your instance to run the models. Your cerebrium.toml file should look like:


[cerebrium.build]
predict_data = "{\"prompt\": \"aerial view, a futuristic research complex in a bright foggy jungle, hard lighting\", \"image_url\": \"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png\"}"
force_rebuild = false
disable_animation = false
log_level = "INFO"
disable_deployment_confirmation = false

[cerebrium.deployment]
name = "controlnet-logo"
python_version = "3.10"
include = "[./*, main.py]"
exclude = "[./.*, ./__*]"

[cerebrium.hardware]
gpu = "AMPERE_A5000"
cpu = 2
memory = 16.0
gpu_count = 1

[cerebrium.scaling]
min_replicas = 0
cooldown = 60

[cerebrium.dependencies.apt]
ffmpeg = "latest"

[cerebrium.dependencies.pip]
accelerate = "latest"
transformers = ">=4.35.0"
safetensors = "latest"
opencv-python = "latest"
diffusers = "latest"

[cerebrium.dependencies.conda]

To deploy the model, use the following command:

cerebrium deploy controlnet-logo

Once deployed, we can make the following request:

curl --location --request POST 'https://run.cerebrium.ai/v3/p-xxxxxx/controlnet-logo/predict' \
--header 'Authorization: public-XXXXXXXXXXXX' \
--header 'Content-Type: application/json' \
--data-raw '{
    "prompt":"aerial view, a futuristic research complex in a bright foggy jungle, hard lighting",
    "negative_prompt": "low quality, bad quality, sketches",
    "image_url": "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
}'
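
If you prefer Python over curl, the same request can be made with the requests library; swap in your own project ID and API key:

import requests

url = "https://run.cerebrium.ai/v3/p-xxxxxx/controlnet-logo/predict"
headers = {
    "Authorization": "public-XXXXXXXXXXXX",
    "Content-Type": "application/json",
}
payload = {
    "prompt": "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting",
    "negative_prompt": "low quality, bad quality, sketches",
    "image_url": "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png",
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())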

We then get the following results:

{
    "run_id": "xMc1UBBmjZOBCn5iR58idyy4pX59kk34og2THcoxzmQp9HSLLbknhw==",
    "message": "Finished inference request with run_id: `xMc1UBBmjZOBCn5iR58idyy4pX59kk34og2THcoxzmQp9HSLLbknhw==`",
    "result": {
        "images": [
            <BASE64_ENCODED_IMAGE>
        ]
    },
    "status_code": 200,
    "run_time_ms": 9617.486715316772
}
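
Since the images come back as base64 strings, the client needs to decode them before viewing. A small sketch, assuming the response JSON above has been parsed into response_json:

import base64

for i, encoded in enumerate(response_json["result"]["images"]):
    with open(f"output_{i}.png", "wb") as f:
        f.write(base64.b64decode(encoded))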

Our image then looks like this:

ControlNet Logo