Introduction

We are saddened by the news that Mystic AI is sunsetting their services. They were an early pioneer in the space and pushed the industry forward. To support their users and ensure their apps remain functional despite the circumstances, this guide will walk you through migrating apps from Mystic to Cerebrium.

We’ll show you how to convert your existing Mystic code (using a Stable Diffusion example) and configuration to work with the Cerebrium platform. You’ll learn about the tools and features available to make this transition as seamless as possible, including ways to optimize your deployments for better performance and cost efficiency.

Key Differences

Cerebrium helps teams deploy and run their models efficiently. Our infrastructure is designed for reliable performance:

  • The average model cold starts in 2-5 seconds
  • Updates to your code deploy quickly, taking only 8-14 seconds
  • 99.9% uptime

Cerebrium gives you precise control over your computing resources. Instead of managing entire instances, which often means paying for capacity you don’t actually use, you choose exactly how much CPU, memory, and GPU power you need. You pay only for the resources you use, calculated down to the second. To better understand costs for your specific needs, you can use our pricing calculator.
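
To put the per-second billing in concrete terms, here is a rough cost sketch in Python. The rates below are purely hypothetical placeholders, not Cerebrium’s actual prices; plug in the numbers from the pricing calculator for a real estimate.

# Hypothetical per-second rates -- replace with the values from the pricing calculator
CPU_RATE_PER_SECOND = 0.000001      # per vCPU-second (placeholder)
MEMORY_RATE_PER_SECOND = 0.0000005  # per GB-second (placeholder)
GPU_RATE_PER_SECOND = 0.0003        # per GPU-second (placeholder)

def estimate_cost(cpu_cores: int, memory_gb: float, gpu_count: int, active_seconds: float) -> float:
    """Estimate spend for the time your replicas are actually active."""
    per_second = (
        cpu_cores * CPU_RATE_PER_SECOND
        + memory_gb * MEMORY_RATE_PER_SECOND
        + gpu_count * GPU_RATE_PER_SECOND
    )
    return per_second * active_seconds

# e.g. the hardware config we'll use below (4 CPUs, 16 GB, 1 GPU) active for one hour
print(estimate_cost(cpu_cores=4, memory_gb=16.0, gpu_count=1, active_seconds=3600))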

Migration Process

1. Project Setup and Configuration

Start by installing Cerebrium’s command-line tool and creating your project:

pip install cerebrium --upgrade
cerebrium login # You'll be redirected to the dashboard for login
cerebrium init stable-diffusion
cd stable-diffusion

Next, we’ll convert your existing configuration to Cerebrium’s format. Here’s what your current Mystic configuration may look like:

# Mystic's pipeline.yaml
runtime:
  container_commands:
    - apt-get update
    - apt-get install -y git
  python:
    version: "3.10"
    requirements:
      - pipeline-ai
      - diffusers==0.24.0
      - torch==2.1.1
      - transformers==4.35.2
      - accelerate==0.25.0
    cuda_version: "11.4"
accelerators:
  - "nvidia_a10"
accelerator_memory: null
pipeline_graph: sd_pipeline:pipeline_graph
pipeline_name: <YOUR_USERNAME>/stable-diffusion-v1.5
extras: {}

This translates into Cerebrium’s easy-to-understand TOML config:

# cerebrium.toml
[cerebrium.deployment]
name = "stable-diffusion"
python_version = "3.11"
docker_base_image_url = "debian:bookworm-slim"
include = ["./*", "main.py", "cerebrium.toml"]
exclude = [".*"]

[cerebrium.hardware]
compute = "AMPERE_A10"    # Choose your GPU type
cpu = 4                   # Number of CPU cores
memory = 16.0             # Memory in GB
gpu_count = 1             # Number of GPUs

[cerebrium.scaling]
min_replicas = 0         # Scale to zero when idle to save costs
max_replicas = 2         # Scale up to handle increased traffic
cooldown = 60            # Seconds to wait before scaling down an idle instance
replica_concurrency = 1  # Number of requests a single container can handle at once

[cerebrium.dependencies.pip]
torch = ">=2.0.0"
pydantic = "latest"
transformers = "latest"
accelerate = "latest"
diffusers = "latest"
safetensors = "latest"
xformers = "latest"
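
One gap to note: the Mystic config installed system packages (git) through container_commands. Cerebrium lets you declare system packages in the same TOML file; assuming your project still needs git, a section along these lines should cover it (see the Cerebrium docs for the exact apt dependency syntax):

# cerebrium.toml (continued) -- sketch of a system package dependency
[cerebrium.dependencies.apt]
git = "latest"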

2. Code Migration

Now we’ll convert your model implementation. Here’s what a typical Mystic pipeline looks like:

import typing as t
from pathlib import Path

from PIL.Image import Image
from pipeline.cloud.pipelines import run_pipeline
from pipeline.objects.graph import InputField, InputSchema

from pipeline import File, Pipeline, Variable, entity, pipe

HF_MODEL_ID = "runwayml/stable-diffusion-v1-5"

class ModelKwargs(InputSchema):
    num_images_per_prompt: int | None = InputField(
        title="num_images_per_prompt",
        description="The number of images to generate per prompt.",
        default=1,
        optional=True,
    )
    height: int | None = InputField(
        title="height",
        description="The height in pixels of the generated image.",
        default=512,
        optional=True,
        multiple_of=64,
        ge=64,
    )
    width: int | None = InputField(
        title="width",
        description="The width in pixels of the generated image.",
        default=512,
        optional=True,
        multiple_of=64,
        ge=64,
    )
    num_inference_steps: int | None = InputField(
        title="num_inference_steps",
        description=(
            "The number of denoising steps. More denoising steps "
            "usually lead to a higher quality image at the expense "
            "of slower inference."
        ),
        default=50,
        optional=True,
    )

@entity
class StableDiffusionModel:
    def __init__(self) -> None:
        self.model = None
        self.device = None

    @pipe(run_once=True, on_startup=True)
    def load(self) -> None:
        """
        Load the HF model into memory"""
        import torch
        from diffusers import StableDiffusionPipeline

        device = torch.device("cuda") if torch.cuda.is_available() else "cpu"
        self.model = StableDiffusionPipeline.from_pretrained(HF_MODEL_ID)
        self.model.to(device)

    @pipe
    def predict(self, prompt: str, model_kwargs: ModelKwargs) -> t.List[Image]:
        """
        Generates a list of PIL images.
        """
        return self.model(prompt=prompt, **model_kwargs.to_dict()).images

    @pipe
    def postprocess(self, images: t.List[Image]) -> t.List[File]:
        """
        Creates a list of Files from the `PIL` images.
        """
        output_images = []
        for i, image in enumerate(images):
            path = Path(f"/tmp/sd/image-{i}.jpg")
            path.parent.mkdir(parents=True, exist_ok=True)
            image.save(str(path))
            output_images.append(File(path=path, allow_out_of_context_creation=True))
        return output_images

with Pipeline() as builder:
    prompt = Variable(
        str,
        title="prompt",
        description="The prompt to guide image generation",
        max_length=512,
    )
    model_kwargs = Variable(ModelKwargs)

    model = StableDiffusionModel()
    model.load()

    images: t.List[Image] = model.predict(prompt, model_kwargs)

    output: t.List[File] = model.postprocess(images)

    builder.output(output)

pipeline_graph = builder.get_pipeline()

This becomes much simpler in Cerebrium. Add the following to your main.py file:

import base64
import io

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from pydantic import BaseModel


# Define the structure of input parameters
class Item(BaseModel):
    prompt: str
    height: int
    width: int
    num_inference_steps: int
    num_images_per_prompt: int


# Load the model and set it up for inference
model_id = "stabilityai/stable-diffusion-2-1"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe = pipe.to("cuda")


# The endpoint we'll call to make inference
def predict(
        prompt: str,
        height: int = 512,
        width: int = 512,
        num_inference_steps: int = 25,
        num_images_per_prompt: int = 1,
):
    item = Item(
        prompt=prompt,
        height=height,
        width=width,
        num_inference_steps=num_inference_steps,
        num_images_per_prompt=num_images_per_prompt,
    )
    images = pipe(
        prompt=item.prompt,
        height=item.height,
        width=item.width,
        num_images_per_prompt=item.num_images_per_prompt,
        num_inference_steps=item.num_inference_steps,
    ).images
    finished_images = []
    for image in images:
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")
        finished_images.append(base64.b64encode(buffered.getvalue()).decode("utf-8"))
    return finished_images
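
Before deploying, you can optionally smoke-test the function locally. This assumes a machine with a CUDA GPU and the pip dependencies above installed; it’s not something Cerebrium requires.

# Optional local smoke test -- run `python main.py` on a GPU machine
if __name__ == "__main__":
    encoded_images = predict("a photo of an astronaut riding a horse on mars", num_inference_steps=20)
    with open("astronaut.png", "wb") as f:
        f.write(base64.b64decode(encoded_images[0]))
    print(f"Wrote {len(encoded_images)} image(s)")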

3. Deployment

Deploy your model with a single command:

cerebrium deploy

4. Inference

Once your app is deployed, you can make requests to your model using the example cURL request below:

curl --location 'https://api.cortex.cerebrium.ai/v4/p-<YOUR PROJECT ID>/stable-diffusion/predict' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR TOKEN HERE>' \
--data '{
    "prompt": "a photo of an astronaut riding a horse on mars"
}'
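
Since the predict function returns the generated images as base64-encoded PNG strings, you’ll want to decode them on the client side. Here’s a minimal Python sketch of that, assuming the response wraps your function’s return value under a "result" key (inspect the response body from your own deployment to confirm the exact shape):

import base64

import requests

url = "https://api.cortex.cerebrium.ai/v4/p-<YOUR PROJECT ID>/stable-diffusion/predict"
headers = {
    "Authorization": "Bearer <YOUR TOKEN HERE>",
    "Content-Type": "application/json",
}

response = requests.post(url, headers=headers, json={"prompt": "a photo of an astronaut riding a horse on mars"})
response.raise_for_status()

# Assumption: the function's return value is available under "result"
images_b64 = response.json()["result"]
for i, image_b64 in enumerate(images_b64):
    with open(f"image-{i}.png", "wb") as f:
        f.write(base64.b64decode(image_b64))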

Transitioning between platforms requires careful planning and execution. We’re here to help make this process as smooth as possible for teams like yours. Our platform provides the tools and support you need to ensure your apps continue running with unparalleled reliability.

Join Our Community

Connect with other developers and our team in our active communities for faster responses and issue resolution:

  • Join our Discord server
  • Join our Slack workspace

These communities are great places to share migration experiences, get quick answers to technical questions, learn best practices from other developers, and stay updated on new features and improvements.