Introduction
We are saddened by the news that Mystic AI is sunsetting their services. They were an early pioneer in the space and really pushed the industry forward. This guide walks you through migrating your apps from Mystic to Cerebrium so that they remain functional despite the circumstances.
We’ll show you how to convert your existing Mystic code (using a stable diffusion example) and configuration to work with the Cerebrium platform. You’ll learn about the tools and features available to make this transition as seamless as possible, including ways to optimize your deployments for better performance and cost efficiency.
Key Differences
Cerebrium helps teams deploy and run their models efficiently. Our infrastructure is designed for reliable performance:
- The average model cold-starts in 2-5 seconds.
- Updates to your code deploy quickly, taking only 8-14 seconds.
- 99.9% uptime.
Cerebrium gives you precise control over your computing resources. Instead of managing entire instances, which can become costly and unnecessary, you choose exactly how much CPU, memory, and GPU power you need. You pay only for the resources you use, calculated down to the second. To better understand costs for your specific needs, you can use our pricing calculator.
Migration Process
1. Project Setup and Configuration
Start by installing Cerebrium’s command-line tool and creating your project:
pip install cerebrium --upgrade
cerebrium login # You'll be redirected to the dashboard for login
cerebrium init stable-diffusion
cd stable-diffusion
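After these commands complete, the scaffolded project should contain at least the two files we edit in the rest of this guide (the exact template may vary):

stable-diffusion/
├── main.py           # your app's entry point (we'll fill this in below)
└── cerebrium.toml    # deployment configuration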
Next, we’ll convert your existing configuration to Cerebrium’s format. Here’s what your current Mystic configuration may look like:
# Mystic's pipeline.yaml
runtime:
  container_commands:
    - apt-get update
    - apt-get install -y git
  python:
    version: "3.10"
    requirements:
      - pipeline-ai
      - diffusers==0.24.0
      - torch==2.1.1
      - transformers==4.35.2
      - accelerate==0.25.0
  cuda_version: "11.4"
accelerators:
  - "nvidia_a10"
accelerator_memory: null
pipeline_graph: sd_pipeline:pipeline_graph
pipeline_name: <YOUR_USERNAME>/stable-diffusion-v1.5
extras: {}
This transforms into Cerebrium’s easy-to-understand TOML config:
# cerebrium.toml
[cerebrium.deployment]
name = "stable-diffusion"
python_version = "3.11"
docker_base_image_url = "debian:bookworm-slim"
include = ["./*", "main.py", "cerebrium.toml"]
exclude = [".*"]
[cerebrium.hardware]
compute = "AMPERE_A10" # Choose your GPU type
cpu = 4 # Number of CPU cores
memory = 16.0 # Memory in GB
gpu_count = 1 # Number of GPUs
[cerebrium.scaling]
min_replicas = 0 # Scale to zero when idle to save costs
max_replicas = 2 # Scale up to handle increased traffic
cooldown = 60 # Seconds to wait before scaling down an idle instance
replica_concurrency = 1 # Number of requests a single container can handle at once
[cerebrium.dependencies.pip]
torch = ">=2.0.0"
pydantic = "latest"
transformers = "latest"
accelerate = "latest"
diffusers = "latest"
safetensors = "latest"
xformers = "latest"
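If you’d rather keep the exact versions you had pinned on Mystic, you can carry them across here. This sketch assumes the pip table accepts standard pip version specifiers, as the ">=2.0.0" entry above suggests:

[cerebrium.dependencies.pip]
torch = "==2.1.1"
diffusers = "==0.24.0"
transformers = "==4.35.2"
accelerate = "==0.25.0"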
2. Code Migration
Now we’ll convert your model implementation. Here’s what a typical Mystic pipeline looks like:
import typing as t
from pathlib import Path

from PIL.Image import Image
from pipeline.cloud.pipelines import run_pipeline
from pipeline.objects.graph import InputField, InputSchema
from pipeline import File, Pipeline, Variable, entity, pipe

HF_MODEL_ID = "runwayml/stable-diffusion-v1-5"


class ModelKwargs(InputSchema):
    num_images_per_prompt: int | None = InputField(
        title="num_images_per_prompt",
        description="The number of images to generate per prompt.",
        default=1,
        optional=True,
    )
    height: int | None = InputField(
        title="height",
        description="The height in pixels of the generated image.",
        default=512,
        optional=True,
        multiple_of=64,
        ge=64,
    )
    width: int | None = InputField(
        title="width",
        description="The width in pixels of the generated image.",
        default=512,
        optional=True,
        multiple_of=64,
        ge=64,
    )
    num_inference_steps: int | None = InputField(
        title="num_inference_steps",
        description=(
            "The number of denoising steps. More denoising steps "
            "usually lead to a higher quality image at the expense "
            "of slower inference."
        ),
        default=50,
        optional=True,
    )


@entity
class StableDiffusionModel:
    def __init__(self) -> None:
        self.model = None
        self.device = None

    @pipe(run_once=True, on_startup=True)
    def load(self) -> None:
        """Load the HF model into memory."""
        import torch
        from diffusers import StableDiffusionPipeline

        device = torch.device("cuda") if torch.cuda.is_available() else "cpu"
        self.model = StableDiffusionPipeline.from_pretrained(HF_MODEL_ID)
        self.model.to(device)

    @pipe
    def predict(self, prompt: str, model_kwargs: ModelKwargs) -> t.List[Image]:
        """Generates a list of PIL images."""
        return self.model(prompt=prompt, **model_kwargs.to_dict()).images

    @pipe
    def postprocess(self, images: t.List[Image]) -> t.List[File]:
        """Creates a list of Files from the `PIL` images."""
        output_images = []
        for i, image in enumerate(images):
            path = Path(f"/tmp/sd/image-{i}.jpg")
            path.parent.mkdir(parents=True, exist_ok=True)
            image.save(str(path))
            output_images.append(File(path=path, allow_out_of_context_creation=True))
        return output_images


with Pipeline() as builder:
    prompt = Variable(
        str,
        title="prompt",
        description="The prompt to guide image generation",
        max_length=512,
    )
    model_kwargs = Variable(ModelKwargs)

    model = StableDiffusionModel()
    model.load()

    images: t.List[Image] = model.predict(prompt, model_kwargs)
    output: t.List[File] = model.postprocess(images)

    builder.output(output)

pipeline_graph = builder.get_pipeline()
This becomes much simpler in Cerebrium. Add the following to your main.py file:
import base64
import io

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from pydantic import BaseModel


# Define the structure of input parameters
class Item(BaseModel):
    prompt: str
    height: int
    width: int
    num_inference_steps: int
    num_images_per_prompt: int


# Load the model and set it up for inference
model_id = "stabilityai/stable-diffusion-2-1"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe = pipe.to("cuda")


# The endpoint we'll call to make inference
def predict(
    prompt: str,
    height: int = 512,
    width: int = 512,
    num_inference_steps: int = 25,
    num_images_per_prompt: int = 1,
):
    item = Item(
        prompt=prompt,
        height=height,
        width=width,
        num_inference_steps=num_inference_steps,
        num_images_per_prompt=num_images_per_prompt,
    )

    images = pipe(
        prompt=item.prompt,
        height=item.height,
        width=item.width,
        num_images_per_prompt=item.num_images_per_prompt,
        num_inference_steps=item.num_inference_steps,
    ).images

    # Return the generated images as base64-encoded PNG strings
    finished_images = []
    for image in images:
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")
        finished_images.append(base64.b64encode(buffered.getvalue()).decode("utf-8"))

    return finished_images
3. Deployment
Deploy your model with a single command:
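cerebrium deploy

The CLI builds your container and pushes it to your Cerebrium project; once the build finishes, your app is live and reachable at its endpoint.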
4. Inference
Once your app is deployed, you can make requests to your model using the example cURL request below:
curl --location 'https://api.cortex.cerebrium.ai/v4/p-<YOUR PROJECT ID>/stable-diffusion/predict' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR TOKEN HERE>' \
--data '{
"prompt": "a photo of an astronaut riding a horse on mars"
}'
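If you prefer calling the endpoint from Python, the sketch below makes the same request with the requests library and decodes the returned base64 strings back into image files. It assumes the response body exposes predict’s return value under a result key; check a run’s details in your Cerebrium dashboard if your response shape differs.

import base64

import requests

url = "https://api.cortex.cerebrium.ai/v4/p-<YOUR PROJECT ID>/stable-diffusion/predict"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <YOUR TOKEN HERE>",
}
payload = {"prompt": "a photo of an astronaut riding a horse on mars"}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()

# Assumption: the JSON body carries predict's return value (a list of
# base64-encoded PNGs) under a "result" key.
encoded_images = response.json()["result"]

for i, encoded in enumerate(encoded_images):
    with open(f"image-{i}.png", "wb") as f:
        f.write(base64.b64decode(encoded))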
Transitioning between platforms requires careful planning and execution. We’re here to help make this process as smooth as possible for teams like yours. Our platform provides the tools and support you need to ensure your apps continue running with unparalleled reliability.
Connect with other developers and our team in our active communities for faster responses and quicker issue resolution:
- Join our Discord server.
- Join our Slack workspace.
These communities are great places to share migration experiences, get quick answers to technical questions, learn best practices from other developers, and stay updated on new features and improvements.