Training Diffusion Models
Cerebrium’s fine-tuning functionality is in public beta, and we are adding more functionality each week! If you run into any issues or have an urgent requirement, please reach out to support.
This guide will walk you through the process of fine-tuning a diffusion model on Cerebrium.
While this guide is a high-level overview of the process, you can find more detailed information on the available parameters in the config and dataset sections.
Creating your project
You can quickly set up a Cerebrium fine-tuning project by running the following command:
cerebrium init-trainer <<TRAINING_TYPE>> <<A_CONFIG_PATH>>
The above variables are:
- TRAINING_TYPE: the type of fine-tuning (transformer or diffuser).
- A_CONFIG_PATH: the path where your config file should be created.
This will set up a YAML config file with a sensible set of default parameters to help you get started quickly. Whether you are deploying a diffuser or a transformer model, the default parameters are tailored to the type of model you are deploying, so you can get good results immediately. All you need to do is fill in the model name and dataset parameters and you’re good to go!
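For illustration, a minimal diffuser config might look something like the sketch below. The key names here are hypothetical; the actual parameter names and their meanings are documented in the config section.

# Hypothetical sketch of a diffuser fine-tuning config (key names are illustrative only)
name: my-diffuser-finetune
training_type: diffuser
model_name: <<BASE_MODEL_TO_FINE_TUNE>>   # fill in the model name
dataset_path: ./my_dataset                # fill in your dataset parameters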
Starting a job with the CLI
Starting a job on Cerebrium requires four things:
- A name to identify the training job.
- Your API key.
- A config file or JSON string. See this section for more info.
- Your local dataset of training images along with their corresponding prompts (see the example layout after this list).
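As a rough illustration only (the exact expected format is described in the dataset section), a local dataset pairs training images with their prompts, for example:

my_dataset/
  image_0.png
  image_1.png
  prompts.json   # hypothetical file mapping each image to its training prompt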
Once you have these, you can start a fine-tuning job using the following command:
cerebrium train --config-file <<Path to your config file>>
Your config-file or config-string could alternatively provide all the other parameters. If you would like to provide the name, training-type, or api-key from the command line, you can add them as follows:
cerebrium train --config-file <<Path to your config file>> --name <<Name for your training>> --training-type "diffuser" --api-key <<Your api key if you haven't logged in>>
Note that if these parameters are present in your config-file or config-string, they will be overridden by the command line args.
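For example, a concrete invocation (with hypothetical file and job names, using only the flags shown above) might look like:

cerebrium train --config-file ./my-diffuser-config.yaml --name my-first-finetune --training-type "diffuser"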
Monitoring
Once your training job has been deployed, you will receive a job-id. This can be used to check the job status as well as to retrieve the logs.
Checking the training status:
cerebrium get-training-jobs --api-key <<API_KEY_IF_NOT_LOGGED_IN>> --last-n <<AN_OPTIONAL_PARAM_TO_LIMIT_NUMBER_OF_JOBS_RETURNED>>
Retrieving the training logs can be done with:
cerebrium get-training-logs <<YOUR_TRAINING_JOB_ID>> --api-key <<API_KEY_IF_NOT_LOGGED_IN>>
Note that your training logs will only be available once the job has started running and will not be stored after it is complete.
Coming soon: logging your training with Weights & Biases is in the final stages of development.
Retrieving your training results
Once your training is complete, you can download the training results using:
cerebrium download-model <<JOB_ID>> --api-key <<API_KEY_IF_NOT_LOGGED_IN>> --download-path <<OPTIONAL_PATH_TO_DOWNLOAD_TO>>
This will return a zip file containing your diffusion model’s UNet attention processors and the validation images generated by your model during training.
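Based on the checkpoint path referenced in the loading snippet below, the extracted contents should look something like this (the exact placement of the validation images is an assumption):

your_results/
  checkpoints/
    final/
      attn_procs/
        pytorch_lora_weights.bin   # the fine-tuned attention processor weights
  ...                              # validation images generated during training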
Using your Fine-tuned Diffuser
You can use your fine-tuning results as follows:
import os

import torch
from diffusers import (
    DiffusionPipeline,
    DPMSolverMultistepScheduler,
)

# Boilerplate loading of the base model you fine-tuned from
your_model_name = "your-base-model-name"  # the model you fine-tuned
your_model_revision = "main"  # the model revision you trained against

pipeline = DiffusionPipeline.from_pretrained(
    your_model_name, revision=your_model_revision, torch_dtype=torch.float16
)
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
pipeline = pipeline.to("cuda")
# LOAD IN YOUR TRAINING RESULTS
# The attention processors are saved at
# your_results/checkpoints/final/attn_procs/pytorch_lora_weights.bin
final_output_dir = "your_results/checkpoints/final"
pipeline.unet.load_attn_procs(os.path.join(final_output_dir, "attn_procs"))
# And that's all you need to do to load in the finetuned result!
# Now you can run your inference as you would like with the pipeline.
# some inference variables
your_prompt = "Your training prompt that you would like to use here"
num_images = 4 # number of images to generate
your_manual_seed = 42 # a manual seed if you would like repeatable results
# run inference as you normally would
generator = torch.Generator(device="cuda").manual_seed(your_manual_seed)
images = [
pipeline(your_prompt, num_inference_steps=25, generator=generator).images[0]
for _ in range(num_images)
]
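The pipeline returns standard PIL images, so you can save the generated outputs directly, for example:

# save each generated image to disk
for i, image in enumerate(images):
    image.save(f"generated_{i}.png")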