To simplify training and make deployments easier, we use YAML config files. This keeps all of your training parameters in one place, giving you the flexibility to train your model with the parameters you need while keeping the deployment process streamlined.

In this section, we introduce the parameters you may want to adjust for your training. If you'd rather leave them out, the defaults are set to values we've found work well.

If you would like to override parameters in your config file during a deployment, the --config-string command-line option accepts a JSON string. Parameters provided in the JSON string override any values assigned to them in your config file.
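As a sketch of how you might build such a JSON string (the override keys below are illustrative; use whichever parameters you need):

```python
import json

# Hypothetical overrides: bump the learning rate and disable xformers.
# Keys mirror the config file, with training params nested under training_args.
overrides = {
    "training_args": {
        "learning_rate": 1.0e-4,
        "use_xformers": False,
    }
}

config_string = json.dumps(overrides)
print(config_string)
# The resulting string is what you would pass to --config-string, e.g.:
#   cerebrium train --config-string '{"training_args": {"learning_rate": 0.0001, "use_xformers": false}}'
```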

For your convenience, a full example config file is provided at the end of this section.

Setting up a config file

Your YAML config file can be placed anywhere on your system; just point the trainer to the file.

In your YAML file, you need to include the following required parameters or pass them to the cerebrium train command:

Required parameters

The following parameters need to be provided either in your command-line call of the trainer or in your config file:

| Parameter | Suggested Value | Description |
| --- | --- | --- |
| training_type | diffuser | Type of training to run. Either diffuser or transformer. In this case, use diffuser. |
| name | test-config-file | Name of the experiment. |
| api_key | Your private API key | Your Cerebrium API key. Can also be retrieved if you have run cerebrium login. |
| hf_model_path | runwayml/stable-diffusion-v1-5 | Path to your stable diffusion model on huggingface. |
| train_prompt | | Your prompt to train. |
| log_level | INFO | Log level for logging. Can be DEBUG, INFO, WARNING, ERROR. |
| train_image_dir | data/training_class_images/ | Directory of training images. |
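Pulling these together, a minimal config containing only the required parameters could look like the fragment below (the name, API key, prompt, and image directory are placeholders you should replace with your own values):

```yaml
training_type: "diffuser"
name: "test-config-file"
api_key: "YOUR PRIVATE CEREBRIUM API KEY"
hf_model_path: "runwayml/stable-diffusion-v1-5"
train_prompt: "INSERT YOUR TRAINING PROMPT"
log_level: "INFO"
train_image_dir: data/training_class_images/
```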

Optional parameters

Additionally, you can include the following parameters in your config file:

| Parameter | Suggested Value | Description |
| --- | --- | --- |
| **Diffuser parameters** | | |
| revision | main | Revision of the diffuser model to use. |
| validation_prompt | ~ | An optional validation prompt to use. Will use the training prompt if "~". |
| custom_tokenizer | ~ | Custom tokenizer from AutoTokenizer if required. |
| **Dataset parameters** | | |
| prior_class_image_dir | data/prior_class_images | Optional directory of images to use for the prior class. |
| prior_class_prompt | Photo of a dog. | Prompt used to generate prior class images. If ~, prior classes are not used. |
Training parameters

The following training parameters can be included if you need to modify the training requirements. These must be child parameters nested under the training_args parameter in your config file.
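For example, to set only the epoch count and batch size, nest them under training_args like so:

```yaml
training_args:
  num_train_epochs: 50
  train_batch_size: 2
```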

| Parameter | Sub-parameter | Suggested Value | Description |
| --- | --- | --- | --- |
| training_args | | | |
| | num_validation_images | 4 | Number of images to generate at each validation step. |
| | num_train_epochs | 50 | Number of epochs to train for. |
| | seed | ~ | Manual seed to set for training. |
| | resolution | 512 | Resolution to train images at. |
| | center_crop | False | Whether to center crop images to the resolution. |
| | train_batch_size | 2 | Batch size for training. Significantly affects memory usage, so we suggest leaving it at 2. |
| | num_prior_class_images | 10 | Number of images to generate using the prior class prompt. |
| | prior_class_generation_batch_size | 2 | Batch size for generating prior class images. We suggest leaving it at 2. |
| | prior_generation_precision | ~ | Precision used to generate prior class images, if applicable. |
| | prior_loss_weight | 1.0 | Weight of the prior loss in the total loss. |
| | max_train_steps | ~ | Maximum training steps; overrides the number of training epochs. |
| | validation_epochs | 5 | Number of epochs between validation and checkpointing. |
| | gradient_accumulation_steps | 1 | Number of gradient accumulation steps. |
| | learning_rate | 0.0005 | The learning rate you would like to train with. Can be more aggressive than you would use to train a full network. |
| | lr_scheduler | constant | Learning rate schedule. One of "constant", "linear", "cosine", "polynomial". |
| | lr_warmup_steps | 5 | Number of learning rate warmup steps. |
| | lr_num_cycles | 1 | Number of learning rate cycles to use. |
| | lr_power | 1.0 | Power factor if using a polynomial scheduler. |
| | max_grad_norm | 1.0 | Maximum gradient norm. |
| | mixed_precision | no | Whether to use mixed precision. Supports "fp16" and "bf16"; otherwise defaults to "no", which trains in fp32. Use with caution, as it can cause the safety checker to misbehave. |
| | scale_lr | False | Scale the learning rate according to the gradient accumulation steps, training batch size, and number of processes. |
| | allow_tf32 | False | Allow matmul using TF32. Defaults to False. |
| | use_8bit_adam | True | Use 8-bit Adam for lower memory usage. |
| | use_xformers | True | Whether to use xformers memory-efficient attention. |

Example YAML config file

%YAML 1.2
---
###############################################################
#  Mandatory Parameters. Must be provided here or in the CLI.
###############################################################
training_type: "diffuser" # Type of training to run. Either "diffuser" or "transformer".
name: "test-config-file" # Name of the experiment.
api_key: "YOUR PRIVATE CEREBRIUM API KEY" # Your Cerebrium API key.
hf_model_path: "runwayml/stable-diffusion-v1-5" # Path to the huggingface diffusion model to train.
train_prompt: "INSERT YOUR TRAINING PROMPT" # Your prompt to train.
log_level: "INFO" # Log level for logging. Can be "DEBUG", "INFO", "WARNING", "ERROR".
train_image_dir: data/training_class_images/ # Directory of training images.

###############################################################
#  Optional Parameters
###############################################################
# Diffuser params
revision: "main" # Revision of the diffuser model to use.
validation_prompt: ~ # an optional validation prompt to use. If ~, will use the training prompt.
custom_tokenizer: "" # custom tokenizer from AutoTokenizer if required.

# Dataset params
prior_class_image_dir: ~ # or "path/to/your/prior_class_images". Optional directory of images to use if you would like to train prior class images as well.
prior_class_prompt: ~ # Set your prior class prompt here. If ~, will not use prior classes. Only use prior class preservation if you want to preserve the class in your results.

# Training params
training_args:
  # General training params
  learning_rate: 1.0E-5
  num_validation_images: 4 # Number of images to generate in validation.
  num_train_epochs: 50
  seed: 1
  resolution: 512 # Resolution to train images at.
  center_crop: False # Whether to center crop images to resolution.
  train_batch_size: 2
  num_prior_class_images: 5
  prior_class_generation_batch_size: 2
  prior_loss_weight: 1.0 # Weight of prior loss in the total loss.
  max_train_steps: ~ # maximum training steps which overrides number of training epochs
  validation_epochs: 5 # number of epochs before running validation and checkpointing

  # Training loop params
  gradient_accumulation_steps: 1
  lr_scheduler: "constant"
  lr_warmup_steps: 50
  lr_num_cycles: 1
  lr_power: 1.0
  allow_tf32: False
  max_grad_norm: 1.0
  mixed_precision: "no" # If you would like to use mixed precision. Supports fp16 and bf16. Defaults to 'no'
  prior_generation_precision: ~
  scale_lr: False
  use_8bit_adam: True
  use_xformers: True # Whether to use xformers memory efficient attention or not.