Config Files
To simplify the training experience and make deployments easier, we use YAML config files. This lets you keep all of your training parameters in one place, giving you the flexibility to train your model with the parameters you need while keeping the deployment process streamlined.
In this section, we introduce the parameters you may want to tune for your training. However, if you'd prefer to leave any of them out, we've set the defaults to values we've found work well.
If you would like to override the parameters in your config file during a deployment, the --config-string
option in the command line accepts a JSON string. Any parameters provided in the JSON string will override the values assigned to them in your config file.
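For example, assuming the --config-string option is passed to the same cerebrium train command described below (the exact invocation may differ in your setup), overriding the log level and training prompt could look something like this:

cerebrium train --config-string '{"log_level": "DEBUG", "train_prompt": "Photo of a dog."}'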
For your convenience, an example of the config file is available here.
Setting up a config file
Your YAML config file can be placed anywhere on your system; just point the trainer to the file.
In your YAML file, you need to include the following required parameters or pass them to the cerebrium train
command:
Required parameters
The following parameters need to be provided either in your command-line call of the trainer or in your config file (a minimal example follows the table):
Parameter | Suggested Value | Description |
---|---|---|
training_type | diffuser | Type of training to run. Either diffuser or transformer. In this case, use diffuser. |
name | test-config-file | Name of the experiment. |
api_key | Your private API key | Your Cerebrium API key. This can also be retrieved if you have run cerebrium login. |
hf_model_path | runwayml/stable-diffusion-v1-5 | Path to your Stable Diffusion model on Hugging Face. |
train_prompt | "INSERT YOUR TRAINING PROMPT" | Your prompt to train on. |
log_level | INFO | Log level for logging. Can be DEBUG, INFO, WARNING, or ERROR. |
train_image_dir | data/training_class_images/ | Directory of training images. |
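Putting the required parameters together, a minimal config file could look like the following sketch (the values are the suggested ones from the table above; swap in your own):

training_type: "diffuser"
name: "test-config-file"
api_key: "YOUR PRIVATE CEREBRIUM API KEY"
hf_model_path: "runwayml/stable-diffusion-v1-5"
train_prompt: "INSERT YOUR TRAINING PROMPT"
log_level: "INFO"
train_image_dir: data/training_class_images/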
Optional parameters
Additionally, you can include the following parameters in your config file:
Parameter | Suggested Value | Description |
---|---|---|
Diffuser parameters | | |
revision | main | Revision of the diffuser model to use. |
validation_prompt | ~ | An optional validation prompt to use. If ~, the training prompt is used. |
custom_tokenizer | ~ | Custom tokenizer from AutoTokenizer, if required. |
Dataset parameters | | |
prior_class_image_dir | data/prior_class_images | Optional directory of images to use for the prior class. |
prior_class_prompt | Photo of a dog. | Prompt used to generate prior class images. If ~, prior class preservation is not used. |
Training parameters
The following training parameters can be included if you need to modify the training requirements. These must be nested as child parameters under the training_args parameter in your config file, as shown in the snippet after the table.
Parameter | Suggested Value | Description |
---|---|---|
num_validation_images | 4 | Number of images to generate at each validation step. |
num_train_epochs | 50 | Number of epochs to train for. |
seed | ~ | Manual seed to set for training. |
resolution | 512 | Resolution to train images at. |
center_crop | False | Whether to center crop images to the resolution. |
train_batch_size | 2 | Batch size for training. This significantly affects memory usage, so we suggest leaving it at 2. |
num_prior_class_images | 10 | Number of images to generate using the prior class prompt. |
prior_class_generation_batch_size | 2 | Batch size for generating prior class images. We suggest leaving it at 2. |
prior_generation_precision | ~ | Precision used to generate prior class images, if applicable. |
prior_loss_weight | 1.0 | Weight of the prior loss in the total loss. |
max_train_steps | ~ | Maximum number of training steps. Overrides the number of training epochs. |
validation_epochs | 5 | Number of epochs between validation runs and checkpointing. |
gradient_accumulation_steps | 1 | Number of gradient accumulation steps. |
learning_rate | 0.0005 | The learning rate to train with. Can be more aggressive than you would use to train a full network. |
lr_scheduler | constant | Learning rate schedule. Can be one of ["constant", "linear", "cosine", "polynomial"]. |
lr_warmup_steps | 5 | Number of learning rate warmup steps. |
lr_num_cycles | 1 | Number of learning rate cycles to use. |
lr_power | 1.0 | Power factor if using a polynomial scheduler. |
max_grad_norm | 1.0 | Maximum gradient norm. |
mixed_precision | no | Whether to use mixed precision. Supports 'fp16' and 'bf16'; otherwise defaults to 'no', which trains in fp32. Use with caution, as it can cause the safety checker to misbehave. |
scale_lr | False | Scale the learning rate according to the gradient accumulation steps, training batch size, and number of processes. |
allow_tf32 | False | Allow matmul using TF32. Defaults to False. |
use_8bit_adam | True | Use 8-bit Adam for lower memory usage. |
use_xformers | True | Whether to use xformers memory-efficient attention. |
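For example, nesting a few of the parameters above under training_args looks like this in your config file (values taken from the suggested values in the table):

training_args:
  num_train_epochs: 50
  train_batch_size: 2
  learning_rate: 0.0005
  lr_scheduler: "constant"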
Example YAML config file
%YAML 1.2
---
###############################################################
# Mandatory Parameters. Must be provided here or in the CLI.
###############################################################
training_type: "diffuser" # Type of training to run. Either "diffuser" or "transformer".
name: "test-config-file" # Name of the experiment.
api_key: "YOUR PRIVATE CEREBRIUM API KEY" # Your Cerebrium API key.
hf_model_path: "runwayml/stable-diffusion-v1-5" # Path to the huggingface diffusion model to train.
train_prompt: "INSERT YOUR TRAINING PROMPT" # Your prompt to train.
log_level: "INFO" # log_level level for logging. Can be "DEBUG", "INFO", "WARNING", "ERROR".
train_image_dir: data/training_class_images/ # Directory of training images.
###############################################################
# Optional Parameters
###############################################################
# Diffuser params
revision: "main" # Revision of the diffuser model to use.
validation_prompt: ~ # An optional validation prompt to use. If ~, will use the training prompt.
custom_tokenizer: "" # Custom tokenizer from AutoTokenizer, if required.
# Dataset params
prior_class_image_dir: ~ # or "path/to/your/prior_class_images". Optional directory of images to use if you would like to train prior class images as well.
prior_class_prompt: ~ # Set your prior class prompt here. If ~, will not use prior classes. Only use prior class preservation if you want to preserve the class in your results.
# Training params
training_args:
  # General training params
  learning_rate: 1.0E-5
  num_validation_images: 4 # Number of images to generate in validation.
  num_train_epochs: 50
  seed: 1
  resolution: 512 # Resolution to train images at.
  center_crop: False # Whether to center crop images to resolution.
  train_batch_size: 2
  num_prior_class_images: 5
  prior_class_generation_batch_size: 2
  prior_loss_weight: 1.0 # Weight of prior loss in the total loss.
  max_train_steps: ~ # Maximum training steps, which overrides the number of training epochs.
  validation_epochs: 5 # Number of epochs before running validation and checkpointing.
  # Training loop params
  gradient_accumulation_steps: 1
  lr_scheduler: "constant"
  lr_warmup_steps: 50
  lr_num_cycles: 1
  lr_power: 1.0
  allow_tf32: False
  max_grad_norm: 1.0
  mixed_precision: "no" # If you would like to use mixed precision. Supports fp16 and bf16. Defaults to 'no'.
  prior_generation_precision: ~
  scale_lr: False
  use_8bit_adam: True
  use_xformers: True # Whether to use xformers memory-efficient attention or not.