Cerebrium’s fine-tuning functionality is in public beta, and we are adding more features each week! If you run into any issues or have an urgent requirement, please reach out to support.

To simplify the training experience and make deployments easier, we use YAML config files. This lets you keep all your training parameters in one place, giving you the flexibility to train your model with the parameters you need while keeping the deployment process streamlined.

In this section, we introduce the parameters you may want to adjust for your training. If you leave any of them out, we fall back to defaults that we’ve found work well.

If you would like to override parameters in your config file during a deployment, the --config-string option on the command line accepts a JSON string. Any parameters provided in the JSON string override the values assigned to them in your config file.
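For example, a run that keeps everything in the config file but overrides the run name and random seed might look like the sketch below. This is illustrative only: the JSON keys are top-level parameters from the tables further down, the values are placeholders, and any other flags you pass (such as the one pointing the trainer at your config file) depend on your setup.

cerebrium train --config-string '{"name": "my-override-run", "seed": 123}'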

For your convenience, an example of the config file is available here.

Setting up a config file

Your YAML config file can be placed anywhere on your system; just point the trainer to the file.

In your YAML file, you need to include the following required parameters or pass them to the cerebrium train command:

Required parameters

| Parameter | Suggested Value | Description |
| --- | --- | --- |
| training_type | transformer | Type of training to run. Either diffuser or transformer. In this case, transformer. |
| name | | Your name for the fine-tuning run. |
| api_key | | Your Cerebrium private API key. |
| hf_model_path | decapoda-research/llama-7b-hf | Hugging Face path of the base model to fine-tune. |
| model_type | AutoModelForCausalLM | The transformers class used to load the model. |
| dataset_path | path/to/your/dataset.json | Path to your local JSON dataset. |
| log_level | INFO | Log level for logging. |
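Putting the required parameters together, a minimal config file could look like the short sketch below. The run name is a placeholder; the remaining values are the suggested ones from the table above.

training_type: "transformer"
name: my-finetune-run
api_key: <<<Your Cerebrium private API key>>>
hf_model_path: "decapoda-research/llama-7b-hf"
model_type: "AutoModelForCausalLM"
dataset_path: path/to/your/dataset.json
log_level: "INFO"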

Optional parameters

| Parameter | Suggested Value | Description |
| --- | --- | --- |
| custom_tokenizer | ~ | Custom tokenizer from AutoTokenizer if required. |
| seed | 42 | Random seed for reproducibility. |

| Parameter | Sub-parameter | Suggested Value | Description |
| --- | --- | --- | --- |
| training_args | | | |
| | logging_steps | 10 | Number of steps between logging. |
| | per_device_train_batch_size | 15 | Batch size per GPU for training. |
| | per_device_eval_batch_size | 15 | Batch size per GPU for evaluation. |
| | warmup_steps | 0 | Number of warmup steps for the learning rate scheduler. |
| | gradient_accumulation_steps | 4 | Number of gradient accumulation steps. |
| | num_train_epochs | 50 | Number of training epochs. |
| | learning_rate | 1.0e-4 | Learning rate for training. |
| | group_by_length | False | Whether to group batches by length. |
| base_model_args | | | The kwargs for loading the base model with AutoModelForCausalLM(). |
| | load_in_8bit | True | Whether to load the model in 8-bit. |
| | device_map | "auto" | Device map for loading the model. |
| peft_lora_args | | | PEFT LoRA kwargs for use by PeftConfig(). |
| | r | 8 | The r value for LoRA. |
| | lora_alpha | 32 | The lora_alpha value for LoRA. |
| | lora_dropout | 0.05 | The lora_dropout value for LoRA. |
| | target_modules | ["q_proj", "v_proj"] | The target_modules for LoRA. These are the suggested values for Llama. |
| | bias | "none" | Bias for LoRA. |
| | task_type | "CAUSAL_LM" | The task_type for LoRA. |
| dataset_args | | | Custom args for your dataset. |
| | prompt_template | "short" | Prompt template to use. Either "short" or "long". |
| | instruction_column | "prompt" | Column name of your prompt in dataset.json. |
| | label_column | "completion" | Column name of your label/completion in dataset.json. |
| | context_column | "context" | Optional column name of your context in dataset.json. |
| | cutoff_len | 512 | Cutoff length for the prompt. |
| | train_val_ratio | 0.9 | Ratio of training to validation data in the dataset split. |
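To show how the column-name parameters map onto your data, the sketch below lays out a dataset.json as a JSON array of records; the array-of-records layout and the record contents are assumptions for illustration, so match them to your actual data. The keys correspond to instruction_column, label_column, and context_column above.

[
  {
    "prompt": "Summarise the following passage in one sentence.",
    "context": "Cerebrium uses YAML config files to keep training parameters in one place.",
    "completion": "Cerebrium keeps all training parameters in a single YAML config file."
  },
  {
    "prompt": "What does the seed parameter control?",
    "context": "",
    "completion": "It fixes the random seed so that a fine-tuning run is reproducible."
  }
]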

Example YAML config file

%YAML 1.2
---
training_type: "transformer" # Type of training to run. Either "diffuser" or "transformer". In this case, "transformer".

name: test-config-file # Your name for the fine-tuning run.
api_key: <<<Your Cerebrium private API key>>>
auth_token: YOUR HUGGINGFACE API TOKEN # Optional. You will need this if you are fine-tuning Llama 2.
# Model params:
hf_model_path: "decapoda-research/llama-7b-hf"
model_type: "AutoModelForCausalLM"
dataset_path: path/to/your/dataset.json # path to your local JSON dataset.
custom_tokenizer: "" # custom tokenizer from AutoTokenizer if required.
seed: 42 # random seed for reproducibility.
log_level: "INFO" # log_level level for logging.

# Training params:
training_args:
  logging_steps: 10
  per_device_train_batch_size: 15
  per_device_eval_batch_size: 15
  warmup_steps: 0
  gradient_accumulation_steps: 4
  num_train_epochs: 50
  learning_rate: 1.0E-4
  group_by_length: False

base_model_args: # args for loading in the base model with AutoModelForCausalLM
  load_in_8bit: True
  device_map: "auto"

peft_lora_args: # peft lora args.
  r: 32
  lora_alpha: 16
  lora_dropout: 0.05
  target_modules: ["q_proj", "v_proj"]
  bias: "none"
  task_type: "CAUSAL_LM"

dataset_args:
  prompt_template: "short" # Prompt template to use. Either "short" or "long".
  instruction_column: "prompt" # column name of your prompt in the dataset.json
  label_column: "completion" # column name of your label/completion in the dataset.json
  context_column: "context" # optional column name of your context in the dataset.json
  cutoff_len: 512 # cutoff length for the prompt.
  train_val_ratio: 0.9 # ratio of training to validation data in the dataset split.