Hyperparameter Sweep: Training Llama 3.2 with WandB
When training machine learning models, finding the perfect combination of hyperparameters can feel overwhelming. Done well, however, it can turn a good model into a great one! Hyperparameter sweeps help you find the best-performing model for the least amount of compute or training time - think of them as a systematic approach to testing every variation to uncover the best result.
In this tutorial, we’ll walk through training Llama 3.2, using Wandb (Weights and Biases) to run hyperparameter sweeps that optimize its performance. We’ll leverage Cerebrium to scale our experiments across serverless GPUs, letting us find the best-performing model faster than ever.
If you would like to see the final version of this tutorial, you can view it on GitHub here.
Read this section if you’re unfamiliar with sweeps.
Analogy: Pizza Topping Sweep
Forget about ML for a second. Imagine you’re making pizzas, and you want to discover the most delicious combination of toppings. You can change three things about your pizza:
• Type of Cheese (mozzarella, cheddar, parmesan)
• Type of Sauce (tomato, pesto)
• Extra Topping (pepperoni, mushrooms, olives)
There are 18 possible combinations of pizzas you can make (3 cheeses × 2 sauces × 3 toppings). One of them will taste the best!
To find out which pizza is the tastiest, you need to try all the combinations and rate them. This process is called a hyperparameter sweep. Your three hyperparameters are the cheese, sauce, and extra topping.
If you bake one pizza at a time, it could take hours. But if you had 18 ovens, you could bake all the pizzas at once and find the best one in just a few minutes!
If an oven is a GPU, then you need 18 GPUs to run every experiment and see which pizza is the best. The power of Cerebrium is the ability to run sweeps like this on 18 different GPUs (or 1,000 GPUs if you’d like) to get you the best version of a model fast.
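The combination count above is easy to verify with a few lines of plain Python:

```python
from itertools import product

cheeses = ["mozzarella", "cheddar", "parmesan"]
sauces = ["tomato", "pesto"]
toppings = ["pepperoni", "mushrooms", "olives"]

# Every pizza is one point in the "hyperparameter grid".
pizzas = list(product(cheeses, sauces, toppings))
print(len(pizzas))  # 3 * 2 * 3 = 18
```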
Setup Cerebrium
If you don’t have a Cerebrium account, you can run the following in your cli:
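The flow looks something like this (the project name llama-sweep is just a placeholder - pick your own):

```shell
# Install the Cerebrium CLI
pip install cerebrium

# Authenticate - this opens the Cerebrium dashboard in your browser
cerebrium login

# Scaffold a new project folder
cerebrium init llama-sweep
```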
This creates a folder with two files:
- main.py - Our entrypoint file where our code lives.
- cerebrium.toml - A configuration file that contains all our build and environment settings.

Add the following pip packages near the bottom of your cerebrium.toml. They will be used to create our deployment environment.
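A sketch of what that section might look like - the [cerebrium.dependencies.pip] table name follows Cerebrium's TOML schema, and versions are left unpinned for brevity (pin them for reproducible builds):

```toml
[cerebrium.dependencies.pip]
torch = "latest"
transformers = "latest"
datasets = "latest"
peft = "latest"
bitsandbytes = "latest"
trl = "latest"
accelerate = "latest"
wandb = "latest"
```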
We will return to these files later, but for now we will continue the rest of this tutorial in this folder.
Setup Wandb
Weights & Biases (Wandb) is a powerful tool for tracking, visualizing, and managing machine learning experiments in real-time. It helps you log hyperparameters, metrics, and results, making it easy to compare models, optimize performance, and collaborate effectively with your team.
- Sign up for a free account, then log in to your Wandb account by running the following in your CLI:
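Assuming the wandb client isn't installed yet:

```shell
pip install wandb
wandb login
```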
You should see a link printed in your terminal - click it and copy the API key from the webpage back into your terminal.
While we have our Wandb API key handy, let's add it to our Cerebrium secrets so we can use it in our code later. Go to your Cerebrium Dashboard and navigate to the “Secrets” tab in the left sidebar. Add the following:
- Key: WANDB_API_KEY
- Value: The value you copied from the Wandb website.
Click the “Save All Changes” button to save the changes!
You should then be authenticated with Wandb and ready to go!
Training Script
Since we will be training the Llama 3.2 model from HuggingFace, we need to make sure we have permission to access the model. Navigate to the model page on HuggingFace here and make sure to accept all permissions.
You should then generate a HuggingFace token - click your profile image in the top right and select “Access Tokens” at the bottom of the dropdown. Create a new token if you need to and add it to your Cerebrium secrets like we did above:
- Key: HF_TOKEN
- Value: Your HuggingFace token
Click the “Save All Changes” button to save the secret!
For our training script, we will adapt the notebook from Kaggle here.
In your IDE in the current folder, create a requirements.txt file and add the following contents:
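A plausible set of contents, based on the libraries the training and sweep scripts use - your notebook's exact list may differ, and you should pin versions for reproducibility:

```txt
torch
transformers
datasets
peft
bitsandbytes
trl
accelerate
wandb
requests
python-dotenv
```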
These are the packages we require both locally and on Cerebrium to create our environment. We set our environment in our cerebrium.toml, so let's add this path to it. Let's also set our hardware requirements, i.e. the compute we need for our training task. Lastly, we set a max timeout for our task using response_grace_period, which we set to 1 hour.
In your cerebrium.toml add the following:
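A sketch of the relevant sections - section and key names follow Cerebrium's TOML schema, and the GPU type, CPU, and memory values are examples to adjust for your own training job:

```toml
[cerebrium.deployment]
name = "llama-sweep"
python_version = "3.11"

[cerebrium.dependencies.paths]
pip = "requirements.txt"      # install from the file we just created

[cerebrium.hardware]
compute = "AMPERE_A10"        # example GPU; pick what your model needs
cpu = 4
memory = 32.0

[cerebrium.scaling]
response_grace_period = 3600  # 1 hour, in seconds
```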
Now run pip install -r requirements.txt in your CLI so we can install these packages locally.
Now in your main.py, put the following code:
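A condensed sketch of the training script follows. The model ID, the Bitext customer-support dataset, the /persistent-storage volume path, and the chat-template columns are assumptions here, and the trl/peft API names follow a recent trl release - check against the full notebook linked above:

```python
import os


def format_chat(example):
    """Format one customer-support row as a simple chat string.

    The column names ("instruction", "response") match the Bitext
    customer-support dataset; adjust them for a different dataset.
    """
    return {
        "text": (
            f"<|user|>\n{example['instruction']}\n"
            f"<|assistant|>\n{example['response']}"
        )
    }


def train_model(params: dict) -> dict:
    """Fine-tune Llama 3.2 with QLoRA; `params` comes from the sweep."""
    # Heavy imports live inside the function so the module loads quickly.
    import torch
    import wandb
    from datasets import load_dataset
    from peft import LoraConfig
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from trl import SFTConfig, SFTTrainer

    wandb.login(key=os.environ["WANDB_API_KEY"])
    run = wandb.init(project="Llama-3.2-Customer-Support", config=params)

    model_name = params.get("model_name", "meta-llama/Llama-3.2-1B-Instruct")

    # QLoRA: load the frozen base model in 4-bit (NF4) precision.
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name, token=os.environ["HF_TOKEN"])
    model = AutoModelForCausalLM.from_pretrained(
        model_name, quantization_config=bnb, token=os.environ["HF_TOKEN"]
    )

    dataset = load_dataset(
        "bitext/Bitext-customer-support-llm-chatbot-training-dataset", split="train"
    ).map(format_chat)

    # Low-rank adapters are the only trainable weights.
    lora = LoraConfig(
        r=params["lora_r"],
        lora_alpha=params["lora_alpha"],
        lora_dropout=params["lora_dropout"],
        task_type="CAUSAL_LM",
    )

    args = SFTConfig(
        output_dir="/persistent-storage/llama-sweep",  # Cerebrium volume path
        per_device_train_batch_size=params["batch_size"],
        gradient_accumulation_steps=params["gradient_accumulation_steps"],
        learning_rate=params["learning_rate"],
        max_seq_length=params["max_seq_length"],
        report_to="wandb",  # stream metrics to the run started above
    )
    trainer = SFTTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        peft_config=lora,
        processing_class=tokenizer,
    )
    trainer.train()
    trainer.save_model(args.output_dir)
    run.finish()
    return {"status": "success"}
```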
You can read a deeper explanation of the training script here but here’s a high-level explanation of the code in bullet points:
- This code sets up a fine-tuning pipeline for a Large Language Model (specifically Llama 3.2) using several modern training techniques:
- The function takes a dictionary of parameters for flexibility in training configurations - these are the hyperparameters our sweep will vary.
- We load a customer support dataset from Hugging Face and format the data into a chat template format
- We implement QLoRA (Quantized Low-Rank Adaptation) for efficient fine-tuning.
- We use Weights & Biases (wandb) for experiment tracking, logging results to our Wandb dashboard as they become available.
- At the end, we save the final model to our Cerebrium volume and return a “success” message to show that the training succeeded.
This is what we will deploy on Cerebrium - the train_model() function becomes the endpoint that kicks off our training job with the parameters we pass in.
You can now run the following in your CLI:
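Deploying is a single command from the project folder:

```shell
cerebrium deploy
```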
This will set up our environment by installing our Python packages and deploy our training script as an endpoint, whose URL is returned at the end of the deployment process. Copy this POST URL - we will need it later.
What's nice about Cerebrium is that we did this with no decorators or special syntax - we simply wrapped training code in a function, and that function becomes an endpoint that autoscales based on the number of requests you make to it - perfect for hyperparameter sweeps!
Hyperparameter Sweep
Let us create a run.py file that we will use to run locally. Put the following code in there:
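A sketch of what run.py might contain. The endpoint URL is a placeholder (substitute your own project ID, app name, and function name), the model ID is an assumption, and the sweep values mirror the ranges described below:

```python
import os


def build_sweep_config():
    """Bayesian-optimization sweep over the hyperparameters listed below."""
    return {
        "method": "bayes",
        "metric": {"name": "train/loss", "goal": "minimize"},
        "parameters": {
            # log_uniform takes natural-log exponents:
            # exp(-10) ~ 4.54e-5, exp(-7) ~ 9.12e-4.
            "learning_rate": {"distribution": "log_uniform", "min": -10, "max": -7},
            "batch_size": {"values": [1, 2, 4]},
            "gradient_accumulation_steps": {"values": [2, 4, 8]},
            "lora_r": {"values": [8, 16, 32]},
            "lora_alpha": {"values": [16, 32]},
            "lora_dropout": {"values": [0.05, 0.1]},
            "max_seq_length": {"values": [512, 1024]},
        },
    }


def run_experiment():
    """One sweep iteration: start a W&B run and fire the training request."""
    import requests
    import wandb

    run = wandb.init()
    # Combine the sweep's choices with fixed parameters.
    params = dict(wandb.config)
    params["model_name"] = "meta-llama/Llama-3.2-1B-Instruct"

    # "?async=true" makes this a fire-and-forget request (up to 12 hours).
    endpoint = (
        "https://api.cortex.cerebrium.ai/v4/<YOUR-PROJECT-ID>"
        "/llama-sweep/train_model?async=true"
    )
    resp = requests.post(
        endpoint,
        json=params,
        headers={"Authorization": f"Bearer {os.environ['CEREBRIUM_API_KEY']}"},
        timeout=30,
    )
    wandb.log({"request_status_code": resp.status_code})
    run.finish()


def main():
    import wandb
    from dotenv import load_dotenv

    load_dotenv()  # reads CEREBRIUM_API_KEY from .env
    sweep_id = wandb.sweep(build_sweep_config(), project="Llama-3.2-Customer-Support")
    # 10 experiments: 10 concurrent GPUs is the Hobby-plan limit.
    wandb.agent(sweep_id, function=run_experiment, count=10)
```

Add an `if __name__ == "__main__": main()` guard at the bottom so the script launches the sweep when you run it.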
This code implements a hyperparameter sweep system using Weights & Biases (wandb) sweeps to train a Llama 3.2 model for customer support. Here’s what it does:
- Create a .env file and add your Inference API key from your Cerebrium Dashboard.
- Update the Cerebrium endpoint based on your project ID and the function name you wish to call. You will see we append this URL with “?async=true”. This means it's a fire-and-forget request that can run for up to 12 hours. You can read more here.
- We then define a Bayesian optimization sweep configuration that will search through different hyperparameters including:
- Learning rate (log uniform distribution between ~4.54e-5 and ~9.12e-4)
- Batch size (1, 2, or 4)
- Gradient accumulation steps (2, 4, or 8)
- LoRA parameters (r, alpha, and dropout)
- Maximum sequence length (512 or 1024)
- We create this sweep in the “Llama-3.2-Customer-Support” W&B project
- For each sweep iteration:
- We initialize a new W&B run
- We combine the sweep's hyperparameters with fixed parameters (like model name and dataset)
- We send the parameters to a Cerebrium endpoint, where training happens asynchronously
- We log the results back to W&B
- We run these combinations across 10 experiments (10 concurrent GPUs is the limit on Cerebrium's Hobby plan)
You can then run your script by running python run.py in your CLI.
As the training is happening, you should see the results in your Wandb dashboard!
Next Steps
- You could copy the resulting model directly to your AWS S3 bucket using the boto3 package.
- You could kick off your CI/CD processes to test the model's outputs and make sure they adhere to your requirements. Look into the Webhook functionality Cerebrium offers.
- Once all experiments have finished running, you could create a Cerebrium deployment that serves inference requests for the model. That application could load the model right from your Cerebrium volume.
- You could use the Cerebrium Python package to download the model directly to your machine.
Conclusion
Hyperparameter optimization is a powerful tool for fine-tuning machine learning models, and with the right setup, it doesn’t have to be overwhelming. By combining WandB for tracking and Cerebrium for scaling serverless compute, this tutorial demonstrated how you can efficiently run hyperparameter sweeps for Llama 3.2, ensuring you get the best performance with minimal effort.
You can look at the final GitHub repository here.