Introduction

Now that your model has been successfully trained, you have two options for deploying it to a production environment:

  • The first is to download the trained adapter weights using cerebrium download-model and write your own deployment code.
  • The second is to use the built-in auto-deployment feature of Cerebrium’s trainer.

While writing custom deployment code for your fine-tuned weights gives you more flexibility, the second option is faster and easier, getting your model up and running in no time.
This way, your model is deployed as soon as training completes, giving you a seamless way to test its performance before considering any modifications to the deployment code.
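
If you opt for the manual route instead, the first step might look something like the sketch below. The model identifier is a placeholder and the exact arguments of cerebrium download-model depend on your project, so check the Cerebrium CLI documentation for the precise usage.

# Rough sketch of the manual route: download the trained adapter weights locally.
# <your-trained-model-name> is a placeholder, not the exact CLI signature.
cerebrium download-model <your-trained-model-name>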

Configuration of your Auto-deployment

While an auto-deployment will generate all the code needed to deploy your model, you’ll need to configure a few parameters to get it up and running. These parameters are added to the config.yml file of your training run under the autodeploy key.

We’ve kept the configuration as simple as possible, using the same format and parameters that you’re used to in the config file of a Cortex deployment.

The following are the parameters we suggest you supply for an auto-deployment:

autodeploy:
  name: your-auto-deployed-model-name
  hardware: AMPERE_A5000 # Hardware to use for the deployment
  cpu: 2 # Number of CPUs to allocate to the deployment
  memory: 14.5 # Memory (in GB) to allocate to the deployment. This depends on your model, but for most 7B models, 14.5GB is sufficient.

While we’ve tried to include all the requirements your model may need, you can overwrite our defaults by providing files for requirements-file and pkglist-file under the autodeploy parameters. This is particularly useful if you want to use a specific version of a library or need to include an extra package (see the example below).

Feel free to include additional parameters such as min_replicas, max_replicas, or any other parameters typically used in the config of a Cortex deployment.
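
Put together, an extended autodeploy block that overrides the default dependencies and pins the replica count could look something like the following. The file names and values here are illustrative, so substitute your own:

autodeploy:
  name: your-auto-deployed-model-name
  hardware: AMPERE_A5000
  cpu: 2
  memory: 14.5
  requirements-file: requirements.txt # your own Python requirements, replacing our defaults
  pkglist-file: pkglist.txt # your own system package list, replacing our defaults
  min_replicas: 0 # minimum number of replicas to keep running
  max_replicas: 2 # maximum number of replicas to scale to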

With these steps completed, all that’s left to do is run the cerebrium train command to upload your data and config. In no time, your trained model will be up and running in a production environment, ready to make predictions!
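
As a rough sketch (the exact arguments of cerebrium train depend on your project, so the placeholder below stands in for them):

# Kick off training; with the autodeploy block in config.yml, deployment follows automatically.
# <your-training-arguments> is a placeholder for whatever name/config/data arguments your setup uses.
cerebrium train <your-training-arguments>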

Post Auto-Deployment Steps

Once your training is finished and the auto-deployment has run, your model will be visible in your dashboard. To test it, use the endpoint provided in the training logs or within the Example Code tab of the model’s dashboard.
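
For a quick smoke test, you can call that endpoint with any HTTP client. The snippet below is only a sketch: the URL, authentication header, and request body are placeholders, and the real values come from your training logs and the Example Code tab.

# Hypothetical request; copy the real URL, auth header, and payload from the Example Code tab.
# The payload shape depends on your model's predict function.
curl -X POST "<your-endpoint-url>" \
  -H "Authorization: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a one-line summary of auto-deployment."}'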

You can obtain the deployment code from the builds tab of the model’s dashboard: clicking the download button fetches a zip file containing the code used for the auto-deployment. This is particularly useful if you intend to customize deployment parameters or functionality.

The zip file also contains your model weights. If you plan to modify the deployment code, make sure these weights are included in your deployment folder.