Test locally
To run your model locally and check that it works as intended before deploying, use the `load` and `run` methods. The `load` method loads your pipeline into memory from a base path, using the model files you specified in `deploy`, while `run` executes the loaded pipeline sequentially.
```python
from cerebrium import deploy, model_type

# deploy returns a Conduit object. <MODEL_TYPE> is typically a model_type
# member such as model_type.TORCH, and <MODEL_FILE> is the path to your model.
conduit = deploy(('<MODEL_TYPE>', '<MODEL_FILE>'), '<MODEL_NAME>', '<API_KEY>')
conduit.load('./')
conduit.run(data)
```
Here, `data` is the input you would send to your model: usually a 2D/3D numerical array for typical models, or a list of strings for a language model. You can feed an `ndarray` or `Tensor` directly into this function. However, if you are using a custom data pipeline that expects another type, you may need to convert your data into the appropriate format for your model.
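For example, a minimal sketch of calling `run` with a small NumPy array (the feature values here are placeholders):

```python
import numpy as np

# A 2D array of numerical features, one row per sample.
data = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [6.2, 3.4, 5.4, 2.3],
])
result = conduit.run(data)

# For a language model, data would instead be a list of strings:
# data = ["Summarize this paragraph in one sentence."]
```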
You can also define a Conduit object directly using the `Conduit` class. Then call the `run` method on the Conduit object to test the model locally, or the `deploy` method to deploy the Conduit's model flow to Cerebrium.

When you use the Conduit object directly, you can specify the hardware you wish your model to run on with the `hardware` parameter in the `Conduit` constructor.
The `hardware` parameter is an enum that can be one of the following:
- `hardware.CPU`: Runs your model on a CPU. This is the default option for SKLearn, XGBoost, and SpaCy models.
- `hardware.GPU`: (Deprecated) Runs your model on a T4 GPU. This is the default option for Torch, ONNX, and HuggingFace models.
- `hardware.A10`: (Deprecated) Runs your model on an A10 GPU, which provides 24GB of VRAM. Use this option if your model is too large to fit in the 16GB of VRAM that a T4 GPU provides; this includes most large HuggingFace models.
- `hardware.TURING_4000`: An 8GB GPU that is great for lightweight models with fewer than 3B parameters in FP16.
- `hardware.TURING_5000`: A 16GB GPU that is great for small models with fewer than 7B parameters in FP16. Most small HuggingFace models can run on this.
- `hardware.AMPERE_A4000`: A 16GB GPU that is great for small models with fewer than 7B parameters in FP16, and significantly faster than an RTX 4000. Most small HuggingFace models can run on this.
- `hardware.AMPERE_A5000`: A 24GB GPU that is great for medium models with fewer than 10B parameters in FP16. A great option for almost all HuggingFace models.
- `hardware.AMPERE_A6000`: A 48GB GPU offering a great cost-to-performance ratio. Great for medium models with fewer than 21B parameters in FP16, and a great option for almost all HuggingFace models.
- `hardware.A100`: An 80GB GPU offering some of the highest performance available. Great for large models with fewer than 18B parameters in FP16, and a great option for almost all HuggingFace models, especially if inference speed is your priority.
```python
from cerebrium import Conduit, model_type, hardware

conduit = Conduit(
    '<MODEL_NAME>',
    '<API_KEY>',
    [('<MODEL_TYPE>', '<MODEL_FILE>')],
    hardware=hardware.<HARDWARE_TYPE>,
)
conduit.load('./')
conduit.run(data)
conduit.deploy()
```
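For instance, a sketch of pinning a small model to a 16GB card (the model name and file here are hypothetical; per the list above, a model with fewer than 7B parameters in FP16 fits on an A4000):

```python
from cerebrium import Conduit, model_type, hardware

# Hypothetical flow: a small Torch model that fits in 16GB of VRAM.
conduit = Conduit(
    'my-small-model',                    # hypothetical model name
    '<API_KEY>',
    [(model_type.TORCH, './model.pt')],  # hypothetical model file
    hardware=hardware.AMPERE_A4000,
)
conduit.deploy()
```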
Additionally, defining a Conduit object directly allows you to add more models to your flow dynamically using the `add_model` method.

```python
conduit.add_model('<MODEL_TYPE>', '<MODEL_FILE>', {<PROCESSING_FUNCTIONS>})
```
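For example, a hedged sketch of chaining a second model onto the flow. The file path and processing function below are hypothetical, and this assumes the third argument takes a collection of processing functions, as the placeholder suggests; check the API reference for the exact shape your version expects.

```python
import numpy as np
from cerebrium import model_type

def scale_inputs(data):
    # Hypothetical processing step: cast to float32 and normalize.
    return np.asarray(data, dtype=np.float32) / 255.0

# Hypothetical second model appended to the existing flow.
conduit.add_model(model_type.ONNX, './second_model.onnx', {scale_inputs})
```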