Pre & Post Processing Functions
In many cases, you may want to process the data in your single- or multi-model flow before passing it to the next flow stage or returning it to the user. To do this, you can supply both a pre-processing and a post-processing function to your flow, named pre and post respectively.
The pre-processing function is applied to the input of the model, while the post-processing function is applied to the output of your model before it is returned to the next flow stage.
You will need to implement both of these functions if you wish to use binary files along with your input data.
In particular:
- The pre-processing function should take 1-2 arguments in order: the input to your model and any input files as binary data. It should return a single value, which is the pre-processed input to your model. It is crucial that the output of your pre-processing function is the same shape and type as the input of the model in the flow stage.
- The post-processing function should take 1-3 arguments in order: the output of your model, the original input data and the original binary input files. It should return a single value, which is the post-processed output of your model. You should ensure the return shape and type are the same as the input shape and type of the next flow stage. If you are at the end of your flow, you should ensure the return type is one of [list, dict, numpy.array, torch.tensor]. Note that if you wish to use files, your function signature must have 3 arguments, regardless of whether you use the second argument.
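The argument rules above can be sketched as follows. This is a minimal local illustration, not the platform's own code: the feature logic, the sample data and the placeholder model output are all assumptions, and the direct calls at the bottom only imitate what a flow stage would pass in.

```python
def pre_process(data, files):
    """Take the model input and the binary files; return the new model input."""
    # Illustrative only: append each file's size in bytes as an extra feature.
    return [list(row) + [len(blob)] for row, blob in zip(data, files)]


def post_process(result, input_data, files):
    """Take the model output, original input and original files; return one value."""
    # input_data is unused here, but it stays in the signature so that
    # files arrives as the third argument.
    return [{"prediction": r, "file_bytes": len(b)} for r, b in zip(result, files)]


# Local stand-ins for what a flow stage would supply (assumed shapes).
data = [[1.0, 2.0], [3.0, 4.0]]
files = [b"abc", b"defgh"]

model_input = pre_process(data, files)
fake_model_output = [0, 1]  # placeholder for a real model's predictions
final = post_process(fake_model_output, data, files)
```

Because this post-processor returns a list, it would also satisfy the end-of-flow return type requirement.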
For example, if you have a model that outputs a vector of probabilities for each input, you may want to return the argmax of each vector. You can do this by supplying the following functions to the deploy function with the last flow stage:
from cerebrium import deploy, model_type

# This function will be applied to the input of your model
def pre_process(data, files):
    # data is a list of input data, files is a list of binary objects
    from PIL import Image
    labelled = {d: Image.open(f) for d, f in zip(data, files)}
    return labelled

# This function will be applied to the output of your model
def post_process(result, input_data, files):
    # result is the output of your model, input_data is the original input data
    # from the 1st flow stage, files is the original binary input files
    import numpy as np
    labels = np.argmax(result, axis=1)
    return {d: label for d, label in zip(input_data, labels)}

model_flow = [
    (model_type.TORCH, 'torch.pt'),
    (model_type.XGBOOST_CLASSIFIER, 'xgb.json', {"pre": pre_process, "post": post_process}),
]
endpoint = deploy(model_flow, 'my-flow', "<API_KEY>")
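Before deploying, it can help to call the post-processing function locally on a hand-made result. The sketch below assumes the model output is a 2D array of probabilities with one row per input; the sample values and input names are made up for illustration, and the labels are converted to plain Python ints as one possible way to keep the returned dict simple.

```python
def post_process(result, input_data, files):
    # Imports live inside the function body.
    import numpy as np
    # One predicted class index per row of probabilities.
    labels = np.argmax(result, axis=1)
    return {d: int(label) for d, label in zip(input_data, labels)}


# Hypothetical model output: two inputs, three classes each.
fake_result = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]
checked = post_process(fake_result, ["first", "second"], [b"", b""])
```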
If you’d like to use objects across multiple functions, consider using our persistent memory store feature.
You may use functions inside your processor from the following libraries:
- numpy
- scikit-learn
- torch
- pandas
- Pillow
- transformers
- chitra
We are expanding this list of libraries, so if you have a specific library you would like to use, please let us know!
Note that any imports your processing functions need must be placed inside the function body. This is because the function is serialized and sent to the Cerebrium servers, so imports made outside the function will not be available when it runs on the server.
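A minimal sketch of the import placement described above. The function body and the float32 conversion are illustrative assumptions, not a requirement of the platform.

```python
# Avoid relying on a module-level import such as:
#   import numpy as np
# because it is not guaranteed to travel with the function.

def pre_process(data):
    # The import is inside the body, so it runs wherever the function runs.
    import numpy as np
    return np.asarray(data, dtype=np.float32)


batch = pre_process([[1, 2], [3, 4]])
```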