There are certain occasions when you want to store data in a pre- or post-processing function and then access it in a later function in the pipeline. For example, you may want to store the number of rows in a dataset in a pre-processing function and then use that number in a post-processing function to calculate a percentage. For this, the Cerebrium framework provides a persistent memory store that you can access through the get, save and delete functions.
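
For instance, a minimal sketch of that row-count example might look like the following, where the "num_rows" key and the percentage calculation are purely illustrative:

from cerebrium import save, get

def pre_process(data, files):
    save("num_rows", len(data))  # Remember how many rows came in
    return data

def post_process(result, input_data, files):
    positives = sum(1 for r in result if r)  # e.g. count positive predictions
    total = get("num_rows")  # Retrieve the count saved during pre-processing
    return {"positive_percentage": 100 * positives / total}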

The persistent memory store is a key-value store: you save data in one processing function and retrieve it in a later function in the pipeline. The store is global, meaning that all functions in the pipeline can access the same data. However, while you can save any Python object, you may only save objects created within the scope of the function. This means that you cannot save objects created in a parent function, or in the global scope of your local interpreter.
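
To illustrate that restriction, here is a short sketch (GLOBAL_LABELS is a hypothetical list defined at module level):

from cerebrium import save

GLOBAL_LABELS = ["cat", "dog"]  # Created in the interpreter's global scope

def pre_process(data, files):
    # Not permitted: GLOBAL_LABELS was created outside this function
    # save("labels", GLOBAL_LABELS)

    # Permitted: the copy is a new object created within this function's scope
    local_labels = list(GLOBAL_LABELS)
    save("labels", local_labels)
    return data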

from cerebrium import deploy, model_type, save, get

# This function will be applied to the input of your model
def pre_process(data, files):
    from PIL import Image
    # Open each uploaded file, keyed by its record's "name" so each image
    # stays matched to its label
    labelled = {d["name"]: Image.open(f) for d, f in zip(data, files)}
    labels = [d["name"] for d in data]
    save("labels", labels)  # Save the labels for later
    return labelled

def post_process(result, input_data, files):
    import numpy as np
    output = np.argmax(result, axis=1)  # Predicted class index for each row
    labels = get("labels")  # Get the labels we saved earlier
    return {label: prediction for label, prediction in zip(labels, output)}

model_flow = [
    (model_type.TORCH, 'torch.pt'),
    (model_type.XGBOOST_CLASSIFIER, 'xgb.json', {"pre": pre_process, "post": post_process}),
]
endpoint = deploy(model_flow, 'my-flow', "<API_KEY>")
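
Once a saved value has been consumed, it can be removed from the store with delete. A minimal sketch, assuming delete takes the same key string that was passed to save:

from cerebrium import delete, get

def post_process(result, input_data, files):
    import numpy as np
    output = np.argmax(result, axis=1)
    labels = get("labels")
    delete("labels")  # Remove the key once the labels have been consumed
    return {label: prediction for label, prediction in zip(labels, output)}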