Arize
Setup
To get started, you need to create an Arize account which you can do here. Once you have signed up for Arize, you will need to create a ‘space’ - you can think of this a way to organize various models but usually one space per company/client is sufficient.
Arize handles the following type of models for you to do logging. You can read more about the functionality they provide for each to get a sense of their features and functionality:
- Binary Classification
- Multi-class Classification
- Regression
- Timeseries Forecasting (Coming soon)
- Ranking (Coming Soon)
- NLP
- CV
Logging with Arize is particularly easy with Cerebrium since you just have to define about 8 lines of code, and it follows an almost identical pattern to if you are familiar on how to log with Arize. If you would like to be more familiar with their platform you can take a look at the Arize documentation.
Adding a Arize Logger
Once you have instantiated your Conduit object in code (see here), you can add a Censius logger to it by calling the add_logger
method.
The Arize logger requires the following platform specific arguments:
api_key
: The unique key of our account which can be found in your Space Settingsspace_key
: The unique key related to the space you would like to deploy your model. This can be found in your Space Settings.model_type
: The type of Arize model you are using. You can find the different options herefeatures
: A list of the column names of your different features. Do not include outcome variable names - purely features.metrics_validation
: Optional. The validation metrics you would like to use to determine the performance of your modelclass_labels
: Optional. A list of potential labels for multi-class classification. Please ensure it correlates to the order that the model would returnembedding_features
: Optional. If you include embedding vectors as features for your model. You can read more hereschema
: This is the schema that Arize requires in order to determine what your predictions, features and embeddings are.
from cerebrium import Conduit, model_type, logging_platform
from arize.utils.types import ModelTypes, Environments, Schema, Metrics
features=["pregnancies", "glucose", "bloodPressure", "skinThickness","insulin", "bmi", "diabetesPedigree", "age"]
schema = Schema(
prediction_id_column_name="prediction_id", #Do not change this
prediction_label_column_name="prediction", #Do not change this
feature_column_names=features,
tag_column_names=["pregnancies"]
)
platform_args = {
"model_type": ModelTypes.BINARY_CLASSIFICATION,
"schema": schema,
"space_key": "<YOUR_SPACE_KEY>",
"api_key": "<YOUR_API_KEY>",
"features": features,
"metrics_validation": [Metrics.CLASSIFICATION]
}
conduit.add_logger(
platform=logging_platform.ARIZE,
platform_authentication={"space_key": platform_args['space_key'], "api_key": platform_args['api_key']},
features=platform_args["features"],
targets=["outcome"],
platform_args=platform_args,
log_ms=True
)
You can now deploy your Conduit as normal! Your model will log predictions to Arize every time you call the endpoint.
conduit.deploy()
Logs sent to Arize aren’t available immediately as it takes Arize a few minutes to process them.
Important things to Note
Logging Actual Values
With every request made to your Cerebrium endpoint, a unique run_id
is returned. We use this run_id
to associate every log of the output of your model. You can use this run_id
to log the actual values to Arize and compare model performance to that of your baseline model.
Multi-class classification
When doing multi-class classification, its important that you keep the same order in your class labels that the model will return. Below is an example with the Iris dataset
platform_args = {
"model_type": ModelTypes.SCORE_CATEGORICAL,
"schema": schema,
"space_key": "sJlAcWf3",
"api_key": "IB31mRV6AMeiZkuV14zF",
"features": features,
"class_labels": ["Setosa", "Versicolour", "Virginica"]
}
If we pass the following input, [5.8, 2.7, 5.1, 1.9]
, to our XGBoost Classifier we get the following model output: [2]
. This would then correlate to the Virginica class label
Embedding vectors
If you are passing embedding features to your model you must make sure to not include it in the features array and define it separately.
features = ['state', 'city', 'merchant_name', 'pos_approved', 'item_count', 'merchant_type', 'charge_amount': 20.11]
embedding_features = {
"nlp_embedding": Embedding(
vector=pd.Series([4.0, 5.0, 6.0, 7.0]),
data="I really like the coffee",
),
}
platform_args = {
"model_type": ModelTypes.SCORE_CATEGORICAL,
"schema": schema,
"space_key": "sJlAcWf3",
"api_key": "IB31mRV6AMeiZkuV14zF",
"features": features,
"embedding_features": embedding_features
}
Subsequently, We assign values to your features in the order they were entered - so it’s important to make sure that the embedding values are the last in your model input. For example:
Input: ['ca', 'berkeley', 'Peets Coffee', True, 10, 'coffee shop' 20.11, [4.0, 5.0, 6.0, 7.0]]
In the above example, ‘ca’ is associated to state, ‘berkeley’ to city etc.