Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
marc_daniau
Employee
Employee
We already covered SHAP-explained models for classification and regression scenarios in a previous APL blog post, and at the time we talked briefly about the main effect of a predictor and its interaction effect with the other predictors of the model. Now with HANA ML 2.17, you have the ability to visualize the interaction between variables in a heatmap. This new visualization adds to the bar chart Variable Importance in providing a global explanation of the classification/regression model. To get that new feature you need APL 2311 or a later version.

This blog will walk you through an example using the Census dataset that comes with APL.
from hana_ml import dataframe as hd
conn = hd.ConnectionContext(userkey='MLMDA_KEY')
sql_cmd = 'SELECT * FROM "APL_SAMPLES"."CENSUS" ORDER BY 1'
hdf_train = hd.DataFrame(conn, sql_cmd)

First, we train a gradient boosting classification model with the interaction parameter set to true:
from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingBinaryClassifier
apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True,
interactions=True)
apl_model.fit(hdf_train, label='class', key='id')

When the model training is completed, we ask for the report:
from hana_ml.visualizers.unified_report import UnifiedReport
UnifiedReport(apl_model).build().display()

You may want to generate the report as an HTML file:
apl_model.generate_html_report('APL_Census')

The usual "Variable Importance" tab provides a global explanation of the predictive model.


But because we explicitly requested the interactions when setting the model parameters, a new tab "Interaction Matrix" appears at the end:


On the diagonal is the main effect of each variable. The interaction matrix presents only the variables with the highest interactions. By default, it is limited to a size of 6x6. For a larger matrix, 9x9 for example, we must specify a maximum number as follows:
apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True, 
interactions=True,
interactions_max_kept=8)
apl_model.fit(hdf_train, label='class', key='id')


The larger the matrix, the longer it takes to fit the model.

If needed, one can obtain the interaction values in a pandas dataframe:
df = apl_model.get_debrief_report('ClassificationRegression_InteractionMatrix').deselect('Oid').collect()
df.style.hide(axis='index')


These figures are computed using the Shapley Taylor index.

 

To know more about APL

 
1 Comment