Export the scoring equation of a model using the SAP Predictive Analytics Python API

marc_daniau · ‎02-21-2018

This is a follow-up to a previous post: Debrief a model.

We have reached the step where our classification model shows enough accuracy and robustness to be used in production. One way of deploying our model is to export its scoring equation and hand over the code (e.g. Java, C++, SQL) to a member of the IT team who will test that code and move it to the production environment.

In this article, you will see how to generate the scoring equation of a trained model using the Python API of SAP Predictive Analytics inside a Jupyter notebook.

Loading the Model

We start with a blank notebook.

We run the code below to import the Automated Analytics library and load the model.

import sys
sys.path.append(r"C:\Program Files\SAP Predictive Analytics\Desktop\Automated\EXE\Clients\Python35")
import os
# Prepend the C++ client folder so its libraries are found first, keeping the existing PATH entries
os.environ['PATH'] = r"C:\Program Files\SAP Predictive Analytics\Desktop\Automated\EXE\Clients\CPP" + os.pathsep + os.environ.get('PATH', '')

# Raw string, so the backslashes are not read as escape sequences
AA_DIRECTORY = r"C:\Program Files\SAP Predictive Analytics\Desktop\Automated"

import aalib

# Minimal context: prints the messages sent by the engine and ignores the other callbacks
class DefaultContext(aalib.IKxenContext):
    def __init__(self):
        super().__init__()

    def userMessage(self, iSource, iMessage, iLevel):
        print(iMessage)
        return True

    def userConfirm(self, iSource, iPrompt):
        pass

    def userAskOne(self, iSource, iPrompt, iHidden):
        pass

    def stopCallBack(self, iSource):
        pass

frontend = aalib.KxFrontEnd([])
factory = frontend.getFactory()
context = DefaultContext()

# Load the configuration shipped with the C++ client
factory.setConfiguration("DefaultMessages", "true")
config_store = factory.createStore("Kxen.FileStore")
config_store.setContext(context, 'en', 10, False)
config_store.openStore(AA_DIRECTORY + r"\EXE\Clients\CPP", "", "")
config_store.loadAdditionnalConfig("KxShell.cfg")

# LOAD Model
folder_name = r"O:\MODULES_PA/PYTHON_API/MY_MODELS"
model_name = "My Classification Model"
store = factory.createStore("Kxen.FileStore")
store.openStore(folder_name, "", "")
model = store.restoreLastModelD(model_name)
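
The Windows paths used above contain backslashes, which Python reads as escape characters in ordinary string literals. The short sketch below uses only the standard library to show why raw strings are the safer choice and how the client folder could be built from the install directory. It uses `ntpath` so that it behaves the same on any operating system; the variable names `plain`, `raw` and `cpp_client_dir` are only illustrative.

```python
import ntpath  # Windows path rules, importable on any operating system

# In an ordinary literal "\t" is a tab character; in a raw literal it stays
# as a backslash followed by the letter t.
plain = "C:\temp"
raw = r"C:\temp"

# Build the C++ client folder from the install directory instead of
# concatenating backslash-laden literals by hand.
AA_DIRECTORY = r"C:\Program Files\SAP Predictive Analytics\Desktop\Automated"
cpp_client_dir = ntpath.join(AA_DIRECTORY, "EXE", "Clients", "CPP")

print(len(plain), len(raw))
print(cpp_client_dir)
```

The two literals differ in length because the first one silently contains a tab, which is the kind of error that is hard to spot in a long installation path.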

Generating the Java Code

The method described below allows us to generate the scoring equation in Java.

help(model.generateCode)

We specify the Java file name and the folder in which to write it. Then we generate the code.

output_file = "MY_CLASSIF_EQUATION.java"
output_folder = r"O:\MODULES_PA/PYTHON_API/MY_MODELS"
model.generateCode("JAVA", output_folder, output_file)

We took Java as an example. All the code types are listed in the Developer Guide.
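
Since the generated file is handed over to someone else, it can be worth confirming that it was actually written before moving on. The sketch below wraps the `generateCode` call shown above in a hypothetical helper, `generate_and_check`, which is not part of the API. `FakeModel` is a stand-in so that the example runs without the Automated Analytics library; with a real model you would pass the `model` object loaded earlier.

```python
import os
import tempfile


def generate_and_check(model, code_type, output_folder, output_file):
    """Call model.generateCode, then confirm a non-empty file was written.

    Hypothetical helper: it only relies on the generateCode call used in
    this article (code type, folder, file name).
    """
    model.generateCode(code_type, output_folder, output_file)
    path = os.path.join(output_folder, output_file)
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise RuntimeError("No code was written to %s" % path)
    return path


class FakeModel:
    """Stand-in for a loaded model; writes a placeholder file."""

    def generateCode(self, code_type, output_folder, output_file):
        with open(os.path.join(output_folder, output_file), "w") as f:
            f.write("// %s scoring code placeholder\n" % code_type)


demo_folder = tempfile.mkdtemp()
written = generate_and_check(
    FakeModel(), "JAVA", demo_folder, "MY_CLASSIF_EQUATION.java"
)
print(written)
```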

Generating the SQL Code

Another code generation method exists that is better suited for SQL: model.generateCode2.

Here is the description of that method with its set of parameters.

help(model.generateCode2)

To illustrate how the SQL code generation works, we will switch to another model built on insurance claims data for fraud detection. In the configuration part, just before training, the claim unique identifier was specified as a key using the code below.

# Set the Key and Target Columns
target_col = "IS_FRAUD"
key_col = "CLAIM_ID"

# Set the Roles
model.getParameter("")
variables = model.getParameter("Protocols/Default/Variables")
variables.setAllValues("Role", "input")
variables.setSubValue(target_col + "/Role", "target")
variables.setSubValue("KxIndex/Role", "skip")
variables.setSubValue("KxIndex/KeyLevel", "0")
variables.setSubValue(key_col + "/Role", "skip")
variables.setSubValue(key_col + "/KeyLevel", "1")
model.validateParameter()

We have loaded the fraud detection model using the method: store.restoreLastModelD.

Now, we generate the SQL code specific to SAP HANA.

output_folder = r"O:\MODULES_PA/PYTHON_API/MY_MODELS"
output_file = "DETECT_FRAUD_EQUATION.sql"
table_name = "NEW_CLAIMS"
key_col = "CLAIM_ID"
model.generateCode2("HANA", output_folder, output_file, "", table_name, key_col)

When executed, that SQL code returns two columns: the claim ID and the prediction score. The statement can be adapted to business needs, for example by sorting the results in descending order of score and restricting the list to the top n claims.
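
One way to apply such a restriction without editing the generated statement is to wrap it in an outer query. The sketch below is only an illustration under two assumptions: the generated file contains a single SELECT statement that can be used as a subquery, and the score column name is known. The helper `top_n_query`, the placeholder statement and the column name `SCORE` are all hypothetical; the real column name must be read from the generated file.

```python
def top_n_query(generated_sql, score_col, n):
    """Wrap a generated SELECT so that only the n highest scores are kept."""
    if not isinstance(n, int) or n <= 0:
        raise ValueError("n must be a positive integer")
    # Drop surrounding whitespace and a trailing semicolon, if present,
    # so the statement can be nested as a subquery.
    inner = generated_sql.strip().rstrip(";").strip()
    return (
        "SELECT * FROM (\n%s\n) AS SCORED\n"
        'ORDER BY "%s" DESC\n'
        "LIMIT %d" % (inner, score_col, n)
    )


# Placeholder standing in for the contents of DETECT_FRAUD_EQUATION.sql.
generated_sql = 'SELECT "CLAIM_ID", 0.5 AS "SCORE" FROM NEW_CLAIMS;'
query = top_n_query(generated_sql, "SCORE", 10)
print(query)
```

Keeping the generated statement untouched means the file can be regenerated after retraining without having to redo the manual edits.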

To obtain additional information like the Yes/No fraud prediction, we must activate the advanced apply settings mode and specify the columns to be added after the key and the score.

t = model.getTransformInProtocol("Default", 0)
t.getParameter("")
t.changeParameter("Parameters/ExtraMode", "Advanced Apply Settings")
t.validateParameter()

# Get Long Names For Output Columns
model.getParameter("")
model.changeParameter("Parameters/CodeGeneration/UseVarNameAsAlias", "true")
model.validateParameter()

# Prepare Settings Parameter
target_col = "is_fraud"
d_path = "Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/Supervised/%s" % target_col
settings = model.getParameter(d_path)

# Decision
flag = settings.getSubParameter("PredictedRankCategories")
flag.removeAll()
flag.insert("1")

# Probability Decision
flag = settings.getSubParameter("PredictedRankProbabilities")
flag.removeAll()
flag.insert("1")
model.validateParameter()
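
The decision and the probability settings follow the same pattern: fetch the sub-parameter, clear it, then insert the rank to keep. As a sketch, that pattern could be factored into a small helper. `keep_top_rank` is a hypothetical name, and `FakeList` and `FakeSettings` are stand-ins that mimic only the three calls used above, so that the example runs without the Automated Analytics library.

```python
def keep_top_rank(settings, sub_parameter_name, rank="1"):
    """Clear a rank list under the apply settings and keep a single rank."""
    flag = settings.getSubParameter(sub_parameter_name)
    flag.removeAll()
    flag.insert(rank)
    return flag


class FakeList:
    """Stand-in for a list parameter; starts with a stale entry."""

    def __init__(self):
        self.items = ["old"]

    def removeAll(self):
        self.items = []

    def insert(self, value):
        self.items.append(value)


class FakeSettings:
    """Stand-in for the object returned by model.getParameter(d_path)."""

    def __init__(self):
        self.subs = {}

    def getSubParameter(self, name):
        return self.subs.setdefault(name, FakeList())


settings = FakeSettings()
for name in ("PredictedRankCategories", "PredictedRankProbabilities"):
    keep_top_rank(settings, name)
print({name: sub.items for name, sub in settings.subs.items()})
```

With a real model, `settings` would be the parameter obtained from `model.getParameter(d_path)`, and `model.validateParameter()` would still need to be called afterwards.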

We could also request the individual contributions of each predictor.

# Individual Contributions
t.getParameter("")
d_path = "Parameters/ApplySettings/Supervised/%s/Contribution" % target_col
t.changeParameter(d_path, "all")
t.validateParameter()

We generate the SQL file.

output_file = "DETECT_FRAUD_EQUATION_ADV.sql"
model.generateCode2("HANA", output_folder, output_file, "", table_name, key_col)

Another way to deploy a model is to directly make predictions and store the predicted values in a file or a table. This will be the subject of a new article.
