Figure 1 - A typical SAP Data Hub pipeline
In this blog, jens.rannacher also explores how to define custom parameters for a custom operator. In particular, he adds three simple text parameters that are then used in the Python script, and he shows how to consume the values set in the graph configuration at runtime. For example, if the custom operator has a custom parameter called customParam, you can read its value at runtime by simply referencing api.config.customParam in the Python script. The vflow API is very powerful and, in particular, the api.config object gives us access to any configured parameter.
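For illustration, here is a minimal sketch of that pattern in a Python custom operator script, assuming a string parameter named customParam and an output port named output (both names are only examples):

# Read the value configured for the custom parameter "customParam" in the graph
# configuration and forward it through the operator's output port.
def send_config_value():
    configured_value = api.config.customParam
    api.send("output", "customParam is: " + str(configured_value))

api.add_generator(send_config_value)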
However, what if we want to create a more complex parameter that is not just a simple string? Looking at several of the standard operators in SAP Data Hub, we can observe more complex parameter types with radio buttons and drop-down boxes, and even richer ones, for example the parameter used to select a connection from the Connection Manager (or to enter the connection details manually). Below is a screenshot of this connection parameter.
Figure 2 - Connection Selection screen
Figure 3 - Create Operator screen
Tip: if you want to include your operator in a subfolder structure, you can enter the desired structure as a namespace in the operator name (similar to a package in a Java class). For example, an operator named com.mycompany.myoperator would be created with the subfolder structure /com/mycompany/myoperator/ under the /operators/ folder.
Figure 4 - Custom Operator - Ports
Tip: if your graph will process data (e.g. acquire data from a source and push it into a HANA table) before applying the ML model, add an input port - it will serve as the trigger for the model apply once the data has been loaded to the location the custom operator expects. If the custom operator will just apply the model on top of an already existing HANA table, you don't need any input ports, only an output one. If you might face both scenarios, leave the decision open for now - you can always add an input port later, during graph design time.
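To make the two trigger styles concrete, here is a minimal sketch of both variants in a Python operator script; the port names "input" and "output" are only examples, and api.add_generator / api.set_port_callback are the usual ways to register the respective handlers:

# Variant 1: no input port - the apply logic runs as soon as the graph starts.
def run_on_start():
    api.send("output", "model apply triggered at graph start")

api.add_generator(run_on_start)

# Variant 2: with an input port - the apply logic runs only when a message arrives,
# e.g. after an upstream operator has finished loading the data into HANA.
# def run_on_message(data):
#     api.send("output", "model apply triggered by upstream message")
#
# api.set_port_callback("input", run_on_message)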
Figure 5 - Custom Operator - Tags
Figure 6 - Custom Operator - Configuration
Figure 7 - Config Schema properties
Figure 8 - hanaConnection properties
Tip: if you are having difficulties entering a blank space in the list of values for the configurationType property, switch to the JSON view (top right menu) and edit the "enum" attribute directly.
Tip: to save time, you can simply copy the code below and replace the existing content of the JSON view. If you chose a different name for your custom operator, just copy and paste the content of the "properties" attribute below (leaving the "$schema", "$id" and "type" attributes unaltered). When you switch back to the Form view, you should see all properties properly filled in.
{
    "$schema": "http://json-schema.org/draft-06/schema#",
    "$id": "http://sap.com/vflow/gcoe.predict_mortgage_default.configSchema.json",
    "type": "object",
    "properties": {
        "hanaConnection": {
            "title": "HANA Connection",
            "description": "HANA Connection",
            "type": "object",
            "properties": {
                "configurationType": {
                    "title": "Configuration Type",
                    "description": "Configuration Type",
                    "type": "string",
                    "enum": [
                        " ",
                        "Configuration Manager",
                        "Manual"
                    ]
                },
                "connectionID": {
                    "title": "Connection ID",
                    "description": "Connection ID",
                    "type": "string",
                    "format": "com.sap.dh.connection.id",
                    "sap_vflow_valuehelp": {
                        "url": "/app/datahub-app-connection/connections?connectionTypes=HANA_DB",
                        "valuepath": "id",
                        "displayStyle": "autocomplete"
                    },
                    "sap_vflow_constraints": {
                        "ui_visibility": [
                            {
                                "name": "configurationType",
                                "value": "Configuration Manager"
                            }
                        ]
                    }
                },
                "connectionProperties": {
                    "title": "Connection Properties",
                    "description": "Connection Properties",
                    "$ref": "http://sap.com/vflow/com.sap.dh.connections.hana_db.schema.json",
                    "sap_vflow_constraints": {
                        "ui_visibility": [
                            {
                                "name": "configurationType",
                                "value": "Manual"
                            }
                        ]
                    }
                }
            }
        },
        "hanaSchema": {
            "title": "HANA Schema",
            "description": "HANA Schema",
            "type": "string"
        },
        "hanaModelTable": {
            "title": "HANA Model Table",
            "description": "HANA Table including trained model",
            "type": "string"
        },
        "hanaApplyTable": {
            "title": "HANA Apply Table",
            "description": "HANA Table with data for apply",
            "type": "string"
        },
        "hanaPredictTable": {
            "title": "HANA Predict Table",
            "description": "Target HANA Table for predicted data",
            "type": "string"
        },
        "hanaSequence": {
            "title": "HANA Sequence",
            "description": "HANA Sequence (for APPLY_ID)",
            "type": "string"
        },
        "codelanguage": {
            "type": "string"
        },
        "script": {
            "type": "string"
        }
    }
}
import hana_ml as hana_ml
from hana_ml.algorithms.pal import trees
from datetime import datetime

hanaConn = api.config.hanaConnection['connectionProperties']
conn = hana_ml.dataframe.ConnectionContext(hanaConn['host'], hanaConn['port'], hanaConn['user'], hanaConn['password'])

def apply():

    # Retrieve model
    df_model_saved = hana_ml.dataframe.DataFrame(conn, 'select * from ' + api.config.hanaSchema + '.' + api.config.hanaModelTable)
    tree_reg_saved = trees.DecisionTreeRegressor(conn, algorithm='cart')
    tree_reg_saved.model_ = df_model_saved.select('ROW_INDEX', 'MODEL_CONTENT')

    # Create HANA dataframe for the table that holds the data for prediction
    df_apply = conn.table(api.config.hanaApplyTable, schema=api.config.hanaSchema)

    # Predict the probability of default
    features = ['INCOME', "BOCREDITSCOR", "COBOCREDITSCOR", "LTV", "TERM", "RATE", "BOAGE", "COAGE"]
    df_predict = tree_reg_saved.predict(df_apply, features=features, key="ASSIGNED_ID").select("ASSIGNED_ID", "SCORE").filter("SCORE > 0.5")

    # Save dataframe to HANA table
    table_name = api.config.hanaPredictTable + '_' + datetime.now().strftime('%Y%m%d%H%M%S')
    df_predict.save((api.config.hanaSchema, table_name))

    # Create SQL command for subsequent processing
    sqlStmt = ('SELECT ' + api.config.hanaSchema + '.' + api.config.hanaSequence + '.nextval from DUMMY;' +
               ' INSERT INTO ' + api.config.hanaSchema + '.' + api.config.hanaPredictTable +
               ' SELECT ' + api.config.hanaSchema + '.' + api.config.hanaSequence + '.currval, * FROM ' + api.config.hanaSchema + '.' + table_name +
               '; DROP TABLE ' + api.config.hanaSchema + '.' + table_name + ';')
    api.send("output", sqlStmt)

api.add_generator(apply)

def shutdown():
    conn.close()

api.add_shutdown_handler(shutdown)
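To make the "subsequent processing" step more concrete, here is a small standalone sketch of the SQL string that apply() assembles, using the example configuration values that appear later in the graph definition (schema TEST, sequence SEQ_APPLY_ID, predict table DEFAULT_MORTGAGES) and a hypothetical timestamp suffix:

from datetime import datetime

# Example values only - in the operator these come from api.config.*
schema, sequence, predict_table = 'TEST', 'SEQ_APPLY_ID', 'DEFAULT_MORTGAGES'
table_name = predict_table + '_' + datetime.now().strftime('%Y%m%d%H%M%S')  # e.g. DEFAULT_MORTGAGES_20190315103000

sqlStmt = ('SELECT ' + schema + '.' + sequence + '.nextval from DUMMY;' +
           ' INSERT INTO ' + schema + '.' + predict_table +
           ' SELECT ' + schema + '.' + sequence + '.currval, * FROM ' + schema + '.' + table_name +
           '; DROP TABLE ' + schema + '.' + table_name + ';')
print(sqlStmt)
# e.g. SELECT TEST.SEQ_APPLY_ID.nextval from DUMMY; INSERT INTO TEST.DEFAULT_MORTGAGES SELECT TEST.SEQ_APPLY_ID.currval, * FROM TEST.DEFAULT_MORTGAGES_20190315103000; DROP TABLE TEST.DEFAULT_MORTGAGES_20190315103000;

Downstream, this string is converted to a message and executed by the SAP HANA Client operator: it fetches the next APPLY_ID from the sequence, copies the predictions from the temporary table into the target table together with that APPLY_ID, and finally drops the temporary table.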
Figure 9 - Predict Mortgage Default Graph
Figure 10 - Connection Selection screen
Figure 11 - Custom Operator Configuration view
Tip: the .fit() and .predict() methods of the PAL and APL classes of the hana_ml Python API accept both tables and views as the input dataframe. In this case, I have used a view as the input dataframe for the apply step, as illustrated below.
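As a minimal illustration (V_APPLY_MORTGAGES is the view used in this example, while MORTGAGES is just a hypothetical table name), both of the following produce a hana_ml dataframe that can be passed to .predict():

# conn is the hana_ml.dataframe.ConnectionContext created at the top of the operator script.
# A plain table works as the apply input ...
df_table = conn.table('MORTGAGES', schema='TEST')
# ... and so does a view, e.g. one that already filters or joins the apply data.
df_view = conn.table('V_APPLY_MORTGAGES', schema='TEST')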
Tip: instead of creating the graph manually, you can also switch to the graph JSON view, copy and paste the JSON definition below and switch back to the Diagram view. Note that you might need to edit the technical name of the custom operator if you named it differently (or used a different subfolder structure).
{
    "properties": {},
    "icon": "",
    "iconsrc": "",
    "description": "Predict Mortgage Default",
    "processes": {
        "graphterminator1": {
            "component": "com.sap.util.graphTerminator",
            "metadata": {
                "label": "Graph Terminator",
                "x": 480.99999809265137,
                "y": 12,
                "height": 80,
                "width": 120,
                "config": {}
            }
        },
        "saphanaclient1": {
            "component": "com.sap.hana.client2",
            "metadata": {
                "label": "SAP HANA Client",
                "x": 311.99999809265137,
                "y": 12,
                "height": 80,
                "width": 120,
                "config": {
                    "connection": {
                        "connectionProperties": {
                            "additionalHosts": [],
                            "host": "host",
                            "password": "",
                            "port": 9000,
                            "useTLS": false,
                            "user": ""
                        },
                        "configurationType": "Configuration Manager",
                        "connectionID": "AWS_HANA24"
                    }
                }
            }
        },
        "tomessageconverter1": {
            "component": "com.sap.util.toMessageConverter",
            "metadata": {
                "label": "ToMessage Converter",
                "x": 196.99999904632568,
                "y": 27,
                "height": 50,
                "width": 50,
                "config": {}
            }
        },
        "predictmortgagedefault1": {
            "component": "gcoe.predict_mortgage_default",
            "metadata": {
                "label": "Predict Mortgage Default",
                "x": 12,
                "y": 12,
                "height": 80,
                "width": 120,
                "extensible": true,
                "config": {
                    "script": "import hana_ml as hana_ml\nfrom hana_ml.algorithms.pal import trees\nfrom datetime import datetime\n\nhanaConn = api.config.hanaConnection['connectionProperties']\nconn = hana_ml.dataframe.ConnectionContext(hanaConn['host'], hanaConn['port'], hanaConn['user'], hanaConn['password'])\n\ndef apply():\n \n # Retrieve model\n df_model_saved = hana_ml.dataframe.DataFrame(conn, 'select * from ' + api.config.hanaSchema + '.' + api.config.hanaModelTable)\n tree_reg_saved = trees.DecisionTreeRegressor(conn, algorithm='cart')\n tree_reg_saved.model_ = df_model_saved.select('ROW_INDEX', 'MODEL_CONTENT')\n\n # Create HANA dataframe for the table that holds the data for prediction\n df_apply = conn.table(api.config.hanaApplyTable, schema=api.config.hanaSchema)\n\n # Predict the probability of default\n features = ['INCOME', \"BOCREDITSCOR\", \"COBOCREDITSCOR\", \"LTV\", \"TERM\", \"RATE\", \"BOAGE\", \"COAGE\"]\n df_predict = tree_reg_saved.predict(df_apply, features=features, key=\"ASSIGNED_ID\").select(\"ASSIGNED_ID\", \"SCORE\").filter(\"SCORE > 0.5\")\n\n # Save dataframe to HANA table\n table_name = api.config.hanaPredictTable + '_' + datetime.now().strftime('%Y%m%d%H%M%S')\n df_predict.save((api.config.hanaSchema, table_name))\n\n # Create SQL command for subsequent processing\n sqlStmt = ('SELECT ' + api.config.hanaSchema + '.' + api.config.hanaSequence + '.nextval from DUMMY;' +\n ' INSERT INTO ' + api.config.hanaSchema + '.' + api.config.hanaPredictTable + \n ' SELECT ' + api.config.hanaSchema + '.' + api.config.hanaSequence + '.currval, * FROM ' + api.config.hanaSchema + '.' + table_name +\n '; DROP TABLE ' + api.config.hanaSchema + '.' + table_name + ';')\n api.send(\"output\", sqlStmt)\n\napi.add_generator(apply)\n\n\ndef shutdown():\n conn.close()\n\napi.add_shutdown_handler(shutdown)",
                    "hanaConnection": {
                        "configurationType": "Configuration Manager",
                        "connectionID": "AWS_HANA24",
                        "connectionProperties": {}
                    },
                    "hanaPredictTable": "DEFAULT_MORTGAGES",
                    "hanaSequence": "SEQ_APPLY_ID",
                    "hanaModelTable": "DEFAULT_LOAN_MODEL_REGTREE",
                    "hanaSchema": "TEST",
                    "hanaApplyTable": "V_APPLY_MORTGAGES",
                    "codelanguage": "python"
                }
            }
        }
    },
    "groups": [],
    "connections": [
        {
            "metadata": {
                "points": "250.99999904632568,52 278.9999985694885,52 278.9999985694885,43 306.99999809265137,43"
            },
            "src": {
                "port": "out",
                "process": "tomessageconverter1"
            },
            "tgt": {
                "port": "sql",
                "process": "saphanaclient1"
            }
        },
        {
            "metadata": {
                "points": "435.99999809265137,52 475.99999809265137,52"
            },
            "src": {
                "port": "result",
                "process": "saphanaclient1"
            },
            "tgt": {
                "port": "stop",
                "process": "graphterminator1"
            }
        },
        {
            "metadata": {
                "points": "136,52 164,52 164,43 191.99999904632568,43"
            },
            "src": {
                "port": "output",
                "process": "predictmortgagedefault1"
            },
            "tgt": {
                "port": "inbody",
                "process": "tomessageconverter1"
            }
        }
    ],
    "inports": {},
    "outports": {}
}
Figure 12 - Prediction Results in HANA
For comparison, this is how a JavaScript-based operator reads the same kind of connection configuration object:
var connectionObj = $.config.getObject("connection").connectionProperties;