Technology Blogs by SAP
Fukuhara
Product and Topic Expert

This blog post shows basic model management with the Python package hana_ml.  With the class ModelStorage, you can save and load models.  I also show State Enabled Real-Time Scoring Functions, which speed up the prediction process.



Environment


The environment is as follows.

  • Python: 3.7.13 (Google Colaboratory)

  • HANA: Cloud Edition 2022.16


Python packages and their versions.

  • hana_ml: 2.13.22072200

  • pandas: 1.3.5

  • scikit-learn: 1.0.2


As for HANA Cloud, I activated the script server and created my users.  I am not aware of any other special configuration, but I may have missed something, since our HANA Cloud instance was created a long time ago.

I didn't use HDI here, to keep the environment simple.

Python Script


Pre-requisites


Please see another article, "Python hana_ml: PAL Classification Training(UnifiedClassification)", for the training process.  The code from step 1 "Install Python packages" through step 8 "Training" is exactly the same.  Steps 9, 10 and 11 of that article are not needed here.

9. Import modules


Import the additional Python modules used below.
import pprint

from hana_ml.model_storage import ModelStorage

10. Save model


Save the model with the class "ModelStorage" and its function "save_model".
ms = ModelStorage(conn)

uc_rdt.name = 'Random Forest'
ms.save_model(model=uc_rdt, if_exists='replace')

Model metadata is stored in the table "HANAML_MODEL_STORAGE", so both statements below return the same result.
display(ms.list_models())
display(conn.table('HANAML_MODEL_STORAGE').collect())


Let's look at the contents in more detail.
pprint.pprint(ms.list_models().to_dict())

Although the model metadata is stored in the table "HANAML_MODEL_STORAGE", the model contents and other data are saved in the tables listed under "JSON -> artifacts", and which tables are used depends on the algorithm.  The help documentation says:
The back-end model. It consists in the model returned by SAP HANA APL or SAP HANA PAL. For SAP HANA APL, it is always saved into the table HANAMl_APL_MODELS_DEFAULT, while for SAP HANA PAL, a model can be saved into different tables depending on the nature of the specified algorithm.

{'CLASS': {0: 'hana_ml.algorithms.pal.unified_classification.UnifiedClassification'},
'JSON': {0: '{"model_attributes": {"func": "RandomDecisionTree", '
'"multi_class": null, "massive": false, "group_params": null, '
'"kwargs": {"n_estimators": 10, "max_depth": 10}}, "fit_params": '
'{"key": "ID", "features": null, "label": null, "group_key": '
'null, "group_params": null, "purpose": null, "partition_method": '
'"stratified", "stratified_column": "CLASS", '
'"partition_random_state": null, "training_percent": 0.8, '
'"training_size": null, "ntiles": 2, "categorical_variable": '
'null, "output_partition_result": null, "background_size": null, '
'"background_random_state": null, "build_report": true, "impute": '
'false, "strategy": null, "strategy_by_col": null, "als_factors": '
'null, "als_lambda": null, "als_maxit": null, "als_randomstate": '
'null, "als_exit_threshold": null, "als_exit_interval": null, '
'"als_linsolver": null, "als_cg_maxit": null, "als_centering": '
'null, "als_scaling": null, "kwargs": {}}, "artifacts": '
'{"schema": "I348221", "model_tables": '
'["HANAML_RANDOM_FOREST_2_MODELS_0", '
'"HANAML_RANDOM_FOREST_2_MODELS_1", '
'"HANAML_RANDOM_FOREST_2_MODELS_2", '
'"HANAML_RANDOM_FOREST_2_MODELS_3", '
'"HANAML_RANDOM_FOREST_2_MODELS_4", '
'"HANAML_RANDOM_FOREST_2_MODELS_5"], "library": "PAL"}, '
'"pal_meta": {"_fit_param": [["FUNCTION", "RDT", "string"], '
'["KEY", 1, "integer"], ["N_ESTIMATORS", 10, "integer"], '
'["MAX_DEPTH", 10, "integer"], ["PARTITION_METHOD", 2, '
'"integer"], ["PARTITION_STRATIFIED_VARIABLE", "CLASS", '
'"string"], ["PARTITION_TRAINING_PERCENT", 0.8, "float"], '
'["NTILES", 2, "integer"], ["HANDLE_MISSING_VALUE", 0, '
'"integer"], ["CATEGORICAL_VARIABLE", "CLASS", "string"]], '
'"fit_data_struct": {"ID": "INT", "X1": "DOUBLE", "X2": "DOUBLE", '
'"X3": "DOUBLE", "CLASS": "INT"}, "label": "CLASS"}}'},
'LIBRARY': {0: 'PAL'},
'MODEL_STORAGE_VER': {0: 1},
'NAME': {0: 'Random Forest'},
'SCHEDULE': {0: '{"schedule": {"status": "inactive", "schedule_time": "every '
'1 hours", "pid": null, "client": null, "connection": '
'{"userkey": "your_userkey", "encrypt": "false", '
'"sslValidateCertificate": "true"}, "hana_ml_obj": '
'"hana_ml.algorithms.pal.xxx", "init_params": {}, '
'"fit_params": {}, "training_dataset_select_statement": '
'"SELECT * FROM YOUR_TABLE"}}'},
'STORAGE_TYPE': {0: 'default'},
'TIMESTAMP': {0: Timestamp('2022-09-07 06:54:10')},
'VERSION': {0: 2}}
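
The artifact table names can be pulled out of the JSON column with the standard json module. The following is a minimal sketch, not part of hana_ml: the helper name artifact_tables is hypothetical, and the input is a shortened copy of the JSON string shown above so that it runs without a HANA connection.

```python
import json

# Shortened copy of the JSON column shown above; only the "artifacts" part is kept.
metadata_json = (
    '{"artifacts": {"schema": "I348221", "model_tables": '
    '["HANAML_RANDOM_FOREST_2_MODELS_0", "HANAML_RANDOM_FOREST_2_MODELS_1"], '
    '"library": "PAL"}}'
)


def artifact_tables(json_text):
    """Return the schema-qualified artifact table names found in the metadata.

    Hypothetical helper for illustration; it only parses the JSON text.
    """
    artifacts = json.loads(json_text)["artifacts"]
    schema = artifacts["schema"]
    return ['"{}"."{}"'.format(schema, name) for name in artifacts["model_tables"]]


tables = artifact_tables(metadata_json)
for name in tables:
    print(name)
```

With a live connection, the same helper could presumably be fed the value of the JSON column from ms.list_models(), for example ms.list_models()['JSON'][0].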

 

11. Load model


Now load the model with the function "load_model".  The call to "create_model_state" enables State Enabled Real-Time Scoring Functions.
saved_model = ms.load_model(name='Random Forest')
saved_model.create_model_state()
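
To check whether the model state actually speeds up scoring in your own environment, you could time the scoring call before and after create_model_state. The helper below is only a sketch: time_predict is a hypothetical name, and fake_predict is a stand-in so that the example runs without a HANA connection.

```python
import time


def time_predict(predict_fn, repeats=5):
    """Call predict_fn `repeats` times and return the mean wall-clock seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        predict_fn()
    return (time.perf_counter() - start) / repeats


def fake_predict():
    """Stand-in for a real scoring call (assumption, for illustration only)."""
    return sum(range(1000))


mean_seconds = time_predict(fake_predict, repeats=3)
print("mean seconds per call: {:.6f}".format(mean_seconds))
```

Against a real model, the callable could be something like lambda: saved_model.predict(test, key='ID').collect(), measured once without and once with a model state. The numbers will depend on the data size and the HANA instance, so treat them as indicative only.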

12. Predict with loaded model


Call the "predict" function to get predictions.
df_pred = saved_model.predict(test, key='ID')
print(df_pred.collect())

        ID  SCORE  CONFIDENCE  \
0        9      0         1.0
1       13      1         1.0
2       14      0         1.0
3       16      1         0.8
4       20      0         1.0
...    ...    ...         ...
1995  9988      1         1.0
1996  9990      0         1.0
1997  9996      1         1.0
1998  9998      0         0.8
1999  9999      1         1.0

                                            REASON_CODE
0     [{"attr":"X2","pct":81.0,"val":-0.350732473499...
1     [{"attr":"X2","pct":89.0,"val":-0.546387864002...
2     [{"attr":"X2","pct":82.0,"val":-0.367046185280...
3     [{"attr":"X2","pct":76.0,"val":-0.221394522848...
4     [{"attr":"X2","pct":88.0,"val":-0.470017154574...
...                                                 ...
1995  [{"attr":"X2","pct":90.0,"val":-0.490175736690...
1996  [{"attr":"X2","pct":71.0,"val":-0.333635163456...
1997  [{"attr":"X2","pct":94.0,"val":-0.510854084253...
1998  [{"attr":"X2","pct":48.0,"val":-0.140319048941...
1999  [{"attr":"X2","pct":97.0,"val":-0.498180631259...

[2000 rows x 4 columns]
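
The REASON_CODE column holds a JSON array with one entry per feature, using the keys "attr", "pct" and "val" seen above. A minimal sketch for ranking the features of one prediction by "pct" follows. The sample string is hypothetical: the values printed above are truncated, so the numbers here are made up for illustration, and top_reasons is not a hana_ml function.

```python
import json

# Hypothetical, complete REASON_CODE value (the real values above are truncated).
reason_code = (
    '[{"attr":"X2","pct":81.0,"val":-0.35},'
    '{"attr":"X1","pct":19.0,"val":-0.08}]'
)


def top_reasons(reason_json, n=1):
    """Return the n (attribute, pct) pairs with the largest pct share."""
    entries = json.loads(reason_json)
    ranked = sorted(entries, key=lambda entry: entry["pct"], reverse=True)
    return [(entry["attr"], entry["pct"]) for entry in ranked[:n]]


top = top_reasons(reason_code, n=2)
print(top)
```

After df_pred.collect(), the same function could be applied to each value of the REASON_CODE column to see which feature contributed most to each prediction.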

13. Delete model state and close connection


Delete the model state and close the HANA connection.  If you are only testing and no longer need any of the stored models, the clean_up function deletes all of them.
saved_model.delete_model_state()
#ms.clean_up()
conn.close()
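
If scoring raises an error, the model state would not be deleted by the code above. One possible way to guard against that is a small context manager, sketched below. The name model_state is hypothetical, and FakeModel is only a stand-in that records calls so that the example runs without a HANA connection.

```python
from contextlib import contextmanager


@contextmanager
def model_state(model):
    """Create a model state on entry and always delete it on exit."""
    model.create_model_state()
    try:
        yield model
    finally:
        # Runs even if the body of the with-block raises an exception.
        model.delete_model_state()


class FakeModel:
    """Stand-in that records calls; replace it with a model loaded from ModelStorage."""

    def __init__(self):
        self.calls = []

    def create_model_state(self):
        self.calls.append("create")

    def delete_model_state(self):
        self.calls.append("delete")

    def predict(self):
        self.calls.append("predict")


fake = FakeModel()
with model_state(fake) as m:
    m.predict()
print(fake.calls)
```

With a real model, the body of the with-block would contain the predict call from step 12, and conn.close() would follow after the block.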