Technology Blogs by SAP
xinchen
Product and Topic Expert

1. Introduction

Machine learning models, such as classification and regression models, capture the relationships and patterns between features in the training dataset, and these can be applied to similar data in the future for prediction. Model training can be time consuming, so it is desirable to store (persist) a model for future use without retraining it.

The Python Machine Learning Client for SAP HANA (hana-ml) provides a model storage class for persisting models such as classification, regression, clustering and time series models. In this blog post, you will learn:

    • ModelStorage class and its methods in hana_ml.
    • How to apply model storage and its methods in a use case.

 

2. ModelStorage Class

The ModelStorage class allows users to save, load, list and delete models. Internally, a model is stored as two parts:

    • Metadata: contains the model identification (name, version, algorithm class) and the Python model object attributes required for re-instantiation. It is saved in a table named HANAML_MODEL_STORAGE by default.
    • Back-end model: the model content returned by SAP HANA PAL or APL, which is saved in its own table or tables.

Some important methods are described below:
1. save_model(model, if_exists='upgrade'): Stores the model in SAP HANA tables in a schema specified by the user. A model is identified by its name and version. The parameter 'if_exists' provides three ways of handling the save if a model with the same name and version already exists:

  • 'replace': the previous model is overwritten.
  • 'upgrade': the current model is saved with an incremented version number.
  • 'fail': an error is raised to indicate that a model with the same name and version already exists.

2. load_model(name, version=None): Loads a model by name. If the version is not provided, the latest version is loaded.

3. delete_model(name, version): Deletes the model with the given name and version.
4. list_models(name=None, version=None): Lists the existing models stored in SAP HANA.
5. clean_up(): Deletes all models and the metadata table in SAP HANA.
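
The name-and-version rules above can be tried out without a database. The following is a minimal in-memory sketch, not hana-ml's implementation: ToyModelStorage is a hypothetical stand-in that mirrors only the behaviour described in this section. It takes the name and version as explicit arguments instead of reading them from the model object, and the rule that 'upgrade' picks the highest stored version plus one is an assumption.

```python
class ToyModelStorage:
    """Hypothetical in-memory stand-in; NOT the hana_ml ModelStorage class."""

    def __init__(self):
        self._models = {}  # maps (name, version) -> stored object

    def _versions(self, name):
        # All stored versions of one model name, in ascending order.
        return sorted(v for (n, v) in self._models if n == name)

    def save_model(self, name, version, model, if_exists="upgrade"):
        key = (name, version)
        if key in self._models:
            if if_exists == "fail":
                raise ValueError(f"model {name!r} version {version} already exists")
            if if_exists == "upgrade":
                # Assumption: the new version is the highest stored version + 1.
                key = (name, self._versions(name)[-1] + 1)
            elif if_exists != "replace":
                raise ValueError(f"unknown if_exists value: {if_exists!r}")
        self._models[key] = model
        return key[1]  # the version the model was actually stored under

    def load_model(self, name, version=None):
        if version is None:
            versions = self._versions(name)
            if not versions:
                raise KeyError(f"no model named {name!r}")
            version = versions[-1]  # default to the latest version
        return self._models[(name, version)]

    def delete_model(self, name, version):
        del self._models[(name, version)]

    def list_models(self):
        return sorted(self._models)


store = ToyModelStorage()
store.save_model("SVM", 1, "first fit")
store.save_model("SVM", 1, "second fit", if_exists="upgrade")  # stored as version 2
store.save_model("SVM", 1, "third fit", if_exists="replace")   # overwrites version 1
print(store.list_models())
```

Returning the stored version from save_model is a choice made for this sketch only, so that the effect of 'upgrade' is visible to the caller.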

Algorithms that provide a predict or transform function are supported by ModelStorage. A partial list of supported algorithms follows:
Classification: UnifiedClassification, MLPClassifier, RDTClassifier, HybridGradientBoostingClassifier, SVC, DecisionTreeClassifier, CRF, LogisticRegression, KNNClassifier, NaiveBayes…

Regression: UnifiedRegression, LinearRegression, PolynomialRegression, GLM, ExponentialRegression, BiVariateGeometricRegression, BiVariateNaturalLogarithmicRegression, CoxProportionalHazardModel…

Clustering: UnifiedClustering, DBSCAN, SOM…

Time Series: ARIMA, OnlineARIMA, VectorARIMA, AutoARIMA, LSTM…

Preprocessing: Imputer, KBinsDiscretizer…


3. Use Case

All source code uses the Python machine learning client for SAP HANA (hana-ml).

We first need to create a connection to an SAP HANA instance; then we can use the functions of hana-ml for data analysis. The following is an example:

import hana_ml
from hana_ml import dataframe

# Replace the placeholders with your own host, port, user name and password.
conn = dataframe.ConnectionContext('sysName', 'port', 'username', 'password')
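
To keep credentials out of the source code, the connection settings can be read from the environment. The sketch below is one possible approach and makes two assumptions: the variable names HANA_ADDRESS, HANA_PORT, HANA_USER and HANA_PASSWORD are hypothetical, and ConnectionContext is assumed to accept the keyword arguments address, port, user and password.

```python
import os


def connection_params_from_env(environ=None):
    """Collect connection settings from environment variables (names are hypothetical)."""
    env = os.environ if environ is None else environ
    return {
        "address": env["HANA_ADDRESS"],
        "port": int(env["HANA_PORT"]),  # environment values are strings; convert the port
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }


# Intended use (requires hana-ml and a reachable SAP HANA system):
# from hana_ml import dataframe
# conn = dataframe.ConnectionContext(**connection_params_from_env())
```

A missing variable raises KeyError immediately, which makes a misconfigured environment easier to spot than a failed connection attempt.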


A simple, self-made dataset is used to show how model storage works for classification. The data is stored in two SAP HANA tables, DATA_TBL_FIT and DATA_TBL_PREDICT. Let's have a look at the dataset.

df_fit = conn.table('DATA_TBL_FIT')
df_predict = conn.table('DATA_TBL_PREDICT')
print(df_fit.collect())
print(df_predict.collect())


The result is shown below:

    ID   OUTLOOK  TEMP  HUMIDITY WINDY        CLASS
0    0     Sunny    75      70.0   Yes         Play
1    1     Sunny    77      90.0   Yes  Do not Play
2    2     Sunny    85      79.0    No  Do not Play
3    3     Sunny    72      95.0    No  Do not Play
4    4     Sunny    88      70.0    No         Play
5    5  Overcast    72      90.0   Yes         Play
6    6  Overcast    83      78.0    No         Play
7    7  Overcast    64      65.0   Yes         Play
8    8  Overcast    81      75.0    No         Play
9    9  Overcast    71      80.0   Yes  Do not Play
10  10      Rain    65      70.0   Yes  Do not Play
11  11      Rain    75      80.0    No         Play
12  12      Rain    68      80.0    No         Play
13  13      Rain    70      96.0    No         Play

   ID   OUTLOOK   TEMP  HUMIDITY WINDY
0   0  Overcast     75  -10000.0   Yes
1   1      Rain     78      70.0   Yes
2   2     Sunny -10000      78.0   Yes
3   3     Sunny     69      70.0   Yes
4   4      Rain     74      70.0   Yes
5   5      Rain     70      70.0   Yes
6   6       ***     70      70.0   Yes


Train one model per algorithm with the UnifiedClassification function, using 'MLP', 'NaiveBayes', 'LogisticRegression', 'decisiontree', 'HybridGradientBoostingTree', 'RandomDecisionTree' and 'SVM', and save each fitted model with ModelStorage:

from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
from hana_ml.model_storage import ModelStorage

ms = ModelStorage(conn)
classification_algorithms = ['MLP', 'NaiveBayes', 'LogisticRegression', 'decisiontree',
                             'HybridGradientBoostingTree', 'RandomDecisionTree', 'SVM']

# Algorithm-specific parameters; all other algorithms use their defaults.
dt_param = dict(algorithm='c45')
mlp_param = dict(hidden_layer_size=(10,), activation='TANH', output_activation='TANH',
                 training_style='batch', max_iter=1000, normalization='z-transform',
                 weight_init='normal', thread_ratio=1)

for name in classification_algorithms:
    # Create the classifier for the current algorithm.
    if name == 'decisiontree':
        algorithm = UnifiedClassification(func=name, **dt_param)
    elif name == 'MLP':
        algorithm = UnifiedClassification(func=name, **mlp_param)
    else:
        algorithm = UnifiedClassification(func=name)

    # Fit on the training table; logistic regression also gets the class mapping.
    if name == 'LogisticRegression':
        algorithm.fit(data=df_fit, key='ID', class_map0='Play', class_map1='Do not Play')
    else:
        algorithm.fit(data=df_fit, key='ID')

    # A model is identified by its name and version when it is saved.
    algorithm.name = name
    algorithm.version = 1
    ms.save_model(model=algorithm, if_exists='replace')
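
As the number of algorithms grows, the if/elif branches in the loop above can be replaced by a lookup table. The following is a sketch of that alternative, not part of hana-ml: the helper name build_kwargs and the two dictionaries are hypothetical, and the parameter values are copied from the loop above.

```python
# Per-algorithm constructor settings; algorithms without an entry use defaults.
INIT_PARAMS = {
    "decisiontree": dict(algorithm="c45"),
    "MLP": dict(hidden_layer_size=(10,), activation="TANH", output_activation="TANH",
                training_style="batch", max_iter=1000, normalization="z-transform",
                weight_init="normal", thread_ratio=1),
}

# Per-algorithm extra arguments for fit().
FIT_PARAMS = {
    "LogisticRegression": dict(class_map0="Play", class_map1="Do not Play"),
}


def build_kwargs(name, key="ID"):
    """Return (constructor kwargs, fit kwargs) for one algorithm name."""
    init_kwargs = dict(func=name, **INIT_PARAMS.get(name, {}))
    fit_kwargs = dict(key=key, **FIT_PARAMS.get(name, {}))
    return init_kwargs, fit_kwargs


# Intended use inside the training loop (requires hana-ml and a connection):
# init_kwargs, fit_kwargs = build_kwargs(name)
# algorithm = UnifiedClassification(**init_kwargs)
# algorithm.fit(data=df_fit, **fit_kwargs)
```

Adding a new algorithm with special settings then only requires a new dictionary entry, and the loop body stays unchanged.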


Use the list_models function to list all models. All 7 models are shown in a table with their name, version and other information:

ms.list_models()


The model list is shown below:


Let's select one model, 'SVM', and load it for prediction:

new_model = ms.load_model(name='SVM', version=1)
type(new_model)


output:

hana_ml.algorithms.pal.unified_classification.UnifiedClassification


new_model is an object of the UnifiedClassification class. We can use this object for prediction:

res = new_model.predict(df_predict, key='ID')
print(res.collect())


The result:

   ID        SCORE  CONFIDENCE REASON_CODE
0   0         Play    0.296441        None
1   1         Play    0.505984        None
2   2         Play    0.296441        None
3   3         Play    0.595937        None
4   4         Play    0.635761        None
5   5  Do not Play    0.248283        None
6   6  Do not Play    0.313294        None
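
The save and load steps can be tied together so that a model is trained only when no stored copy exists. The helper below is a sketch under stated assumptions, not a hana-ml function: load_or_train and its train argument are hypothetical, and list_models(name=..., version=...) is assumed to return a table-like object whose length is the number of matching models.

```python
def load_or_train(storage, name, version, train):
    """Return the stored model if it exists; otherwise train, save and return a new one.

    `storage` is expected to behave like a ModelStorage instance, and `train` is a
    hypothetical zero-argument callable that returns a fitted model.
    """
    # Assumption: the length of the list_models result equals the number of matches.
    if len(storage.list_models(name=name, version=version)) > 0:
        return storage.load_model(name=name, version=version)

    model = train()
    model.name = name
    model.version = version
    # 'fail' guards against silently overwriting a model saved in the meantime.
    storage.save_model(model=model, if_exists="fail")
    return model


# Intended use with the objects from this post (requires hana-ml and a connection):
# def train_svm():
#     svm = UnifiedClassification(func='SVM')
#     svm.fit(data=df_fit, key='ID')
#     return svm
# model = load_or_train(ms, 'SVM', 1, train_svm)
```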


If we want to delete a model, for example 'SVM':

ms.delete_model(name='SVM', version=1)
ms.list_models()


output:


We can also clean up all models at once:

ms.clean_up()

4. Summary

In this blog post, we described what the model storage of hana-ml is and how to use its methods.

Other Useful Links:

  1. hana-ml on PyPI
  2. Python Machine Learning Client for SAP HANA (hana-ml) Documentation
  3. R Machine Learning Client for SAP HANA (hana.ml.r) Documentation
  4. SAP HANA Predictive Analysis Library (PAL) Documentation
  5. Other blog posts on HANA Machine Learning: