on ‎2019 Jul 05 11:56 AM
Hi,
What are the typical use cases for developing models in a Jupyter notebook with the data in SAP HANA?
I am asking because most of the machine learning use cases I know of have data in formats like wav, txt or csv, stored in a data lake (e.g. Hadoop), or streamed from IoT sensors. HANA memory is very expensive, so it would make no sense to load this data into HANA.
Do you know any use cases or similar scenarios? Any links?
BR
Robert
Hi mount_bertl
The SAP HANA Python API brings two major components: the SAP HANA DataFrame, and access to the APL and PAL algorithm wrappers.
The SAP HANA DataFrame gives you access to your SAP HANA data and lets you apply transformations, aggregations and other functions at the database level instead of locally.
You can also collect the data and then use it like any pandas DataFrame with your preferred visualization or ML libraries.
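To make the "transform in the database, collect only the result" idea concrete, here is a minimal sketch. It deliberately does not use the SAP HANA Python API: `LazyFrame` is a hypothetical stand-in class, and SQLite (from the Python standard library) plays the role of the database so the example runs without a HANA instance. The table and column names are made up for illustration.

```python
import sqlite3


class LazyFrame:
    """Hypothetical stand-in for a database-backed data frame.

    It only stores a SQL statement; no rows are fetched until collect().
    """

    def __init__(self, conn, select_statement):
        self.conn = conn
        self.select_statement = select_statement

    def filter(self, condition):
        # Wrap the current statement in a new one; nothing is executed yet.
        sql = f"SELECT * FROM ({self.select_statement}) WHERE {condition}"
        return LazyFrame(self.conn, sql)

    def agg_mean(self, value_col, group_by):
        # Build a grouped average on top of the current statement.
        sql = (
            f"SELECT {group_by}, AVG({value_col}) AS mean_{value_col} "
            f"FROM ({self.select_statement}) "
            f"GROUP BY {group_by} ORDER BY {group_by}"
        )
        return LazyFrame(self.conn, sql)

    def collect(self):
        # Only now does the database run the query and return rows.
        return self.conn.execute(self.select_statement).fetchall()


# SQLite stands in for the database; the data below is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 20.0), ("a", 22.0), ("b", 30.0), ("b", -999.0)],
)

readings = LazyFrame(conn, "SELECT * FROM readings")
summary = readings.filter("temperature > -100").agg_mean("temperature", "sensor")

print(summary.select_statement)  # the SQL that will be pushed to the database
rows = summary.collect()         # only the small aggregated result comes back
print(rows)
```

The point of the design is that filtering and aggregation happen where the data lives, and only the aggregated rows travel to the notebook. With the real API the classes and method names differ, so check the library documentation before relying on any of the names used here.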
The second component gives you access to the SAP HANA libraries for machine learning. SAP HANA provides 90+ "industry standard" algorithms such as linear regression, k-means and Apriori (PAL), as well as the automated algorithms from KXEN (APL).
Not all algorithms have been wrapped in Python yet, but that's the ambition!
For the list of algorithms available from PAL please check: https://help.sap.com/doc/0172e3957b5946da85d3fde85ee8f33d/2.0.03/en-US/html/hana_ml.algorithms.pal.h...
For the list of algorithms available from APL please check: https://help.sap.com/doc/0172e3957b5946da85d3fde85ee8f33d/2.0.03/en-US/html/hana_ml.algorithms.apl.h...
You can also check arun.godwin.patel's blog series about the SAP HANA Python library:
- https://blogs.sap.com/2018/12/17/diving-into-the-hana-dataframe-python-integration-part-1/
- https://blogs.sap.com/2019/01/28/diving-into-the-hana-dataframe-python-integration-part-2/
You can also consider using SAP HANA, express edition, which comes with a free developer license for up to 32 GB of RAM. I personally ran some tests loading csv files, and it turned out that roughly 4 GB of my data files took up only a couple of hundred MB once loaded.
From what I remember, SAP HANA, express edition allows you to use SAP HANA streaming capabilities (to be confirmed however).
Last but not least, with SAP HANA, express edition you can get the binaries and install them wherever you want, download a pre-built VM (assuming your host meets the minimum system requirements in both cases), or spin up a new instance on AWS, Google Cloud or Microsoft Azure (the order here is just alphabetical, no preference implied ;-)).
Hope this helps you see the benefits better.
And of course this is definitely open to discussion.
@bdel