
SAP HANA python API - machine learning use cases

Former Member

Hi,

What are the typical use cases where I can develop my models in a Jupyter notebook with data in SAP HANA?

I am asking because most of the machine learning use cases I know have data in formats like WAV, TXT, or CSV, stored in a data lake such as Hadoop, or streaming in from IoT sensors. HANA memory is very expensive, so it would make no sense to load this data into HANA.

Do you know any use cases or similar scenarios? Any links?

BR

Robert

henrique_pinto
Active Contributor

Hi Robert,

you're right that, typically, if you're handling large volumes of data, you will not store that data permanently in HANA. In theory, with HANA NSE you could even consider it, since NSE data now sits on disk instead of in memory (HANA would then behave like a regular disk-based, cache-enabled DB). But your counterargument would be that you would never store big data in a DB in the first place, especially for ML, and that a data lake serves you better, since you can then use engines like Spark to distribute the compute.

However, before being an in-memory DB, HANA is an in-memory compute engine (fun fact: IMCE was one of HANA's many early internal names). In essence, that means you don't necessarily need to store the data in HANA's memory; you can (and should, where it makes sense) leverage HANA's in-database compute engines to process data quickly, even when the data is not stored in HANA. I'm writing a blog, to be published soon, about some tests I've done with NSE and with virtual tables via the hana_ml library.
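To make the virtual-table idea concrete, here is a minimal sketch of the two SQL statements involved: exposing a remote table (e.g. in Hadoop) inside HANA without copying its rows, and a pushdown aggregate that HANA evaluates close to the data so only a small result set reaches the Python client. All names here (MY_HADOOP, V_SENSOR, the helper functions) are illustrative, not part of hana_ml; the DDL shape follows HANA's Smart Data Access syntax, so check it against your remote-source adapter's documentation before use.

```python
# Sketch: pushing compute to HANA over data that lives elsewhere.
# Helper names are illustrative; verify the DDL against your adapter's docs.

def virtual_table_ddl(local_name, remote_source, remote_path):
    """DDL that exposes a remote table inside HANA as a virtual table,
    i.e. without loading its rows into HANA memory."""
    return f'CREATE VIRTUAL TABLE "{local_name}" AT "{remote_source}".{remote_path}'

def pushdown_aggregate(table, group_col, value_col):
    """An aggregate HANA can evaluate near the data; only the (small)
    grouped result travels back to the notebook."""
    return (f'SELECT "{group_col}", AVG("{value_col}") AS AVG_VAL '
            f'FROM "{table}" GROUP BY "{group_col}"')

# Example: a sensor table sitting in a Hadoop-backed remote source.
ddl = virtual_table_ddl("V_SENSOR", "MY_HADOOP",
                        '"<NULL>"."default"."sensor_readings"')
query = pushdown_aggregate("V_SENSOR", "DEVICE_ID", "TEMPERATURE")
```

In a notebook you would execute both statements through a hana_ml `ConnectionContext` (or `conn.table("V_SENSOR")` for the query side); the point is that the heavy scan and aggregation run in HANA's engines while the raw data stays in the lake.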

Typically, the advantage of HANA's in-memory compute is real-time scoring of data from SAP applications. That way, we can bring the model into production inside the actual business transaction apps in an easy way.
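As a toy illustration of that scoring pattern: once a model is trained (e.g. with PAL via hana_ml), its logic runs next to the transactional tables, so each incoming transaction is scored without moving data out to a client. The snippet below simulates that per-transaction scoring step in plain Python with made-up logistic-regression coefficients; it is not the hana_ml API, just the shape of the computation that would execute in-database.

```python
import math

# Made-up coefficients standing in for an exported, trained model.
COEFFS = {"amount": 0.004, "num_items": -0.3}
INTERCEPT = -1.2

def score(transaction):
    """Score one business transaction in place, as an in-database
    procedure would: compute the linear term, then the logistic link."""
    z = INTERCEPT + sum(COEFFS[k] * transaction[k] for k in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))

# One incoming transaction, scored at write time rather than in a batch.
p = score({"amount": 900.0, "num_items": 2})
```

In the real setup, the equivalent of `score` is a PAL procedure invoked from the application layer, so the model's output is available within the business transaction itself.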