SAP HANA Python

former_member618035

Hello,

I have a database on SAP HANA and I would like to apply machine learning to this data.

I connected Python to SAP HANA to get better performance when running my Python script. However, when I just run .head(1000).collect() to fetch 1,000 rows, it takes a long time to process. I get the data faster when I run locally.

Could you help me?

Thanks

1 ACCEPTED SOLUTION

henrique_pinto

HANA DataFrames are meant to push the computation down into the database instead of bringing the data up into the Python runtime. You can apply any modification that hana_ml supports to the data:

https://help.sap.com/doc/0172e3957b5946da85d3fde85ee8f33d/2.0.03/en-US/html/hana_ml.dataframe.html#h...
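To illustrate the pushdown idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for HANA. The LazyFrame class and its methods are hypothetical and are not the hana_ml API; they only mimic the pattern in which each operation wraps a SQL string and nothing is fetched until collect() is called.

```python
import sqlite3


class LazyFrame:
    """Hypothetical stand-in for a HANA DataFrame (NOT the hana_ml API).

    It only holds a SQL string. Each method wraps that string in a new query,
    so filtering, aggregation and row limits are evaluated by the database
    engine. Rows reach Python only when collect() is called.
    """

    def __init__(self, conn, select_statement):
        self.conn = conn
        self.select_statement = select_statement

    def filter(self, condition):
        # The WHERE clause runs inside the database, not in Python.
        sql = f"SELECT * FROM ({self.select_statement}) AS f WHERE {condition}"
        return LazyFrame(self.conn, sql)

    def mean(self, column):
        # Aggregation is pushed down: only one row comes back.
        sql = f"SELECT AVG({column}) FROM ({self.select_statement}) AS a"
        return LazyFrame(self.conn, sql)

    def head(self, n):
        # SQLite uses LIMIT n; a HANA query could use SELECT TOP n instead.
        sql = f"SELECT * FROM ({self.select_statement}) AS h LIMIT {int(n)}"
        return LazyFrame(self.conn, sql)

    def collect(self):
        # The only point where data crosses the connection into Python.
        return self.conn.execute(self.select_statement).fetchall()


# Demo with an in-memory table standing in for a large HANA table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(i, float(i % 100)) for i in range(100_000)],
)

sales = LazyFrame(conn, "SELECT * FROM sales")
preview = sales.head(5).collect()  # 5 rows transferred
avg_high = sales.filter("amount >= 50").mean("amount").collect()  # 1 row transferred
print(avg_high)  # → [(74.5,)]
```

The contrast is with conn.execute("SELECT * FROM sales").fetchall(), which would move all 100,000 rows into Python before any computation happens.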

You can run small statements such as .head(5).collect() to bring a few data points back for visualization, but you should not run .collect() to bring the entire data set back into Jupyter/Python, because the point is to leverage HANA's in-database capabilities. That said, a 1,000-row select shouldn't take that long (I suppose it is translated to a SELECT TOP 1000). Maybe check your network settings: are the Python runtime and HANA on the same network, or does the traffic go through multiple VPN/bridging hops?
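One rough way to check the network hypothesis is to time how long it takes to open a plain TCP connection to the HANA host from the machine running Python. The sketch below uses only the standard library; the function name is hypothetical, and the host and port in the usage comment are placeholders, so substitute the values from your own connection settings. It measures connection latency only, not query time, but if the number is large, every round trip the driver makes will likely pay a similar cost.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, attempts=5, timeout=5.0):
    """Return the median time in milliseconds to open a TCP connection.

    Only the network/handshake cost is measured; no SQL is sent.
    """
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # Open and immediately close a connection to host:port.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Usage (placeholders, replace with your own HANA host and SQL port):
# print(tcp_connect_latency_ms("your-hana-host", 30015))
```

Running it once from the Python machine and once from a machine known to sit next to HANA gives a quick comparison of the two network paths.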
