Artificial Intelligence Forum

SAP Hana Python

Former Member

Hello,

I have a database on SAP HANA and I would like to apply machine learning to this data.

I tried connecting Python to SAP HANA to get better performance when running my Python script. However, when I just call .head(1000).collect() to fetch 1,000 rows, it takes a long time to process. I get the data faster when I run everything locally.

Could you help me?

Thanks

1 ACCEPTED SOLUTION

henrique_pinto
Active Contributor

HANA DataFrames are designed to push the calculation down to the database instead of bringing the data up to the Python runtime. You can apply any transformation that hana_ml supports to the data:

https://help.sap.com/doc/0172e3957b5946da85d3fde85ee8f33d/2.0.03/en-US/html/hana_ml.dataframe.html#h...
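
To make the push-down idea concrete, here is a minimal sketch, not a definitive recipe. It assumes a hana_ml DataFrame and uses only its filter, select, count, head and collect methods plus the select_statement property. The helper name summarize_in_hana, the host, the port, the credentials, and the ML_DEMO.SALES table with its ID and AMOUNT columns are all hypothetical placeholders.

```python
def summarize_in_hana(df, condition, columns, preview_rows=5):
    """Filter and project inside HANA, then fetch only a small preview.

    `df` is expected to behave like a hana_ml DataFrame. filter() and
    select() only build a new SELECT statement; nothing is transferred
    until count() or collect() is called.
    """
    narrowed = df.filter(condition).select(*columns)
    return {
        # The SQL that HANA would run for the narrowed DataFrame.
        "sql": narrowed.select_statement,
        # Computed in the database; only a single number comes back.
        "row_count": narrowed.count(),
        # Only `preview_rows` rows travel to the Python runtime.
        "preview": narrowed.head(preview_rows).collect(),
    }


def example():
    # Not called on import: needs hana_ml and a reachable HANA instance.
    # Host, port, credentials, schema and table are made-up placeholders.
    from hana_ml import dataframe

    conn = dataframe.ConnectionContext(
        address="hana.example.com", port=30015,
        user="ML_USER", password="<password>",
    )
    try:
        sales = conn.table("SALES", schema="ML_DEMO")
        print(summarize_in_hana(sales, "AMOUNT > 100", ["ID", "AMOUNT"]))
    finally:
        conn.close()
```

Printing the "sql" entry is a quick way to check that the filter and projection really ended up in the statement sent to HANA rather than being applied in Python.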

You can run small .head(5).collect() statements to bring a few data points back for visualization purposes, but you should not execute .collect() to pull the entire data set back to Jupyter/Python, since the point is to leverage HANA's in-database capabilities. That being said, a 1,000-row select shouldn't take that long (it's translated to a SELECT TOP 1000, I suppose). Maybe check your network settings? Are the Python runtime and HANA in the same network, or is the connection going through multiple VPN/bridging hops?
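
One way to check whether the network is the bottleneck is to time a trivial round trip, a pushed-down count, and the actual fetch separately. The sketch below is only an illustration under assumptions: the helpers timed and diagnose are hypothetical names, conn is assumed to be a hana_ml ConnectionContext, and df a hana_ml DataFrame on the table in question.

```python
import time


def timed(label, fn):
    """Run fn(), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result, elapsed


def diagnose(conn, df, n=1000):
    """Time three calls to separate network latency from data transfer.

    `conn` is assumed to be a hana_ml ConnectionContext and `df` a
    hana_ml DataFrame; nothing here is specific to one table.
    """
    timings = {}
    # Trivial query against HANA's built-in DUMMY table: roughly one round trip.
    _, timings["round_trip"] = timed(
        "round trip", lambda: conn.sql("SELECT 1 AS X FROM DUMMY").collect()
    )
    # Executed in the database; only a single number is returned.
    _, timings["count"] = timed("count()", lambda: df.count())
    # Transfers n rows to the Python runtime.
    _, timings["fetch"] = timed(f"head({n}).collect()", lambda: df.head(n).collect())
    return timings
```

If the trivial round trip is already slow, the delay most likely sits in the network path (VPN, proxies, hops) rather than in HANA or hana_ml. If only the fetch is slow, the row width or the link bandwidth is the more likely suspect.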
