2019 Jul 05 2:45 PM
Hello,
I have a database on HANA and I would like to use machine learning on this data.
I tried connecting Python to SAP HANA to get better performance when running my Python script. However, when I just try a .head(1000).collect() to get 1000 rows, it takes a long time to process. I get the data faster when I run locally.
Could you help me?
Thanks
2019 Jul 05 6:46 PM
The way HANA DataFrames are supposed to work is to push the calculation down to the database instead of bringing the data up to the Python runtime. You can apply any hana_ml-supported modification to the data this way.
You can run smaller .head(5).collect() statements to get a few data points back for visualization purposes, but you should not execute .collect() to bring the whole data set back to Jupyter/Python, since you want to leverage HANA's in-database capabilities. That being said, a 1,000-row select shouldn't take that long (I suppose it's translated to a SELECT TOP 1000). Maybe check your network settings? Are the Python runtime and HANA in the same network, or is the connection going through multiple VPN/bridging hops?
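To illustrate the push-down idea, here is a minimal sketch rather than a definitive recipe. It assumes the hana_ml package is installed; the helper names (preview_with_pushdown, open_table) and the connection details you would pass in are placeholders made up for this example. The count runs inside HANA and only a handful of rows travel to Python. The timings are only a rough hint: both steps include a network round trip, so if even the count is slow the connection itself is the likely suspect, and if only the preview is slow the row transfer is.

```python
import time


def preview_with_pushdown(df, n_preview=5):
    """Summarise a HANA DataFrame while moving only a few rows to Python.

    `df` is assumed to behave like hana_ml.dataframe.DataFrame
    (it needs count(), head(n) and collect()).
    """
    timings = {}

    start = time.perf_counter()
    row_count = df.count()  # the COUNT(*) is executed inside HANA
    timings["count"] = time.perf_counter() - start

    start = time.perf_counter()
    preview = df.head(n_preview).collect()  # only n_preview rows cross the network
    timings["preview"] = time.perf_counter() - start

    return row_count, preview, timings


def open_table(address, port, user, password, table, schema=None):
    """Connect to HANA and return a lazy DataFrame; no rows are fetched here.

    All arguments are placeholders for your own system's values.
    """
    # Imported lazily so the helper above can be used without hana_ml installed.
    from hana_ml import dataframe as hd

    conn = hd.ConnectionContext(address=address, port=port, user=user, password=password)
    return conn.table(table, schema=schema)
```

You would call open_table(...) with your own host, port, credentials and table name, then pass the result to preview_with_pushdown. Printing df.select_statement on the returned DataFrame shows the SQL that hana_ml will send, which is a quick way to confirm what is actually executed for a .head(...) call.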