
hana_ml package). Some people were confused by the visualization on the map at the end. Please note that this article focuses on a technical use case connecting different components, not on deep analysis of coronavirus data.
myjupyter01 running. I am connected to the Jupyter UI as described in the previous blog.
hana_ml
jupyter/minimal-notebook. It already contains some popular data processing packages, like pandas.
hana_ml, which in its current version 1.0.8 is available in the PyPI repository: https://pypi.org/project/hana-ml/.
python -m pip install hana_ml, but because I am running it from a Jupyter notebook with a Python 3 kernel, I need to run it with ! at the beginning:
!python -m pip install hana_ml
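One detail worth checking in a notebook is which interpreter pip installs into: `!python` runs whatever `python` is first on the shell's PATH, which is usually, but not necessarily, the interpreter behind the kernel. The sketch below is only an illustration of that check, with hypothetical helper names `is_importable` and `pip_install_command`; it builds the command from `sys.executable` and does not run it.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the interpreter running this code can find the module."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package):
    """Build a pip command bound to the interpreter that runs this code."""
    return [sys.executable, "-m", "pip", "install", package]


if not is_importable("hana_ml"):
    # Only print the command here; running it is left to the reader.
    print("hana_ml is missing; it could be installed with:")
    print(" ".join(pip_install_command("hana_ml")))
```

The returned list can be passed to `subprocess.check_call` if the installation should be triggered from Python rather than with the `!` shell escape.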
pandas to import files with data
confirmed, deaths, recovered) from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series as Ferry used in his example.
import hana_ml, pandas
# Links updated on 2020-03-22
df_confd = pandas.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
df_death = pandas.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
df_recvd = pandas.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
#Links from before March 22nd
#df_confd = pandas.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv')
#df_death = pandas.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv')
#df_recvd = pandas.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv')
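Since the three current links differ only in one word, they can also be derived from a single base URL. This is a small sketch, assuming the file naming shown in the links updated on 2020-03-22 stays in place; `BASE_URL` and `series_url` are names introduced here for illustration.

```python
# Base path of the time series files, taken from the links used in this post.
BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series"
)


def series_url(kind):
    """Return the raw CSV link for 'confirmed', 'deaths' or 'recovered'."""
    return f"{BASE_URL}/time_series_covid19_{kind}_global.csv"


for kind in ("confirmed", "deaths", "recovered"):
    print(series_url(kind))
```

Each link can then be passed to `pandas.read_csv` exactly as in the code above, for example `pandas.read_csv(series_url('deaths'))`.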
pandas to re-format the data frame
3/10/20 to Confirmed).
# Keep the first four columns plus the last (latest date) column
df_confd_latest = df_confd.drop(df_confd.columns[4:-1], axis='columns')
# Rename the latest date column to 'Confirmed'
df_confd_latest.columns = [*df_confd_latest.columns[:-1], 'Confirmed']
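To see what this re-formatting does without downloading anything, the same two steps can be tried on a tiny frame. This is a sketch on made-up numbers: the column layout mimics the source files, but the values are illustrative only and are not taken from the real data set.

```python
import io

import pandas

# Illustrative sample in the same wide layout: four descriptive columns,
# then one column per date. The numbers are invented for this example.
sample_csv = io.StringIO(
    "Province/State,Country/Region,Lat,Long,3/8/20,3/9/20,3/10/20\n"
    "StateA,CountryX,10.0,20.0,1,2,3\n"
    "StateB,CountryY,-5.5,120.25,40,50,60\n"
)
df = pandas.read_csv(sample_csv)

# Drop every date column except the last one.
df_latest = df.drop(df.columns[4:-1], axis="columns")
# Rename the remaining (latest) date column to 'Confirmed'.
df_latest.columns = [*df_latest.columns[:-1], "Confirmed"]

print(df_latest)
```

The result keeps one row per location and a single `Confirmed` column holding the values of the most recent date.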
hana_ml to persist data in SAP HANA table
hanaml that already exists there...
cc = hana_ml.dataframe.ConnectionContext('12.34.567.890', 39015, 'hanaml', 'MyPasswordReusedEverywhere')
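Outside of a demo, the host, user and password would normally not be typed into a notebook cell. One possible approach, sketched here with hypothetical environment variable names (`HANA_ADDRESS`, `HANA_PORT`, `HANA_USER`, `HANA_PASSWORD`) and a hypothetical helper `hana_connection_args`, is to read them from the environment.

```python
import os


def hana_connection_args(env=os.environ):
    """Collect connection settings from environment variables.

    The variable names are assumptions made for this sketch.
    """
    return {
        "address": env["HANA_ADDRESS"],
        "port": int(env.get("HANA_PORT", "39015")),  # default: the port used in this post
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }


# Demonstration with a plain dict standing in for os.environ.
demo_env = {"HANA_ADDRESS": "hana.example.com", "HANA_USER": "hanaml", "HANA_PASSWORD": "secret"}
args = hana_connection_args(demo_env)
print({key: value for key, value in args.items() if key != "password"})

# With hana_ml installed, the values would be passed positionally as above:
# cc = hana_ml.dataframe.ConnectionContext(
#     args["address"], args["port"], args["user"], args["password"])
```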
df_confd_latest into a HANA dataframe hdf_confd.
hdf_confd = hana_ml.dataframe.create_dataframe_from_pandas(cc, df_confd_latest, 'df_confd', force=True)
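A HANA dataframe such as hdf_confd is a handle to data on the server rather than a local copy of it. The toy sketch below illustrates that idea only: it is not how hana_ml is implemented, `LazyFrame` is a hypothetical class, and an in-memory SQLite database stands in for SAP HANA so the example runs anywhere.

```python
import sqlite3


class LazyFrame:
    """Toy server-side dataframe: holds a connection and a SELECT statement, no rows."""

    def __init__(self, connection, select_statement):
        self.connection = connection
        self.select_statement = select_statement

    def filter(self, condition):
        # Compose a new statement; nothing is executed or fetched here.
        statement = f"SELECT * FROM ({self.select_statement}) WHERE {condition}"
        return LazyFrame(self.connection, statement)

    def collect(self):
        # Only at this point do rows travel from the database to the client.
        return self.connection.execute(self.select_statement).fetchall()


# SQLite plays the role of the database server in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE df_confd ("Country/Region" TEXT, "Confirmed" INTEGER)')
conn.executemany(
    "INSERT INTO df_confd VALUES (?, ?)",
    [("CountryX", 10), ("CountryY", 250), ("CountryZ", 1200)],  # made-up rows
)

frame = LazyFrame(conn, "SELECT * FROM df_confd")
large = frame.filter('"Confirmed" > 100')  # still just a SQL string
print(large.select_statement)
rows = large.collect()                     # the query runs here
print(rows)
```

The design point is that chaining operations only grows the SQL text, so the cost of moving data is paid once, when the result is explicitly collected.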
hdf_confd in Python does not store any data on your laptop, but only points to a table HANAML.df_confd in SAP HANA server memory, and all Python operations on the HANA dataframe are physically executed in HANA db without moving data between the server and a client,
collect() method to convert a HANA dataframe to Pandas (and, as a result, to bring data from the HANA db server to the local client).
df_confd in the schema HANAML with all the data from the source Pandas dataframe.
SELECT NEW ST_POINT("Long", "Lat"), "Country/Region", "Province/State", "Confirmed" FROM HANAML."df_confd";
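Note the argument order in the query above: the point is built from "Long" first and "Lat" second, that is x before y, although the source files list Lat before Long. A minimal sketch of the same ordering in Python, using the well-known text (WKT) notation for a point; `point_wkt` is a hypothetical helper introduced for illustration.

```python
def point_wkt(lat, lon):
    """Return a WKT point with longitude (x) first and latitude (y) second."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    return f"POINT ({lon} {lat})"


# Made-up coordinates, given in the Lat, Long order of the source files.
print(point_wkt(lat=15.0, lon=101.0))
```

Swapping the two values is an easy mistake to make, and the range check catches at least the cases where a longitude beyond 90 degrees ends up in the latitude slot.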
EPSG:4326 to get these points on the map. DBeaver then shows me the rest of the record data when I click on any point.
hana_ml...