hana_ml 2.6 in the context of my demo used in this year's SAP TechEd's DAT108 session.

hana_ml.

hana_ml:
SELECT statement backing the dataframe,
collect() method,

hana_ml you should have some basic understanding of the Pandas module.

hana_ml if needed

hana_ml 2.6 has been released since my previous post was published last week. I can see it using one of:

pip search hana
docker exec hmlsandbox01 pip search hana
!pip search hana

So, to upgrade the module let's run:
pip install --upgrade hana-ml
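
After the upgrade it can be worth confirming which version the active Python environment actually picked up, especially when switching between a terminal, a container and a notebook. This is a minimal sketch using only the standard library. The helper name `installed_version` is my own; `hana-ml` is the distribution name from the pip command above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None if hana-ml is not installed in this environment.
print("hana-ml:", installed_version("hana-ml"))
```

Running it from the same kernel or container that runs the notebook avoids checking a different environment by mistake.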
shapely module

The shapely module is used by hana_ml to support geospatial data manipulation, but it must be installed separately to avoid errors like "name 'wkb' is not defined" or "ModuleNotFoundError: No module named 'shapely'." It is a known limitation and should be fixed in the next patch of hana_ml.

To install shapely please follow: https://shapely.readthedocs.io/en/stable/project.html#installing-shapely.

pip install shapely
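
To catch the missing dependency before it surfaces as one of the errors quoted above, a quick availability check can be run first. This is only a sketch with the standard library; the helper name `missing_modules` is hypothetical and not part of hana_ml.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["shapely"])
if missing:
    print("Install before using geospatial features:", ", ".join(missing))
else:
    print("shapely is available")
```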
01 Dataframes.ipynb.

hana_ml will be used against some large volumes of data already stored in SAP HANA on-prem or in SAP HANA Cloud. But in our case of starting with the empty trial instance of SAP HANA Cloud, we need to load some data first. Actually, I already showed how to quickly load CSV files into SAP HANA in my post Quickly load data with hana_ml....

import pandas
pandas.__version__

I use the dfp_ notation for Pandas dataframes.

dfp_nodes=pandas.read_csv('https://github.com/krlawrence/graph/raw/master/sample-data/air-routes-latest-nodes.csv')
dfp_edges=pandas.read_csv('https://github.com/krlawrence/graph/raw/master/sample-data/air-routes-latest-edges.csv')

print('Size of nodes dataframe: {}'.format(dfp_nodes.shape))
print('Size of edges dataframe: {}'.format(dfp_edges.shape))

What are the column data types in the dfp_nodes dataframe?

dfp_nodes.dtypes

There is a column ~label, so what are the node labels?

dfp_nodes.groupby('~label').size()

There are some columns (like type:string) as well as some rows (like those labeled continent or version) that we do not need. Additionally, all columns have either some special characters (like ~) or data types (like :object) as part of their names that we do not need. Plus, some of the columns have data types too generic for their real content. And ideally, we need column names in all capitals for SAP HANA.

Let's create a cleaned dataframe dfp_ports and check it!

dfp_ports=(
    dfp_nodes[dfp_nodes['~label'].isin(['airport'])]
    .drop(['~label','type:string','author:string','date:string'], axis=1)
    .convert_dtypes()
)

dfp_ports.columns=(dfp_ports.columns
    .str.replace('~','')
    .str.replace(':.*','', regex=True) # regex=True is needed in pandas 2.0+, where the pattern is literal by default
    .str.upper()
)
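
The renaming rule above (drop `~`, drop everything from the first `:` onwards, upper-case the rest) can be tried out on a few names in plain Python before applying it to the dataframe. This is a sketch with the standard `re` module only. The helper name `hana_column_name` is my own, and the sample names are illustrative ones in the style of the file, not a full copy of its header.

```python
import re


def hana_column_name(raw):
    """Strip '~', cut the ':type' suffix and upper-case the remainder."""
    without_tilde = raw.replace("~", "")
    without_type = re.sub(r":.*", "", without_tilde)  # pattern is a regular expression here
    return without_type.upper()


sample = ["~id", "~label", "code:string", "lat:double"]
print([hana_column_name(name) for name in sample])
# → ['ID', 'LABEL', 'CODE', 'LAT']
```

Note that the suffix pattern only works as a regular expression. With a literal match, a name like `code:string` would stay unchanged, because it contains no literal `:.*` substring.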

Let's do the same for dfp_edges.

dfp_edges.dtypes

dfp_edges.groupby('~label').size()

dfp_routes=dfp_edges[dfp_edges['~label'].isin(['route'])].drop(['~label'], axis=1).copy()

dfp_routes.columns=dfp_routes.columns.str.replace('~','').str.replace(':.*','', regex=True).str.upper()

HANAML database user

With the user HANAML created in the previous post, let's switch to using it for further exercises.

import hana_ml
hana_ml.__version__

hana_cloud_endpoint="<uuid>.hana.trial-<region>.hanacloud.ondemand.com:443"

hana_cloud_host, hana_cloud_port=hana_cloud_endpoint.split(":")
cchc=hana_ml.dataframe.ConnectionContext(port=hana_cloud_port,
    address=hana_cloud_host,
    user='HANAML',
    password='Super$ecr3t!', #Should be your user's password 😉
    encrypt=True
)

print(cchc.sql("SELECT SCHEMA_NAME, TABLE_NAME FROM TABLES WHERE SCHEMA_NAME='{schema_name}'"
    .format(schema_name=cchc.get_current_schema()))
    .collect()
)

HANAML does not have any tables yet. So, let's save the data from Pandas dataframes to SAP HANA tables using hana_ml.

dfh_ports=hana_ml.dataframe.create_dataframe_from_pandas(cchc,
    dfp_ports, "PORTS",
    force=True
)

dfh_routes=hana_ml.dataframe.create_dataframe_from_pandas(cchc,
    dfp_routes, 'ROUTES',
    force=True)

I use the dfh_ notation for HANA DataFrame variables.

print(cchc.sql("SELECT SCHEMA_NAME, TABLE_NAME FROM TABLES WHERE SCHEMA_NAME='{schema_name}'"
    .format(schema_name=cchc.get_current_schema()))
    .collect()
)

To retrieve the data, use the collect() method of the HANA dataframe.

print(dfh_ports.collect())
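
The catalog query is used twice in this post, so it could be wrapped in a small helper. This is only a sketch. The function name `tables_in_schema_sql` is my own, and the doubling of single quotes is an extra precaution that the original query does not include.

```python
def tables_in_schema_sql(schema_name):
    """Build the TABLES catalog query used in this post for one schema."""
    # Double any single quote so the name stays a valid SQL string literal.
    escaped = schema_name.replace("'", "''")
    return (
        "SELECT SCHEMA_NAME, TABLE_NAME FROM TABLES "
        "WHERE SCHEMA_NAME='{}'".format(escaped)
    )


print(tables_in_schema_sql("HANAML"))
```

With the connection from this post, the string would be passed on as cchc.sql(tables_in_schema_sql(cchc.get_current_schema())).collect().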
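
To illustrate the idea of a dataframe that is backed by a SELECT statement and only fetches rows on collect(), here is a toy model. It is NOT the hana_ml implementation: it uses the standard library's sqlite3 as a stand-in database, the class `LazyFrame` is hypothetical, and the three rows are made up for the demonstration. The attribute and method names only mirror, as far as I know, the `select_statement` attribute and `filter()` method of the HANA dataframe.

```python
import sqlite3


class LazyFrame:
    """Toy stand-in for a database-backed dataframe (not the hana_ml class)."""

    def __init__(self, connection, select_statement):
        self.connection = connection
        self.select_statement = select_statement

    def filter(self, condition):
        # Wrap the current statement in a new one; nothing is executed yet.
        # The condition is pasted in as-is, so pass trusted text only.
        return LazyFrame(
            self.connection,
            "SELECT * FROM ({}) AS t WHERE {}".format(self.select_statement, condition),
        )

    def collect(self):
        # Only here is the SQL sent to the database and the rows fetched.
        return self.connection.execute(self.select_statement).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PORTS (CODE TEXT, COUNTRY TEXT)")
conn.executemany(
    "INSERT INTO PORTS VALUES (?, ?)",
    [("WRO", "PL"), ("FRA", "DE"), ("MUC", "DE")],  # made-up sample rows
)

german_ports = LazyFrame(conn, "SELECT * FROM PORTS").filter("COUNTRY = 'DE'")
print(german_ports.select_statement)  # the SQL built so far, nothing fetched yet
print(german_ports.collect())         # rows are transferred only at this point
```

The point of the design is that filtering and similar steps only rewrite SQL text, so the heavy lifting stays in the database and only the final result travels to the client.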