hana_ml 2.6, which I used as well in my demo in SAP TechEd’s DAT108 session.hana_ml 2.6 release several HANA DataFrame methods have been extended to support the processing of (geo-)spatial columns. And what is the better time to discuss this if not during Geography Awareness Week?02 Spatial.ipynb.import pandas as pd
from hana_ml import dataframe as dfhimport hana_ml, but aliases:pd for Pandas, anddfh for HANA DataFrame class.hana_cloud_endpoint="<uuid>.hana.trial-eu10.hanacloud.ondemand.com:443"hana_cloud_host, hana_cloud_port=hana_cloud_endpoint.split(":")
cchc=dfh.ConnectionContext(port=hana_cloud_port,
address=hana_cloud_host,
user='HANAML',
password='Super$ecr3t!',
encrypt=True
)cchc.connection.isconnected()
mylat and mylon. In my case:mylat, mylon = (51.1, 16.9)https://ifconfig.co service used in the earlier post Allow connections to SAP HANA Cloud instance from selected IP addresses — using the command line.import requests, json
s = requests.Session()
mylat, mylon = [json.loads
(
s.get("https://ifconfig.co/json").text
)
.get(key) for key in ['latitude', 'longitude']
]
s.close()print the coordinates we got.print(mylat, mylon)
mylat and mylon properly set, we can now build a HANA DataFrame to query the nearest airport.dfh_nearport=(cchc
.table("PORTS")
.select("CODE", "DESC",
('New ST_Point("LON", "LAT").ST_SRID(4326)', "PORT_LOC"),
(
'New ST_POINT({lon}, {lat}).ST_SRID(4326).ST_Distance(New ST_Point("LON", "LAT").ST_SRID(4326))'
.format(lon=mylon, lat=mylat)
, "DIST_FROMME"
)
)
.sort("DIST_FROMME").
head(1)
)PORTS two existing columns CODE, DESC plus two calculated columns:PORT_LOC with the spatial location of an airportDIST_FROMME with the distance in meters (that's the default unit of distance) from my location to an airport using the spatial reference ID SRID 4326.SELECT statement.dfh_nearport.select_statement
collect() are called.dfp_nearport=dfh_nearport.collect()
dfp_nearport
WRO (as expected ?), located about 5.8 kilometers from me. But what is that array of bytes [1, 1, 0, 0, 0, 0, 0, 0, 192, 195, 226, 48, 64... as the location?hana_ml 2.6 have been extended with the geo_cols and srid optional properties, like sql().dfp_nearport=(cchc
.sql(
'''SELECT TOP 1
CODE, DESC,
New ST_Point(LON, LAT).ST_SRID(4326) as "PORT_LOC",
New ST_POINT(16.9599, 51.0791).ST_SRID(4326).ST_Distance(New ST_Point(LON, LAT).ST_SRID(4326)) as "DIST_FROMME"
FROM "PORTS"
ORDER BY "DIST_FROMME" ASC
'''.format(mylon, mylat),
geo_cols=["PORT_LOC"],
srid=4326
)
.collect())
display(dfp_nearport)
POINT (16.88579940795898 51.10269927978516) that can be copied and pasted e.g. in http://geojson.io/ -> Meta -> Load WKT String to be displayed on the map.
geo_cols with HANA DataFrame.select_statement into SQL with geo_cols, like:dfp_nearport=(cchc
.sql(dfh_nearport.select_statement,
geo_cols={"PORT_LOC": 4326})
.collect()
)
display(dfp_nearport)
geo_cols syntax. It can be:geo_cols={"PORT_LOC": 4326},geo_cols=["PORT_LOC"], srid=4326. In this example, it is a list of one element, but could be more.LAT and LON columns on the SAP HANA table PORT. It would be more convenient to create a column with this ST_GEOMETRY(4326) data type already at the data persistence.dfp_ports, the same as we did in 01 Dataframes.ipynb once again in the current Jupyter notebook.dfp_nodes=pd.read_csv('https://github.com/krlawrence/graph/raw/master/sample-data/air-routes-latest-nodes.csv')
dfp_ports=(
dfp_nodes[dfp_nodes['~label'].isin(['airport'])]
.drop(['~label','type:string','author:string','date:string'], axis=1)
.convert_dtypes()
)
dfp_ports.columns=(dfp_ports.columns
.str.replace('~','')
.str.replace(':.*','')
.str.upper()
)PORTS thanks to force=True property) HANA DataFrame, indicating that a pair f columns LON and LAT are X and Y coordinates of a spatial point with Spatial Reference ID 4326.dfh_ports=dfh.create_dataframe_from_pandas(cchc,
dfp_ports, "PORTS",
geo_cols=[("LON", "LAT")],
srid=4326,
force=True
)POINT_LON_LAT_GEO column, as seen by running a command:dfh_ports.columns

hana_ml 2.6 is the ability to load Esri shapefiles, which is a very popular format to exchange geospatial data. I was very excited when saw this added functionality! I hope you are (or will be) too.Shapes and download the zipped file into that subfolder. This time I will do this right from my notebook's cell:!mkdir -p ./Shapes
!wget https://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip -O ./Shapes/TM_WORLD_BORDERS-0.3.zip 
create_dataframe_from_shapefile() method to persist the data in the SAP HANA table TM_WORLD_BORDERS. You can notice the use of geo_cols and srid here as well.dfh_countries = dfh.create_dataframe_from_shapefile(
connection_context=cchc,
shp_file='./Shapes/TM_WORLD_BORDERS-0.3.zip',
table_name="TM_WORLD_BORDERS",
geo_cols=["SHAPE"],
srid=4326
)Poland in my case:dfh_countries.select("ISO2", "NAME", "SHAPE").filter("NAME='Poland'").collect()
ST_CoveredBy(). Please note the use of the alias() method to build the join of two HANA DataFrames dfh_ports and dfh_countries.(
dfh_ports.alias("P")
.join
(
dfh_countries.filter("NAME='Poland'").alias("C"),
condition='"P"."POINT_LON_LAT_GEO".ST_CoveredBy("C"."SHAPE")=1',
select=["P.*"]
)
).count()
COUNTRY in the PORTS table, but I hope you get the point of this last overengineering effort ?hana-ml-samples repository: https://github.com/SAP-samples/hana-ml-samples/blob/main/Python-API/usecase-examples/multimodel-anal...."PUBLIC"."ST_UNITS_OF_MEASURE".ST_Distance().You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
| User | Count |
|---|---|
| 46 | |
| 41 | |
| 38 | |
| 31 | |
| 28 | |
| 28 | |
| 27 | |
| 24 | |
| 24 | |
| 23 |