In
part 1 we requested and loaded Location History from a personal account in Google. There are too many single points (recorded locations) to efficiently visualize collected data. Let's have a look at the data using
geospatial clustering available in
SAP HANA 2.0 SPS 03.
Let's continue with exercises.
Exercise 2: K-means clustering
K-means tries to find an assignment of points to clusters, so that the sum of squared distances of the points to the center of the cluster they belong to is minimal.
Let's have a look at high-level clusters of all my locations for the complete history.
--KMeans
select ST_UnionAggr("ConvexHull") from
(select ST_CLUSTERID() AS "CID",
ST_CONVEXHULLAGGR("POINT").st_srid(4326) AS "ConvexHull",
COUNT(*) AS "Number of points in this cluster"
from "GOOGLE"."LOC_HISTORY"
GROUP CLUSTER BY "POINT"
USING KMEANS CLUSTERS 3
order by 3 desc);
Points have been roughly clustered into Central and Western Europe, Eastern Europe and Asia, the USA (using
GeoJson.io to visualize the SQL output).
Exercise 3: Grid clustering
Grid clustering provides a quick and easy way to use clustering. It is useful for providing a first impression.
--Grid
select ST_UnionAggr("Envelope") from
(
select ST_CLUSTERID() AS "CID",
ST_CLUSTERENVELOPE() AS "Envelope",
COUNT(*) AS "Number of points in this cluster"
from "GOOGLE"."LOC_HISTORY"
GROUP CLUSTER BY "POINT"
USING GRID X CELLS 160 Y CELLS 80
order by 3 desc
);
This time the grid clustering will give us better overview of the geography of recorded locations.
So, it is not like I've traveled half of Asia, but only visited Bengaluru and Istanbul
🙂
Exercise 4: DBSCAN clustering
Density-based spatial clustering of applications with noise (DBSCAN) is best suited to non-spherical clusters. It should works nicely if I want to cluster points along the routs I've been taking within 450 kilometers from my home town Wrocław (the center of the city is roughly 51.11N 17.035E).
--DBSCAN
SELECT st_unionAggr("cluster").st_asWKB() from (
SELECT "cluster_id", st_unionAggr("POINT").ST_AlphaShape(0.055) as "cluster"
from (
SELECT
ST_ClusterID() OVER (CLUSTER BY "POINT" USING DBSCAN EPS 0.001 MINPTS 6) AS "cluster_id",
"POINT"
FROM "GOOGLE"."LOC_HISTORY"
WHERE "POINT".ST_SRID(4326).ST_Distance(NEW ST_Point(17.035, 51.11).st_srid(4326),'kilometer') < 450
ORDER BY 1
)
where "cluster_id" <> 0
group by "cluster_id");
And here is the result.
You can play with different parameters of
DBSCAN clustering and
ST_AlphaShape() to get the best shape for visualization of your data.
Maybe next time I should look at the
My Places dataset from
Google Takeout.
‘Till next
#GeospatialTuesday then,
-Vitaliy, aka
@Sygyzmundovych