As we know, SAP HANA introduced window function since SPS05 Rev. 45. You can find this info from SAP HANA SQL (Changed) - What's New in the SAP HANA Platform (Release Notes) - SAP Library, section "New SQL Statements and Functions" as follows.
"Windows Functions (new)RANK, DENSE_RANK, ROW_NUMBER, LEAD, LAG: Divide result sets of a query into groups of rows called window partitions."
So, in this blog I want to share something about window function in SAP HANA with you. Since Window Functions - SAP HANA SQL and System Views Reference - SAP Library has already showed the syntax, the description and examples of window function in detail, I won't explain more about the basics of window function here. However, I'll focus on the comparison of window function and self-join in SAP HANA to show you why we need window function in SAP HANA.
Recently I reviewed SAP HANA SQL and System Views Reference - SAP Library and when I read the section of window function, an question came to my mind. Can we implement the window function feature without using window function itself? Or in other words, if there is no window function in SAP HANA, how can we implement ROW_NUMBER, RANK, DENSE_RANK, etc.. these functions? Is it possible? :???: Now let's have a discussion/brain storming together.
Consider example 1 from Window Functions - SAP HANA SQL and System Views Reference - SAP Library. I just pasted the window dump as below.
Without window function in SAP HANA, can we implement ROW_NUMBER, RANK and DENSE_RANK and get the same results? It took a while for me to think about this question. Although I did not give it a shot, theoretically it should be possible if we use Cursors or ARRAY in SQLScript or we can use some methods described in SAP HANA: Dynamic Ranking using Script based (SQL Script) Vs Graphical Calculation view Vs Script ba.... However, all of these methods are a little bit complicated, since we need the knowledge of SQLScript and modeling which are not known by most new to SAP HANA. So an idea came to my mind, it's better if we can directly use SQL to implement this. So is it possible to just use one SQL statement to implement ROW_NUMBER, RANK and DENSE_RANK respectively?
In order to answer this question, first of all we need to clarify one thing. Why we need window function and what's the advantage of window function? Maybe you can point lots of reasons. Certainly you can. Here I just want to point out one reason - avoid self-join, because we can use self-join to implement most window functions. Believe it or not. Now let's give it a shot.
For simplicity, I created a more simple scenario/table as follows and we just focus on COUNT, AVG, ROW_NUMBER, RANK and DENSE_RANK these five window functions.
CREATE COLUMN TABLE S (
CLASS VARCHAR(10),
VAL INTEGER
);
INSERT INTO S VALUES ('A', 1);
INSERT INTO S VALUES ('A', 1);
INSERT INTO S VALUES ('A', 1);
INSERT INTO S VALUES ('A', 2);
INSERT INTO S VALUES ('A', 2);
INSERT INTO S VALUES ('A', 3);
INSERT INTO S VALUES ('B', 4);
As you can see, we ignore the OFFSET column in example 1 above, since COUNT, AVG, ROW_NUMBER, RANK and DENSE_RANK do not need this column. Identical to example 1, we want to partition by CLASS and order by VAL to do some ranking stuff. In addition, I'll also show COUNT and AVG.
Window function
SELECT CLASS, COUNT(1) OVER (PARTITION BY CLASS) CNT FROM S ORDER BY CLASS;
Self-join
SELECT A.CLASS, B.CNT FROM S A INNER JOIN
(SELECT CLASS, COUNT(1) CNT FROM S GROUP BY CLASS) B
ON A.CLASS = B.CLASS ORDER BY A.CLASS;
The logic is simple, first you create B which calculates COUNT of each class, then you make a join.
Window function
SELECT CLASS, AVG(VAL) OVER (PARTITION BY CLASS) AVG FROM S ORDER BY CLASS;
Self-join
SELECT A.CLASS, B.AVG FROM S A INNER JOIN
(SELECT CLASS, AVG(VAL) AVG FROM S GROUP BY CLASS) B
ON A.CLASS = B.CLASS ORDER BY A.CLASS;
The logic is similar with the example of COUNT.
Window function
SELECT CLASS, VAL, ROW_NUMBER() OVER (PARTITION BY CLASS ORDER BY VAL) ROW_NUM FROM S ORDER BY CLASS, ROW_NUM;
Self-join
SELECT A.CLASS, A.VAL, COUNT(1) ROW_NUM FROM S A INNER JOIN S B ON
A.CLASS = B.CLASS AND (A.VAL > B.VAL OR (A.VAL = B.VAL AND A."$rowid$" >= B."$rowid$"))
GROUP BY A.CLASS, A.VAL, A."$rowid$" ORDER BY CLASS, ROW_NUM;
First of all, you can see "$rowid$" is used in this example. You can find what it is from Do we have a ROWID or equivalent in HANA ? | SAP HANA. Actually "$rowid$" is undocumented and only available with column table. That's why I created column table instead of row table.
Regarding the logic, the ROW_NUM of each row is determined by "# of VAL which is less than mine" plus "# of VAL which is equal to mine and its row id is less than or equal to mine". Hope you can understand the logic. You may ask why I have to use the undocumented "$rowid$". Because S has no primary key, we need to have a "logical" primary key instead.
Window function
SELECT CLASS, VAL, RANK() OVER (PARTITION BY CLASS ORDER BY VAL) RANK FROM S ORDER BY CLASS, RANK;
Self-join
SELECT A.CLASS, A.VAL, COUNT(1) RANK FROM S A INNER JOIN S B ON
A.CLASS = B.CLASS AND (A.VAL > B.VAL OR (A.VAL = B.VAL AND A."$rowid$" = B."$rowid$"))
GROUP BY A.CLASS, A.VAL, A."$rowid$" ORDER BY CLASS, RANK;
The logic is similar with ROW_NUMBER. Hope you can understand it.
Window function
SELECT CLASS, VAL, DENSE_RANK() OVER (PARTITION BY CLASS ORDER BY VAL) DENSE_RANK
FROM S ORDER BY CLASS, DENSE_RANK;
Self-join
SELECT A.CLASS, A.VAL, COUNT(1) DENSE_RANK FROM S A
INNER JOIN (SELECT CLASS, VAL FROM S GROUP BY CLASS, VAL) B ON
A.CLASS = B.CLASS AND A.VAL >= B.VAL
GROUP BY A.CLASS, A.VAL, A."$rowid$" ORDER BY CLASS, DENSE_RANK;
As you can see, the logic is more simple than ROW_NUMBER and RANK because of the density.
Since we can implement most window functions with self-join, you may be curious about the comparison of performance between window function and self-join in SAP HANA. Therefore, I've also made some performance tests for you.
Test environment:
Data set:
Function | Window function (ms) | Self-join (ms) | Ratio |
---|---|---|---|
COUNT | ~80 | ~80 | ~1X |
AVG | ~90 | ~90 | ~1X |
ROW_NUMBER | ~100 | ~2500 | ~25X |
RANK | ~100 | ~2500 | ~25X |
DENSE_RANK | ~100 | ~1500 | ~15X |
We can do some analysis from the above table.
- COUNT, AVG: You can find the performance of window function and self-join seems the same. It's true because the join condition only contains "=" which means only hash-join happened.
- ROW_NUMBER, RANK: Window function has much better performance (25X) than self-join, since the self-join contains non equi join.
- DENSE_RANK: B is kept smaller than S itself and the logic is more simple, so the ratio (15X) is smaller than ROW_NUMBER and RANK.
In this blog, we found a simple way (self-join in SQL) to implement the window function in SAP HANA. We implemented COUNT, AVG, ROW_NUMBER, RANK and DENSE_RANK with the self-join approach. And we compared the performance between window function and self-join in the end. For better performance, we should always use native window function in SAP HANA instead of self-join. On the other hand, if self-join exists in your SQL statement, please think about if it is possible to implement the logic with window function.
If you have any better idea how to implement the window function in SAP HANA, e.g., implement my scenario without using "$rowid$", please feel free to let me know. Look forward to your feedback on this topic.
Hope you enjoyed reading my blog. :smile:
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
23 | |
14 | |
11 | |
10 | |
9 | |
8 | |
8 | |
7 | |
7 | |
7 |