Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
cancel
Showing results for 
Search instead for 
Did you mean: 
former_member319447
Participant
1,281
There are several articles in this blog space dealing with Table Pivoting, A recent article, universally titled Pivoting Data with SAP HANA, demonstrates a Python programming interface which uses Data Frames and the hana_ml machine learning library. However, this is not a method which can be used in the context of the SQL language, and therefore it is very limited in scope.

There are two other universally titled articles PIVOT Data in HANA and How to pivot dataset in SAP HANA which give an SQL solution, but they seem either too wordy or too unscientific for something that I think is really just a simple math problem.

There are also several older, archived articles which I feel are much too vague or overly procedural to be universally titled as the all-encompassing solution.

I recently ran across a table pivoting problem in my own application, and needed a solution in the form of an SQL select statement. If you are part of the older engineering crowd, then you may have run across this book which was published 25 years ago:



The authors devoted the entire book to a collection of essays on SQL programming techniques, many of which have been lost to history. This book presents an innovative methodology for SQL programming based on the use of Point Characteristic Functions.

Point characteristic functions are devices that allow engineers to encode conditional logic as scalar expressions within select, where, group by and set clauses. These techniques implement single statement solutions to problems that normally require multiple SQL statements. They also minimize the number of passes through underlying tables, and use SQL to obtain results not otherwise available within a relational implementation.

With an understanding of the old SQL-92 mechanics described in the book, I re-implemented their point characteristic functions using features available in the newer SQL used by HANA.

The secret to turning, or pivoting, long narrow tables into short, wide ones, lies in the proper encoding and use of these point characteristic functions. There are a wide variety of solutions enabled by these and other characteristic functions. The need for such pivoting occurs frequently in practice, primarily because, while the narrow table representation is better for data manipulation, the wide table form can be better for presenting data to reports.

We will get right to the problem I faced, and then explain the details. The application takes a set of Bank Deposit Rates for a number of months, stored in a long narrow table called DepositRates, and pivots the data to present the rate as columns by month:

Here is the input data


 

create table DepositRates (
id int not null,
month int not null,
rate decimal(6,6) null,
resetDate date null,
resetNote nvarchar(50) null);

create table Instruments (
id int not null,
name nvarchar(8) not null,
source nvarchar(8) not null);

 

insert into DepositRates values(1,1,0.008000,to_date('10/02/2019','MM/DD/YYYY'),'late reset');
insert into DepositRates values(1,2,0.025500,to_date('11/01/2019','MM/DD/YYYY'),NULL);
insert into DepositRates values(1,3,0.014100,to_date('11/29/2019','MM/DD/YYYY'),'early reset');
insert into DepositRates values(2,1,0.007115,to_date('10/01/2019','MM/DD/YYYY'),NULL);
insert into DepositRates values(2,2,0.007159,to_date('11/01/2019','MM/DD/YYYY'),NULL);
insert into DepositRates values(2,3,0.007135,to_date('12/01/2019','MM/DD/YYYY'),NULL);
insert into DepositRates values(3,1,0.022441,to_date('10/01/2019','MM/DD/YYYY'),NULL);
insert into DepositRates values(3,2,0.020109,NULL,'missing reset');
insert into DepositRates values(3,3,0.017743,to_date('12/01/2019','MM/DD/YYYY'),NULL);

insert into Instruments values(1,'HKDOND=','Reuters');
insert into Instruments values(2,'GBP3MD=','Euribor');
insert into Instruments values(3,'USD1MD=','Reuters');

 

select * from Instruments;

ID NAME SOURCE
-- ------- --------
1 HKCOMD= Reuters
2 GBP3MD= Euribor
3 USD1MD= Reuters

 

select * from DepositRates;

ID MONTH RATE RESETDATE RESETNOTE
-- ----- -------- ---------- ----------
1 1 0.008000 2019-10-02 late reset
1 2 0.025500 2019-11-01 NULL
1 3 0.014100 2019-11-29 early reset
2 1 0.007115 2019-10-01 NULL
2 2 0.007159 2019-11-01 NULL
2 3 0.007135 2019-12-01 NULL
3 1 0.022441 2019-10-01 NULL
3 2 0.020109 NULL missing reset
3 3 0.017743 2019-12-01 NULL

 

Here is the pivot query


 

select
i.name "name",
i.source "source",
max(r.rate*(nullif((1-abs(sign(r.month-1))),0))) "current rate",
max(r.rate*(nullif((1-abs(sign(r.month-2))),0))) "period-1 rate",
max(r.rate*(nullif((1-abs(sign(r.month-3))),0))) "period-2 rate"
from DepositRates r, Instruments i
where r.id = i.id
group by i.name,i.source
order by i.name;

 

And here is the result


 

name source current rate period-1 rate period-2 rate
------- ------- ------------ ------------- -------------
GBP3MD= Euribor 0.007115 0.007159 0.007135
HKDOND= Reuters 0.008000 0.025500 0.014100
USD1MD= Reuters 0.022441 0.020109 0.017743

 

The mechanics


As you can see, we have pivoted the table by flipping the rates stored as rows for each instrument, into columns. We pivot by a numeric month and present a numeric rate. Note that the expression (nullif((1-abs(sign(month-1))),0)) acts as a point characteristic function for month = 1. Specifically, it returns 1 if the month is equal to 1 and returns NULL otherwise. The function sign() returns -1,0,+1 for negative numbers, zero and positive numbers respectively; the function abs() returns the absolute value of its argument; and the function nullif() returns NULL if the value is 0. Consequently, only one, the first, of the three rate values participating in the max() aggregate retains its original value, with the other two being reduced to NULL. Thus, the expression max(rate*(nullif((1-abs(sign(month-1))),0))) in effect simply returns the month 1 rate amount, precisely as required. The expression max(rate*(nullif((1-abs(sign(month-2))),0))) acts in a similar way, implementing a characteristic function for month 2.

Breaking up the query, we can see the return value for the point characteristic function as follows:

select
id,
nullif((1-abs(sign(month-1))),0) as v1,
nullif((1-abs(sign(month-2))),0) as v2,
nullif((1-abs(sign(month-3))),0) as v3
from DepositRates
order by id,month;

 

ID V1 V2 V3
-- ---- ---- ----
1 1 NULL NULL
1 NULL 1 NULL
1 NULL NULL 1
2 1 NULL NULL
2 NULL 1 NULL
2 NULL NULL 1
3 1 NULL NULL
3 NULL 1 NULL
3 NULL NULL 1

Then, we apply the function to the presentation attribute called rate by multiplying it by the function result. Here is the result:

select
id,
rate*nullif((1-abs(sign(month-1))),0) as v1,
rate*nullif((1-abs(sign(month-2))),0) as v2,
rate*nullif((1-abs(sign(month-3))),0) as v3
from DepositRates
order by id,month;

 

ID V1 V2 V3
-- -------- -------- --------
1 0.008000 NULL NULL
1 NULL 0.025500 NULL
1 NULL NULL 0.014100
2 0.007115 NULL NULL
2 NULL 0.007159 NULL
2 NULL NULL 0.007135
3 0.022441 NULL NULL
3 NULL 0.020109 NULL
3 NULL NULL 0.017743

When max() and group by are applied, you can probably see that the final aggregate result is returned as expected.

Other variations


In the above example, we pivoted by a numeric data type and likewise presented a numeric data type. If you imagine that other combinations are possible, your observation is correct. We can pivot and present by any combination of numeric, date or character data types.

Here is a date data type presentation of resetDate. The add_days() function is applied to a point characteristic function that returns either 1, which keeps resetDate, or NULL, which discards resetDate. We rely on the behavior of add_days() to return NULL if the date adder is NULL. I used add_days() but could have used some other date function as well:

select
i.name "name",
i.source "source",
max(add_days(r.resetDate,1-nullif((1-abs(sign(r.month-1))),0))) "current reset",
max(add_days(r.resetDate,1-nullif((1-abs(sign(r.month-2))),0))) "period-1 reset",
max(add_days(r.resetDate,1-nullif((1-abs(sign(r.month-3))),0))) "period-2 reset"
from DepositRates r, Instruments i
where r.id = i.id
group by i.name,i.source
order by i.name;

 

name source current reset period-1 reset period-2 reset
------- ------- ------------- -------------- --------------
GBP3MD= Euribor 2019-10-01 2019-11-01 2019-12-01
HKDOND= Reuters 2019-10-02 2019-11-01 2019-11-29
USD1MD= Reuters 2019-10-01 NULL 2019-12-01

Here is a character data type presentation of resetNote which relies on the behavior of substring() to return NULL if the starting position is NULL:

select
i.name "name",
i.source "source",
max(substring(r.resetNote,nullif((1-abs(sign(r.month-1))),0),length(r.resetNote))) "current note",
max(substring(r.resetNote,nullif((1-abs(sign(r.month-2))),0),length(r.resetNote))) "period-1 note",
max(substring(r.resetNote,nullif((1-abs(sign(r.month-3))),0),length(r.resetNote))) "period-2 note"
from DepositRates r, Instruments i
where r.id = i.id
group by i.name,i.source
order by i.name;

 

name source current note period-1 note period-2 note
------- ------- ------------ ------------- -------------
GBP3MD= Euribor NULL NULL NULL
HKDOND= Reuters late reset NULL early reset
USD1MD= Reuters NULL missing reset NULL

The code also works properly when presented with missing data rows. If we assume that the row for instrument 1 for month 1 is missing, the max() aggregate corresponding to month 1 for instrument 1 would be comprised of two components, all of which, not belonging to month 1, would be NULL. The max() aggregate would then return NULL as the result, exactly as desired. The code also works properly if any values themselves are NULL, in exactly the same manner.

Folding, which is the inverse of pivoting, is also described in the book as an application of point characteristic functions. For now though, the immediate problem which I faced is now solved. If you have an interest in this subject, then I highly recommend finding a copy of the book.

Cheers.
Labels in this area