A recent discussion in the SAP internal JAM community for HANA made me aware of a feature added to HANA SQL starting with HANA 1 SPS 12 (don't ask the revision, just check the latest), respectively HANA 2 SPS 02 (we don't speak about SPS 00 and SPS 01, do we?):
"calculated columns".
Generated <> Calculated
Like many other functions, this feature seems to have been implemented to
- tick the box for SQL standard compliance/competition feature equality
and
- to provide Core Data Services (CDS) with the SQL implementation for a functionality (all HANA CDS functions are mapped to SQL features in some way, which means that it is not possible to express anything more in CDS than in SQL)
Initially, I thought "calculated columns? We had that for some time!" and associated the "
GENERATED ALWAYS AS" (GC) columns but this is not what "calculated columns" (CC from here on) are. CCs do not store any values but values get computed at query time.
Look, others have it too!
They are the corresponding feature to
Oracle's "virtual column" and
MS SQL Server's "Computed Column".
Or, to be more precise, both HANA features CC and GC combined map to the respective features of the other DBMS since these provide the option to persist computed values as a command option (makes sense to me). As mentioned above, the SQL Standard (2003) contains computed columns as an optional feature, which is probably why this obscure functionality is present in all those DBMS products.
What is it for?
While I understand the usage scenarios for GC where the result of the computation gets stored - pre-computing something is about performance and reuse later on - I am not too sure about the use case for CC's.
The SAP colleague in the mentioned JAM discussion was quite fond of them as the allow to specify computed columns without using views. That way, common data transformations that can be built with just the columns of one table, can be defined very close to the table itself.
This sounds like a good thing.
Now, I have a different perspective on that.
No surprises are good surprises
Let's say I run a plain old
SELECT * FROM user_info;
my expectation is that HANA provides me with the currently valid contents of the visible table columns.
I definitively do not expect to find any computation in the explain plan. But with expression based columns this is exactly what I get:
Of course, this computation adds runtime and memory usage to the statement processing and will change the query plan in more complicated scenarios.
The same surprise will show up when I try to index the column:
create index i_address on user_info (email_address);
Could not execute 'create index i_address on user_info (email_address)'
SAP DBTech JDBC: [7]: feature not supported:
cannot support create index on calculated field: EMAIL_ADDRESS: line 1 col 38 (at pos 37)
or when I try to partition based on this column
alter table user_info partition by HASH (email_address) PARTITIONS 4;
Could not execute 'alter table user_info partition by HASH (email_address) PARTITIONS 4' in 39 ms 39 µs .
SAP DBTech JDBC: [2048]: column store error:
fail to alter partition: [2051] partition specification not valid;
The attribute EMAIL_ADDRESS is not allowed in a partition specification.
Both indexing and partitioning are the main use cases for the CC feature in Oracle and MS SQL Server.
HANA does not support these use cases, so what's left really is not having to create a view as the main benefit.
Beginning of problems
Obviously one could argue: well, the expectation that HANA should not be computing expressions when I query a base table is just wrong.
As of SQL 2003 (since 14 years!), it was not correct anymore.
So, go change your expectation!
Fair call, but most users don't have that expectation.
The notion of the "dumb" base table is too strong that things like CCs and triggers are easily overlooked.
And that is where the problem starts: when the expectation about the behavior is wrong this usually means mistakes, wrong decisions and wasted time and effort.
The poor person who is confronted with a table using CCs later on (might even be the same developer who simply forgot that this table contains CCs) cannot tell from "looking" at the table which columns are real "base" columns and which ones are computed.
It does not show in the table definition UI and it does not show in
SELECT TOP 5 * from <table>.
Only when this person digs deeper and checks the CDS source code or the HANA system tables (
TABLE_COLUMNS) and really looks for this setting, this piece of information can be seen.
select
column_name, data_type_name
, index_type
, generated_always_as, generation_type
from
"TABLE_COLUMNS"
where
table_name='USER_INFO'
and schema_name=current_schema
order by position asc ;
I do not like this kind of surprises and rather have a base table plus a consumption view that contains the necessary expressions and transformations.
Not only can I create more complex expressions in such a consumption view but I also have a better chance to understand where data is stored and where data is computed.
Pulling the rug?
The only other use case for CCs that came to my mind is changing data models on DB level, while the application(s) remain unchanged.
Such changes could only be made for applications that read from the table. As soon as somebody tries to change the column this leads to an error.
So, you see, I really struggle with finding a good reason for using this feature.
What about you?
Have you come across a scenario where this was/would have been the absolute life saver? When and where would you use it and why?
Please share your experience with CCs or virtual/columns in the comments section.
There you go, now you now!
Cheers,
Lars
p.s. I created my
user_info table based on a table with made up email addresses:
create column table user_info as (
SELECT id,
substr_before (emailaddress, '@') user_part,
substr_after (emailaddress, '@') domain_part
FROM email_adr);
Adding the computed column is straight forward:
alter table user_info
add (email_address NVARCHAR(320) as
user_part ||'@'|| domain_part);
Note: only HANA
expressions are allowed in the computation part. No function calls or complex SQL statements.