Solved: Is a CASE Statement a Read or a Write?

mrwong05 · ‎2020 Sep 16

Hi,

My question is: is a HANA CASE statement considered a READ or a WRITE?

When using 1000s of case statements over 100s of columns in a view, and then trying to consume that view in a Calculation View and then Lumira, the data fails to load, and sometimes crashes the HANA server. If we view the data from the CV directly in HANA the query time is about 40 seconds. If we view the source data table (without the view with CASE statements) the data is retrieved in ~ 1 second.

Would changing from a column store to row store table improve this performance?

Thanks,
Matt

lbreddemann · ‎2020 Sep 16

Ok, I see where you are coming from.

As you can see in the data preview you only read 200 rows - which is why HANA Studio is quick compared to Lumira, where you presumably try and read all data.

It's not the first time I hear about such disappointing experiences.

However, I believe a major misunderstanding here is using HANA as only the ETL part of the analysis.

When the task at hand is creating a huge table only then factors like data transfer capacity between HANA and the client tool (in this case LUMIRA) become increasingly a bottleneck. How large is the final table? How many MB/GB of data need to be transferred and what capacity does the network between HANA and LUMIRA have?

Where you get a major performance benefit is when you do what HANA (and really all analytics DBs) are designed for: perform the actual analysis in the system and retrieve the results.

This is also what LUMIRA and practically all reporting frontends are built for: pushing down the calculations to the analytics system (HANA).

Now, you mentioned that filtering takes a lot of time when running against the CASE expressions. That's easy to believe as HANA has to compute the expression for all values in the respective column which involves touching the whole column.

An alternative modelling approach for replacing lookup values (e.g. using lookup tables and joins) could speed this up considerably.
Whether this would be the optimal approach for your case is something that would require more analysis (and I'm happy to be hired 🙂 ) but there are certainly options to improve on the current situation.

Finally, the difference in performance when reading the complete rows from rowstore and columnstore is relatively easy to explain: the rowstore keeps the full rows with all values in memory at all times. The column store compresses and splits the data into individual columns and needs to re-establish the complete rows at query time.
This full-row-no-filter-query is basically the anti-use case for column store processing.
The downside here is that the rowstore just allocates immense amounts of memory for this - memory that is not available to anything else. And HANA was designed to allow transaction processing (of the ERP system) and analytics (not data dumping) from the same DB.

Anyhow, that's my view on it based on the last 10 years working with the technology in various industries including cancer research.

By Category

Related Content

Activity Groups

Industry Groups

Influence and Feedback Groups

Interest Groups

Location Groups

Customer Only Groups

Forums

Related Resources

Products

Learning and Support

About

My SAP Profile

My SAP Profile

Is a CASE Statement a Read or a Write?

Know the answer?

Need more details?