cancel
Showing results for 
Search instead for 
Did you mean: 
Read only

Is a CASE Statement a Read or a Write?

mrwong05
Participant
0 Likes
1,382

Hi,

My question is: is a HANA CASE statement considered a READ or a WRITE?

When using 1000s of case statements over 100s of columns in a view, and then trying to consume that view in a Calculation View and then Lumira, the data fails to load, and sometimes crashes the HANA server. If we view the data from the CV directly in HANA the query time is about 40 seconds. If we view the source data table (without the view with CASE statements) the data is retrieved in ~ 1 second.

Would changing from a column store to row store table improve this performance?

Thanks,
Matt

View Entire Topic
lbreddemann
Active Contributor

Ok, I see where you are coming from.

As you can see in the data preview you only read 200 rows - which is why HANA Studio is quick compared to Lumira, where you presumably try and read all data.

It's not the first time I hear about such disappointing experiences.

However, I believe a major misunderstanding here is using HANA as only the ETL part of the analysis.

When the task at hand is creating a huge table only then factors like data transfer capacity between HANA and the client tool (in this case LUMIRA) become increasingly a bottleneck. How large is the final table? How many MB/GB of data need to be transferred and what capacity does the network between HANA and LUMIRA have?

Where you get a major performance benefit is when you do what HANA (and really all analytics DBs) are designed for: perform the actual analysis in the system and retrieve the results.

This is also what LUMIRA and practically all reporting frontends are built for: pushing down the calculations to the analytics system (HANA).

Now, you mentioned that filtering takes a lot of time when running against the CASE expressions. That's easy to believe as HANA has to compute the expression for all values in the respective column which involves touching the whole column.

An alternative modelling approach for replacing lookup values (e.g. using lookup tables and joins) could speed this up considerably.
Whether this would be the optimal approach for your case is something that would require more analysis (and I'm happy to be hired 🙂 ) but there are certainly options to improve on the current situation.

Finally, the difference in performance when reading the complete rows from rowstore and columnstore is relatively easy to explain: the rowstore keeps the full rows with all values in memory at all times. The column store compresses and splits the data into individual columns and needs to re-establish the complete rows at query time.
This full-row-no-filter-query is basically the anti-use case for column store processing.
The downside here is that the rowstore just allocates immense amounts of memory for this - memory that is not available to anything else. And HANA was designed to allow transaction processing (of the ERP system) and analytics (not data dumping) from the same DB.

Anyhow, that's my view on it based on the last 10 years working with the technology in various industries including cancer research.

mrwong05
Participant
0 Likes

Agree with you on all points except that we even when our query is only doing an aggregation over a dimension the returned results are 10x slower than using hana studio analysis tab (we wish we could give this slice and dice interface to the user, but we can't so lumira is the next best thing).

Honestly our hardware is top of the line but we still face bottlenecks,

Hana 1TB certified aws x1.16xLarge, Lumira 128 GB r4.8xLarge, and same region AWS backbone.

I'm a believer in column store because of you and all your replies on this forum.