Comparing the BI Accelerator (f.k.a. HPA) to Traditional RDBMS Technology
NW 2004s comprises a new BI accelerator (BIA), formerly known as High Performance Analytics (HPA). This new functionality boosts the performance of multi-dimensional queries (of NW 2004s BI, i.e. BW) by factors rather than just margins. At the same time, it reduces the administrator's effort to achieve such outrageous performance to a minimum, especially in comparison to traditional performance-enhancing methods like precomputed structures such as database indexes or aggregates (a.k.a. materialized views or summary tables). For a more detailed description of the BI accelerator, refer to an article published in the October 2005 issue of SAP Insider.
In this weblog, I'd like to quickly touch on a question that many might have in mind when looking at this new technology: why the heck is this so much faster than traditional RDBMS products? Can't the latter be similarly fast if they have
- sufficient main memory,
- parallelism,
- and maybe even column-oriented and compressed structures like bitmap indexes?
These three are often cited as being among the major sources of the performance achieved within the BI accelerator. There are more than just those three - like the absence of rollback and logging mechanisms, or processing technology tailored towards NW BI's star schema - but, below, I will focus on those three.
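To make those three ingredients a bit more tangible, here is a minimal Python sketch - purely illustrative, not SAP code, with made-up data and a made-up partitioning scheme - of how an aggregation over a single column kept entirely in main memory parallelizes naturally once the data is partitioned:

```python
# Purely illustrative: one column held completely in main memory,
# partitioned and scanned in parallel; data and partition count are made up.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    """Aggregate one partition of the column independently of the others."""
    return sum(chunk)

if __name__ == "__main__":
    revenue = list(range(1_000_000))          # one column, fully memory-resident
    n_parts = 4
    size = len(revenue) // n_parts
    chunks = [revenue[i * size:(i + 1) * size] for i in range(n_parts)]
    # Scan the partitions concurrently; only the small partial results are merged.
    with ProcessPoolExecutor(max_workers=n_parts) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)
```

The point of the sketch is the pattern, not the numbers: once a column is memory-resident and partitioned, the scan decomposes into independent pieces whose results are cheap to merge.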
Well, in principle, traditional RDBMS products have a certain amount of technology in place that - in the mid- to long term - could be evolved into something similar to what the BIA has to offer. Let us simply consider a bitmap index. It can be regarded as a compressed representation of a table column (in theory, a bitmap index can cover more than one column but, in practice, this hardly makes sense). Its predominant advantage is that it can easily be combined with other bitmap indexes (e.g. on the same table) through bit operations. So, theoretically, a table or parts of it can be completely represented by a set of bitmaps. Bitmaps can be highly compressed and should thus fit into main memory. Data parallelism in this context is also straightforward.
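Here is a toy Python sketch of that idea - the data is hypothetical, and plain Python integers stand in for the (normally compressed) bit vectors a real DBMS would maintain:

```python
# Toy data; a real DBMS would build and compress these bit vectors itself.
from collections import defaultdict

def build_bitmap_index(column):
    """One bitmap per distinct value; a Python int serves as the bit vector."""
    bitmaps = defaultdict(int)
    for row_id, value in enumerate(column):
        bitmaps[value] |= 1 << row_id            # set the bit for this row
    return bitmaps

year = [2004, 2005, 2005, 2004, 2005]
city = ['Paris', 'London', 'Paris', 'London', 'London']

year_index = build_bitmap_index(year)
city_index = build_bitmap_index(city)

# Combining two filters is a single bit operation on the column representations.
qualifying = year_index[2005] & city_index['London']
row_ids = [r for r in range(len(year)) if qualifying >> r & 1]
print(row_ids)    # rows matching YEAR=2005 AND CITY='London': [1, 4]
```

In a real system the bit vectors would additionally be run-length or otherwise compressed, which is what makes it plausible to keep them memory-resident.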
So: what hinders an RDBMS from getting there? Take a typical star join approach in which bitmaps are used to propagate the selectivity of the filters defined on a number of dimensions down to the fact table. Imagine the filters YEAR=2005 and CITY='London'. Each filter on its own is less selective than the two filters combined. In this example, it is likely that two bitmaps, one for each filter, would be merged to create the list of rows in the fact table that need to be aggregated. At exactly this moment, a switch from a column-oriented towards a row-oriented approach takes place. Subsequently, processing is row-oriented, with all the overhead caused by unnecessary I/O: just consider a fact table with 20 key figures (measures) of which only 2 are relevant to the actual query. In the end, the values of all 20 key figures are transported from disk to main memory and from there towards the CPU. Only there are 18 of the 20 discarded. The same argument applies to dimension columns that are not relevant to the actual query.
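A small Python sketch of that difference - the row count, the key figure positions and the set of qualifying rows are all made up for illustration:

```python
# A made-up 1000-row fact table with 20 key figures, of which the query needs only 2.
NUM_ROWS, NUM_KEY_FIGURES = 1000, 20
WANTED = (3, 7)                                   # the two key figures the query asks for

# Row-oriented layout: one record holds all 20 key figures side by side.
row_store = [[float(r + k) for k in range(NUM_KEY_FIGURES)] for r in range(NUM_ROWS)]
# Column-oriented layout: one contiguous list per key figure.
column_store = [[float(r + k) for r in range(NUM_ROWS)] for k in range(NUM_KEY_FIGURES)]

qualifying_rows = range(0, NUM_ROWS, 10)          # pretend these came from the merged bitmaps

# Row store: the complete record is picked up; 18 of its 20 values are never used.
row_totals = [0.0, 0.0]
for r in qualifying_rows:
    record = row_store[r]                         # the whole 20-value record is accessed
    for i, k in enumerate(WANTED):
        row_totals[i] += record[k]

# Column store: only the two relevant columns are touched at all.
col_totals = [sum(column_store[k][r] for r in qualifying_rows) for k in WANTED]

assert row_totals == col_totals
print(len(qualifying_rows) * NUM_KEY_FIGURES,     # values moved in the row-oriented case: 2000
      len(qualifying_rows) * len(WANTED))         # values moved in the column-oriented case: 200
```

The aggregated result is identical in both cases; what differs is the amount of data dragged towards the CPU - here a factor of 10 (20 values per qualifying row versus 2).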
The above describes only one amongst various situations where the traditional RDBMS approach falls behind. This doesn't mean that traditional RDBMS products are all that bad. They are simply so much more generic than the BI accelerator can ever be. In other words: the BIA constitutes a highly specialized approach targeting the multi-dimensional query processing that is predominant in BI systems, a specialization that is worthwhile even at the expense of losing some qualities. Those qualities are simply not required in that context.