on ‎2005 Apr 22 6:54 PM
We've created an aggregate and have tested some queries on it. Most queries run better. But one query selects 360,000 database records w/o aggregate and selects 537,000 database records with the aggregate. It also takes more time on the database.
I thought that in the worst case scenario an aggregate would return the same results. So I was surprised to see this result. Does anyone know of a scenario where the aggregate could actually perform worse then the cube? Any ideas of where I could look next?
Thanks,
Chris
Request clarification before answering.
Is your Aggregate compressed or uncompressed ?
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Ravi,
Your question made me look at database size to make sure I answered correctly. My aggregate has 311978 records in it (1.84 mil in cube.) Yet if I read the staticis correctly there is 537095 database reads. That doesn't make sense to me.
I must be misundertstanding the statics.
a. Is field QDBSEL in table RSDDSTAT the number of rows from the cube/aggregate that had to be read inorder to retreive the correct results.
b. Does the value in QDBSEL (537095 in this case) represent cube/aggregate records read. Or can it represent records read from dim tables and master data tables?
Chris
Have you verified your row counts by using SE16? It actually runs a query and gets true current row counts in your aggregate and cubes. If you are looking at something else you might be getting counts based on the last time statistics were collected which could be recent or quite some time ago depending on exactly how your shop is handling statistics collection.
You can collect the fresh statistics a couple of ways, but the easiest is using the Performance tab.
The QDBSEL should be the rows read from the fact table. How does it compare to the QDBTRANS, the rows transferred to the OLAP processor?
As to how you could be having record counts greater against the aggregate than against the base cube, I can't explain that. I have never seen aggregate reads higher than off the base cube scenario. I'm assuming you are running the query thru RSRT twice, once with the aggregate selection turned off and once with it on.
Have you looked at an explain plan of the two executions? Are they the same except for the reference to the base and aggreagte fact tables.
What database are you using?
It is entirely possible to have a query against an aggregate deliver poorer performance than the base cube.
The main reason would be if you do not have much of a degree of summarization between the base cube and the aggregate.
Another reason that I can think of is if you have multiple aggregates and a query that has muliple restricted key figures resulting in muliple "split parts" running against several of the aggregates vs a single execution against the base cube, or a sinlge aggregate. Doubt that's the case here.
Are you working with inventory or some other application tha requires markers in the aggregate?
Reply to the various questions posted by Pizzaman.
Based on your advice I've verified row counts in SE16. I get the same result. (Cube 1.84 mil & aggregate .31 mil rows.)
<b>Aggregate:</b>
QDBSEL = 537095
QDBTRANS = 21904
<b>Cube:</b>
QDBSEL = 360626
QDBTRANS = 15052
Yes, I'm using RSRT twice, once with the aggregate selection turned off and once with it on.
I've not been able to look at the explain plan. Any help on how to do this? I've tried to use RSRT option to show the SQL but I get an error message. I need to figure out why.
We are running on DB2 UDB.
We only have one aggregate at the time. But we do have 12 restricted key figures in the query.
No markers in the aggregate.
Thanks,
<i>Chris</i>
Since you're running DB2, I can't guarantee how if/how well the screens I mention will match since I'm in an Oracle environment. But we'll try....
You can use ST04 to get to the SQL Stmst in that have run recently. SM50 might show you the SQL while it's running, but it might truncate what it displays. ST04 is a good tuning tool to use to examine high resource using queries. You can filter on the fact/aggregate tables to narrow the stmts you'll see.
With respect to RSRT, I check the box called Show Run Schedule, NOT the Show SQL stmt. The Show SQL displays the generic SQL BW generates that must then be translated to the target DB - which is what you'll see in the Show Run Schedule.
So if you select Display Aggregate and the Show Run Schedule, what I think you'll see is what SAP BW refers to as "Split Parts" of the query, and you'll see several 1, 2, 3.... to 12 , or more since you have so many RKFs. Each Split Part will show you whether it was able to use the aggregate, or if it chose the base cube.
Somewhere in the bowels of the BW beast when a query is submitted, the system looks at a query and decides if the query can be satisfied by running against the aggregate rather than against the base cube. To do this it has to examine the query to see whether everything you are looking for is in the aggregate, if not, it will run against the base cube.
Seems simple enough, but (big BUT) what happens if you have multiple RKFs in your query? BW is going to examine each RKF and a generate a query for all the ones it can satisfy from the aggregate and and another separate query for all the RFKs it must get from the base cube. That's what I think is going on in your case. So now you have two queries running. This can get even more interesting if you have multiple aggregates, in which case, depending on the RKFs, it might use multiple aggregates and the base cube.
So now you have multiple queries ( from what I have seen, they run sequentially rather than in parallel) and they are all reading dimension, master data, as well as fact/aggregate tbls. These multiple result sets must be merged by the OLAP processor. This is how you can end up with a query that uses aggregates that can run longer than the query against the base cube and have to perform more DB selects.
Pizzaman,
Thank you. You've answered my question and also provided me with a lot to chew on.
I ran the RSRT with option Display Aggregate. And noticed that there was a 1 & 2. 1 was on the agg and 2 on the cube. I was able to compare them both and determine the missing value. I added the value to the aggregate and tried again. This time the aggregate performs better.
I'm not crazy about this query split. That means that if I don't create my aggregates correctly, I could cause more problems then I help.
ST04 was a little overwhelming, but I can see where this will be a valuable tool once I get use to it.
Thanks,
Chris
I've never seen the split part stuff discussed any where. Aggregates are always touted as the thing to do and people go off blindly creating them.
When I first started trying to tune queries against a cube, I created 5 (what I thought were carefully crafted)aggregates (some were parent/child) and thought I had deserved the Nobel prize. Queries were running faster, but no where near the improvement I had expected. As I started to look into the mess like you have been doing, I found that some of the user queries were invoking all 5 of the aggregates because of the different RKFs we had.
So I finally scaled back to just one slightly larger aggregate and got rid of the others and got another very nice performance gain. Simpler really can be better sometimes.
So aggregates really can be a little more involved than they are made out to be.
Glad to help. Have fun with this stuff.......
| User | Count |
|---|---|
| 8 | |
| 8 | |
| 7 | |
| 4 | |
| 4 | |
| 2 | |
| 2 | |
| 2 | |
| 2 | |
| 2 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.