‎2010 Oct 07 7:45 PM
With the indexes, the program runs much more slowly than when the indexes aren't there. However, the explain output indicates to me that with the indexes it should be much better.
The explain when the indexes are present is:
SQL Statement
------------------------------------------------------------------------
SELECT
T_01 . "MANDT" , T_01 . "VBELN" , T_01 . "VFDAT" , T_02 . "BUKRS" ,
T_00 . "VKORG" , T_02 . "VKORG" , T_00 . "VBELN" , T_03 . "BUKRS"
FROM
"LIPS" T_01 INNER JOIN "LIKP" T_00 ON T_01 . "MANDT" = :A0 AND T_00 . "VBELN" = T_01 . "VBELN"
INNER JOIN "TVKO" T_02 ON T_02 . "MANDT" = :A1 AND T_00 . "VKORG" = T_02 . "VKORG" INNER JOIN "T9F
ATLAS_CCO" T_03 ON T_03 . "MANDT" = :A2 AND T_02 . "BUKRS" = T_03 . "BUKRS"
WHERE
T_00 . "MANDT" = :A3 AND ( T_01 . "ERDAT" <= :A4 OR T_01 . "AEDAT" <= :A5 ) AND ( ( T_01 .
"ERDAT" >= :A6 OR T_01 . "AEDAT" <= :A7 ) AND ( ( T_01 . "ERDAT" <= :A8
OR T_01 . "AEDAT" >= :A9 ) AND ( T_01 . "ERDAT" >= :A10 OR T_01 .
"AEDAT" >= :A11 ) ) )
Execution Plan
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4632 | 1248K| 73711 (3)|
| 1 | CONCATENATION | | | | |
| 2 | NESTED LOOPS | | 2375 | 640K| 41855 (3)|
|* 3 | HASH JOIN | | 2375 | 619K| 41854 (3)|
|* 4 | TABLE ACCESS FULL | TVKO | 59 | 826 | 2 (0)|
| 5 | NESTED LOOPS | | 2558 | 632K| 41851 (3)|
|* 6 | TABLE ACCESS BY INDEX ROWID| LIPS | 2558 | 582K| 40814 (3)|
|* 7 | INDEX RANGE SCAN | LIPS~Z04 | 496K| | 492 (7)|
| 8 | TABLE ACCESS BY INDEX ROWID| LIKP | 1 | 20 | 0 (0)|
|* 9 | INDEX UNIQUE SCAN | LIKP~0 | 1 | | 0 (0)|
|* 10 | INDEX UNIQUE SCAN | T9FATLAS_CCO~0 | 1 | 9 | 0 (0)|
| 11 | NESTED LOOPS | | 2257 | 608K| 31856 (4)|
|* 12 | HASH JOIN | | 2257 | 588K| 31856 (4)|
|* 13 | TABLE ACCESS FULL | TVKO | 59 | 826 | 2 (0)|
| 14 | NESTED LOOPS | | 2430 | 600K| 31853 (4)|
|* 15 | TABLE ACCESS BY INDEX ROWID| LIPS | 2430 | 552K| 30868 (4)|
|* 16 | INDEX RANGE SCAN | LIPS~Z03 | 496K| | 495 (7)|
| 17 | TABLE ACCESS BY INDEX ROWID| LIKP | 1 | 20 | 0 (0)|
|* 18 | INDEX UNIQUE SCAN | LIKP~0 | 1 | | 0 (0)|
|* 19 | INDEX UNIQUE SCAN | T9FATLAS_CCO~0 | 1 | 9 | 0 (0)|
-----------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T_00"."VKORG"="T_02"."VKORG")
4 - filter("T_02"."MANDT"=:A1)
6 - filter(("T_01"."ERDAT">=:A6 OR "T_01"."AEDAT"<=:A7) AND
("T_01"."ERDAT"<=:A8 OR "T_01"."AEDAT">=:A9) AND ("T_01"."ERDAT">=:A10 OR
"T_01"."AEDAT">=:A11))
7 - access("T_01"."MANDT"=:A0 AND "T_01"."AEDAT"<=:A5)
9 - access("T_00"."MANDT"=:A3 AND "T_00"."VBELN"="T_01"."VBELN")
10 - access("T_03"."MANDT"=:A2 AND "T_02"."BUKRS"="T_03"."BUKRS")
12 - access("T_00"."VKORG"="T_02"."VKORG")
13 - filter("T_02"."MANDT"=:A1)
15 - filter(("T_01"."ERDAT">=:A10 OR "T_01"."AEDAT">=:A11) AND
("T_01"."ERDAT"<=:A8 OR "T_01"."AEDAT">=:A9) AND ("T_01"."ERDAT">=:A6 OR
"T_01"."AEDAT"<=:A7) AND LNNVL("T_01"."AEDAT"<=:A5))
16 - access("T_01"."MANDT"=:A0 AND "T_01"."ERDAT"<=:A4)
18 - access("T_00"."MANDT"=:A3 AND "T_00"."VBELN"="T_01"."VBELN")
19 - access("T_03"."MANDT"=:A2 AND "T_02"."BUKRS"="T_03"."BUKRS")
‎2010 Oct 07 7:46 PM
Without
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5168 | 1392K| 1019K (27)|
|* 1 | HASH JOIN | | 5168 | 1392K| 1019K (27)|
|* 2 | INDEX RANGE SCAN | T9FATLAS_CCO~0 | 146 | 1314 | 0 (0)|
|* 3 | HASH JOIN | | 5168 | 1347K| 1019K (27)|
|* 4 | TABLE ACCESS FULL | TVKO | 57 | 798 | 2 (0)|
| 5 | NESTED LOOPS | | 5622 | 1389K| 1019K (27)|
|* 6 | TABLE ACCESS FULL | LIPS | 5622 | 1279K| 1016K (27)|
| 7 | TABLE ACCESS BY INDEX ROWID| LIKP | 1 | 20 | 0 (0)|
|* 8 | INDEX UNIQUE SCAN | LIKP~0 | 1 | | 0 (0)|
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T_02"."BUKRS"="T_03"."BUKRS")
2 - access("T_03"."MANDT"=:A2)
3 - access("T_00"."VKORG"="T_02"."VKORG")
4 - filter("T_02"."MANDT"=:A1)
6 - filter(("T_01"."ERDAT"<=:A4 OR "T_01"."AEDAT"<=:A5) AND
("T_01"."ERDAT">=:A6 OR "T_01"."AEDAT"<=:A7) AND ("T_01"."ERDAT"<=:A8 OR
"T_01"."AEDAT">=:A9) AND ("T_01"."ERDAT">=:A10 OR "T_01"."AEDAT">=:A11) AND
"T_01"."MANDT"=:A0)
8 - access("T_00"."MANDT"=:A3 AND "T_00"."VBELN"="T_01"."VBELN")
I've tried using the explain with the indexes present via ST04, with the hints "NO_INDEX" and "FULL(LIPS)", but the indexes get used anyway.
The ABAP is generated...
My question is: am I right in thinking that, according to the EXPLAIN, the program should be running faster with the indexes?
matt
‎2010 Oct 08 11:00 AM
Hi Matt,
>
> I've tried using the explain with the indexes present via ST04, with the hints "NO_INDEX" and "FULL(LIPS)", but the indexes get used anyway.
>
> My question is: am I right in thinking that, according to the EXPLAIN, the program should be running faster with the indexes?
>
> matt
For the hints in ST04 you must use the alias, so for example FULL (T_01). In ABAP you would use &TABLE 2&.
Oracle tries to estimate the cost, but of course this is only an estimate. For example, for AEDAT <= :A5 it will probably estimate that 5% of the records are returned (you can confirm this by checking how many rows are in LIPS and how many rows it estimates for that step). But if AEDAT is a recent date, a much higher percentage of rows will be returned, and then a FULL scan actually performs better, because the table can be read directly in an optimized way, instead of reading the index and then the table repeatedly for every row returned from the index.
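This effect can be sketched with a toy cost model (a minimal sketch with made-up numbers, not Oracle's real costing formulas): index access pays roughly one table-block visit per returned row, while a full scan reads every block once with multiblock I/O, so the cheaper plan flips once the actual row count is far above the estimate.

```python
# Toy cost model, NOT Oracle's real formulas -- just to show why a plan
# chosen for an underestimated row count can lose badly at run time.

def index_access_cost(rows_returned, index_blocks=500):
    # range scan of the index plus roughly one table-block get per row
    return index_blocks + rows_returned

def full_scan_cost(table_blocks=120_000, multiblock=8):
    # every table block read once, several blocks per I/O request
    return table_blocks / multiblock          # 15000.0 here

estimated_rows = 2_500     # what the optimizer predicts for the filters
actual_rows = 500_000      # what a "recent" AEDAT value really returns

print(index_access_cost(estimated_rows) < full_scan_cost())  # True
print(index_access_cost(actual_rows) < full_scan_cost())     # False
```

With the optimistic estimate the index plan looks cheaper; with the real row count the full scan wins, which matches Matt's observation.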
Hope this helps in any way,
Rui Dantas
‎2010 Oct 08 5:21 PM
Hi Matt,
as Rui already explained almost correctly: for AEDAT <= :A5 the optimizer estimates a filter factor of MAX(0.009, 1 / NUM_DISTINCT(column)). With these estimations, LIKE, <, >, BETWEEN, correlations and other things can be a problem.
With a complex query like this, I would use gather_plan_statistics:
check this site or google for "gather_plan_statistics"
http://kerryosborne.oracle-guy.com/2010/02/gather_plan_statistics/
Use the SQL command editor to run the statement with the hint and the actual parameters, and then the statements for getting the plan statistics.
For each step in the plan you then get:
estimated rows (e-rows)
actual rows (a-rows)
actual time (a-time)
actual buffer gets
With this information you can see where the optimizer's estimations are close to or far away from reality. It should bring you directly to the step in the plan that causes the long run time....
Hope this helps
Kind regards,
Hermann
‎2010 Oct 11 10:58 AM
>
> as Rui already explained almost correctly (the AEDAT <= :A5 is MAX(0.009, 1 / NUM_DISTINCT(column) % ... ) ...
> Hermann
Hi Hermann,
I didn't mention the 1 / NUM_DISTINCT part because it did not seem relevant for a field like AEDAT, but I was curious about your 0.009 figure. Are these magic numbers configurable?
In my system I get 5%:
LIPS: 5,425,703 Rows
SELECT * FROM lips WHERE aedat <= :a1
SELECT STATEMENT ( Estimated Costs = 126,053 , Estimated #Rows = 271,285 )
1 TABLE ACCESS FULL LIPS
( Estim. Costs = 126,053 , Estim. #Rows = 271,285 )
Estim. CPU-Costs = 16,885,467,271 Estim. IO-Costs = 123,329
Edited by: Rob Burbank on Oct 13, 2010 4:11 PM
‎2010 Oct 11 12:12 PM
Hi Rui,
you are right:
The estimation for rows returned for AEDAT <= :A5 is 5% (this cannot be changed).
The estimation for the filter factor in cost estimations, however, is FF = max(0.009, 1 / NDV(column)),
as described in note 750631. The 0.009 cannot be changed either (as far as I know).
But no matter which one we choose, the estimations (cost or rows) for <=, >=, or BETWEEN ... AND (<= or >=)
are very often too optimistic and can lead to wrong decisions quite often.
BTW: that's why they called it the OPTImizer, because it's OPTImistic and calculates with OPTImal circumstances.
If it were PESSImistic... they would have named it the PESSImizer.
Things get worse when we have ABAP developers who try to specify all fields in an index; there we often see
WHERE conditions like
...
AND field BETWEEN <minval> AND <maxval>
...
or
...
AND field >= <minval>
Here the optimizer is really off with its estimations (a small number of rows and low costs are estimated, but everything is selected).
Suppressing these conditions can lead to better estimations and more efficient plans. We have seen this quite often.
Coming back to Matt's problem:
looking at the filters, each <= or >= gets an OPTImistic estimation for the rows and an even more OPTImistic estimation
for the cost (filter factors for index accesses). These estimations are combined, e.g. for the rows:
0.5 * 0.5 * 0.5 ...
6 - filter(("T_01"."ERDAT"<=:A4 OR "T_01"."AEDAT"<=:A5)
AND ("T_01"."ERDAT">=:A6 OR "T_01"."AEDAT"<=:A7)
AND ("T_01"."ERDAT"<=:A8 OR "T_01"."AEDAT">=:A9)
AND ("T_01"."ERDAT">=:A10 OR "T_01"."AEDAT">=:A11)
In the end we have a very optimistic estimation for rows and cost.
If the actual content is not as selective as the optimizer estimates (which is often the case),
we may get a plan that would be efficient for very small result sets but at the same time
is inefficient for bigger result sets.
The execution plan with the indexes would be optimal if the actual number of rows
matched the optimizer's expectations.
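The rules quoted here can be put into a few lines of Python (a sketch under stated assumptions: a 5% row selectivity per range predicate, the textbook OR/AND combination rules with independence between predicates, and a LIPS row count in the millions as quoted elsewhere in the thread):

```python
# Sketch of the estimation rules discussed here; the 0.009 floor is the
# constant from note 750631, the combination rules are the standard ones
# and assume independent predicates (which ERDAT and AEDAT are not).

def filter_factor(ndv):
    # cost-side filter factor for a range predicate: max(0.009, 1/NDV)
    return max(0.009, 1.0 / ndv)

def or_selectivity(s1, s2):
    # s(A OR B) = s(A) + s(B) - s(A)*s(B), assuming independence
    return s1 + s2 - s1 * s2

pair = or_selectivity(0.05, 0.05)   # one (ERDAT ... OR AEDAT ...) group
combined = pair ** 4                # four such groups ANDed together

lips_rows = 5_400_000               # order of magnitude quoted by Rui
print(filter_factor(1190))          # 0.009: the floor kicks in
print(round(lips_rows * combined))  # only a few hundred rows predicted
```

Multiplying four optimistic factors shrinks millions of rows down to a few hundred, which is exactly how an index plan can look attractive on paper.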
Kind regards,
Hermann
‎2010 Oct 11 12:44 PM
small example:
Table T100 has 918007 rows in my system.
The Query:
select * from t100 where msgnr >= :a0
MSGNR is a field with Message Numbers for Application Areas and Languages.
It's a 3 character field and we have many '001's and '002's and so on and some
'999's and even some real characters (in my system).
With the default installation we have 2 indexes:
Primary Key:
SPRSL, ARBGB, MSGNR
and
Index 001:
ARBGB, MSGNR
The distinct key counts are:
SPRSL 9
ARBGB 5703
MSGNR 1190
An index skip scan is no alternative here, so we get a full table scan.
The optimizer estimates 5% of the rows being returned.
If I were now to create an index on MSGNR, the optimizer would pick it
up, since we have a selective field (1190 distinct keys and a 5% row
estimation is pretty selective, isn't it?). The cost estimation for that
index range scan would be even better!
If I run the query with a value that returns 5% or less of the data,
e.g. where msgnr >= '850' or something like that,
the execution is technically efficient and quite fast....
BUT
what if I run it with:
where msgnr >= '000' or where msgnr >= '200'
or
where msgnr >= '000' and msgnr <= '999' (between)
We would still use the index (same estimations), but read almost all index blocks
AND almost all table blocks, many of them several times (depending on the clustering factor).
The query would be much slower in this case, although the execution
plan and cost estimation look nice. One could assume that it should
run faster, but in cases where the estimation of the optimizer does not
match the real world (the optimizer assumes good selectivity for these ranges,
but they are not that selective), it doesn't.
@ MATT
So, yes, in some cases creating an index can make execution plans look better and slow
down the response time.
You have to check the selectivity of your >= and <= predicates and compare the estimations of
the optimizer with reality, either manually or automated with gather_plan_statistics.
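Hermann's T100 example can be mimicked in a few lines of Python. The message-number distribution below is invented (skewed toward low values) purely to show how far a fixed 5% guess for a bind variable can be from reality depending on the value bound at run time:

```python
import random

# Invented, skewed message numbers: many low values, few high ones.
random.seed(1)
msgnr = [f"{random.randint(1, 199):03d}" for _ in range(9_500)] + \
        [f"{random.randint(200, 999):03d}" for _ in range(500)]

def actual_fraction(bound):
    # fraction of rows satisfying msgnr >= bound (string comparison,
    # like the 3-character CHAR field in T100)
    return sum(m >= bound for m in msgnr) / len(msgnr)

ESTIMATED = 0.05   # the optimizer's fixed guess for ">= :a0" with a bind

print(actual_fraction('850') < ESTIMATED)  # True: index access would be fine
print(actual_fraction('000'))              # 1.0: every row qualifies
```

The same plan serves both bound values, but for `>= '000'` it touches essentially the whole table through the index.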
Kind regards,
Hermann
‎2010 Oct 11 2:36 PM
Hi Hermann,
> The estimation for rows returned for AEDAT <= :A5 is 5 % (can not be changed).
> The estimation for the filter factor in cost estimations however is FF = max(0.009, 1 / NDV(column))
> as described in note 750631. The 0.009 can not be changed as well (as far as i know).
> Hermann
Great, thanks. Note 750631 explains well the difference between the filter factor (FF, used for the cost estimation) and the row filter factor (RFF, used for estimating the number of rows), and in which special occasions they differ (one being this <= case). I confess I mostly look at the number of rows, because it is a "real" number and easier to understand.
> BTW: that's why they called it OPTImizer, because it's OPTImistic and calculates with OPTImal circumstances.
> If it would be PESSImisitic... they would have named it PESSImizer.
I like this definition...
but with these < or > cases I find that it often works as a PESSImizer. Where I work I get to approve what is installed in production, so those >= minvalue would not be installed in the first place. But you often have things like "all records created in the last day", so you get >= almost_the_max_value, and in that case 5% ends up being too pessimistic a choice.
Matt,
Sorry if this has gone into a somewhat different discussion but from what I saw from you in the past I would guess you wouldn't mind.
The conditions in your where clause seem quite confusing to me without knowing which ABAP variables are behind those placeholders. Maybe you could share the ABAP code, or I suppose you, better than anyone, will know if (or which of) the conditions are really selective and which are things like >= minvalue.
Regards,
Rui Dantas
‎2010 Oct 11 4:34 PM
Hi Rui,
I agree that the estimated rows are easier to understand, and sometimes they are even
more important than the estimated costs, e.g. when we talk about joins (how often does
the inner table have to be accessed?).
Regarding the "all records created in the last day" and >= almost_the_max_value:
is that almost_the_max_value included in the statistics? Or is the max value in the
statistics less than the queried >= almost_the_max_value? In a case where
the max value in the stats is, let's say, 100 and we look for >= 109 while the real max
value is 110 (but the statistics are not up to date), the predicate is out of bound or out of range,
and the optimizer should use a different calculation. I don't have the details, only
this quote from the net:
"The optimizer prorates the selectivity based on the distance between the
predicate value and the maximum value (assuming the value is higher than the max),
that is, the farther the value is from the maximum value, the lower the selectivity will be."
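The prorating in that quote might look roughly like the following sketch. The formula is a hypothetical simplification of the idea (linear fade-out over the column's statistical range), not Oracle's actual implementation:

```python
# Hypothetical proration: fade the base selectivity out linearly as the
# predicate value moves past the max recorded in the (stale) statistics.

def prorated_selectivity(value, stat_min, stat_max, base=0.05):
    if value <= stat_max:
        return base                      # in range: normal estimate
    width = stat_max - stat_min
    overshoot = value - stat_max
    return max(0.0, base * (1 - overshoot / width))

# the stats believe the column spans 1..100; the real max is 110
print(prorated_selectivity(50, 1, 100))   # 0.05: in range
print(prorated_selectivity(109, 1, 100))  # scaled down below 0.05
print(prorated_selectivity(300, 1, 100))  # 0.0: far out of range
```

So a predicate just past the stale maximum gets a slightly lower estimate, and one far past it drops toward zero, which matches the quoted behaviour.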
Or let me ask differently: did you get wrong decisions because the estimation was
too pessimistic? I have rarely seen such cases so far. The wrong decisions I have seen
were usually based on too optimistic estimates.
Kind regards,
Hermann
‎2010 Oct 11 7:16 PM
Hi Matt,
would you mind giving me the following information:
1) how many blocks for table LIPS
2) how many rows for table LIPS
3) how many rows for LIPS and your selection criteria:
WHERE
T_00 . "MANDT" = :A3
AND ( T_01 ."ERDAT" <= :A4 OR T_01 . "AEDAT" <= :A5 )
AND ( ( T_01 ."ERDAT" >= :A6 OR T_01 . "AEDAT" <= :A7 )
AND ( ( T_01 ."ERDAT" <= :A8 OR T_01 . "AEDAT" >= :A9 )
AND ( T_01 ."ERDAT" >= :A10 OR T_01 . "AEDAT" >= :A11 )
)
)
What fields did you use in your index?
PS: Oh, and can you give the field values as well, please?
Especially if there are duplicates, like :A7 equalling :A5 in the ABAP source.
Volker
Edited by: Volker Borowski on Oct 11, 2010 8:22 PM
‎2010 Oct 12 9:45 AM
The original ABAP sql (generated by ACTA) is:
WHERE ( ( ( LIPS9~ERDAT <= $PARAM2 ) OR ( LIPS9~AEDAT <= $PARAM2 ) )
AND ( ( ( LIPS9~ERDAT >= $PARAM1 ) OR ( LIPS9~AEDAT <= $PARAM2 ) )
AND ( ( ( LIPS9~ERDAT <= $PARAM2 ) OR ( LIPS9~AEDAT >= $PARAM1 ) )
AND ( ( LIPS9~ERDAT >= $PARAM1 ) OR ( LIPS9~AEDAT >= $PARAM1 ) ) ) ) ).
Which simplifies to
WHERE ( LIPS9~ERDAT <= $PARAM2 OR LIPS9~AEDAT <= $PARAM2 )
AND ( LIPS9~ERDAT >= $PARAM1 OR LIPS9~AEDAT <= $PARAM2 )
AND ( LIPS9~ERDAT <= $PARAM2 OR LIPS9~AEDAT >= $PARAM1 )
AND ( LIPS9~ERDAT >= $PARAM1 OR LIPS9~AEDAT >= $PARAM1 ).
Now, $PARAM1 <= $PARAM2, and I think it's safe to say that ERDAT <= AEDAT. So the selection is looking for any documents that were created or changed between $PARAM1 and $PARAM2. And that simplifies to:
LIPS9~ERDAT <= $PARAM2 AND ( LIPS9~AEDAT <= $PARAM2 OR LIPS9~ERDAT >= $PARAM1 ) AND LIPS9~AEDAT >= $PARAM1.
I've shoved that through the EXPLAIN and get encouraging results. I've now passed the info on to the ACTA team, who are going to look at redoing their selection to generate this simpler WHERE clause.
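The simplification can be checked mechanically. A brute-force sweep over a small value grid (Python standing in for the ABAP/SQL, dates reduced to small integers) confirms that the original ACTA predicate and the simplified one agree on every combination satisfying $PARAM1 <= $PARAM2 and ERDAT <= AEDAT:

```python
from itertools import product

def original(e, a, p1, p2):
    # boolean structure of the generated ACTA WHERE clause
    return ((e <= p2 or a <= p2) and
            (e >= p1 or a <= p2) and
            (e <= p2 or a >= p1) and
            (e >= p1 or a >= p1))

def simplified(e, a, p1, p2):
    # Matt's simplified form
    return e <= p2 and (a <= p2 or e >= p1) and a >= p1

checked = 0
for e, a, p1, p2 in product(range(5), repeat=4):
    if e <= a and p1 <= p2:               # the two assumptions above
        assert original(e, a, p1, p2) == simplified(e, a, p1, p2)
        checked += 1
print(f"equivalent in all {checked} admissible cases")  # 225 cases
```

Without the two assumptions the predicates are not equivalent, so the simplification really does depend on ERDAT <= AEDAT holding in the data.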
‎2010 Oct 12 10:35 AM
Hi Matt,
Just out of curiosity, what does the explain look like now?
If I understand correctly you have an index Z03 on ERDAT and a Z04 on AEDAT.
I would try:
( erdat between $PARAM1 and $PARAM2 ) or ( aedat between $PARAM1 and $PARAM2 )
That should produce two index scans (one using Z03 and another using Z04) and then concatenate the results. If the interval (the difference between $PARAM1 and $PARAM2) is small enough, I would say that is your best bet.
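Under the two assumptions Matt stated ($PARAM1 <= $PARAM2 and ERDAT <= AEDAT), this OR-of-two-BETWEENs form can be brute-force checked against Matt's simplified predicate; both select exactly the same rows, so a concatenation of the two index range scans would return the correct result:

```python
from itertools import product

def simplified(e, a, p1, p2):
    # Matt's simplified form of the generated WHERE clause
    return e <= p2 and (a <= p2 or e >= p1) and a >= p1

def two_betweens(e, a, p1, p2):
    # (erdat between p1 and p2) or (aedat between p1 and p2)
    return (p1 <= e <= p2) or (p1 <= a <= p2)

for e, a, p1, p2 in product(range(5), repeat=4):
    if e <= a and p1 <= p2:   # assuming p1 <= p2 and erdat <= aedat
        assert simplified(e, a, p1, p2) == two_betweens(e, a, p1, p2)
print("both predicates select the same rows")
```

The rewritten form also has the shape the optimizer can split into two separate index accesses, one per BETWEEN.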
Regards,
Rui
‎2010 Oct 12 10:47 AM
I don't have the full details on this PC, but I can tell you it only uses Z03. ( The indexes were created for another process, but were interfering with this one ).
matt
‎2010 Oct 12 10:57 AM
>
> I don't have the full details on this PC, but I can tell you it only uses Z03. ( The indexes were created for another process, but were interfering with this one ).
>
> matt
Hi.
You said the final goal is "changed or were created between $PARAM1 and $PARAM2".
Something that was changed (AEDAT) between $PARAM1 and $PARAM2 might have been created at any point in time, so an access by ERDAT alone is never enough (you'd need a FULL scan for the rest, or maybe another index access). That's why I think that, since you have both indexes, both should be used and the results then concatenated.
Rui
‎2010 Oct 12 5:57 PM
Hi Matt,
I have no system to check right now, and I am not familiar with this LIPS table, but what happens with "several" changes?
Might you need to go for change documents?
I.e., what would this situation look like:
ERDAT 20090103 AEDAT 20090107
P1 20090104 P2 20090105
with a change on 20090105 AND on 20090107?
Assuming AEDAT holds the date of the "last" change,
in this situation the expression evaluates to "false" although there was a change on 20090105.
Anyway, if the logical expression works out, it might be worth checking the value of one descending index field,
depending on which field covers the bigger/lower range.
So
MANDT, ERDAT, AEDAT looks like the obvious choice (and is creatable in the DDIC), but
MANDT, ERDAT, AEDAT DESC
or
MANDT, AEDAT, ERDAT DESC
might be real boosters depending on the data distribution and average PARAM1 / PARAM2 values
(but these are only creatable at DB level, although I never understood why, because the corresponding
DDIC table from SAP has a tag for using a descending field).
Best regards
Volker
‎2010 Oct 13 9:23 AM
Hi Volker,
> Anyway, if the logical expression works out, it might be worth checking the value of one descending index field,
> depending on which field covers the bigger/lower range.
>
> So
> MANDT, ERDAT, AEDAT looks like the obvious choice (and is creatable in the DDIC), but
> MANDT, ERDAT, AEDAT DESC
> or
> MANDT, AEDAT, ERDAT DESC
> might be real boosters depending on the data distribution and average PARAM1 / PARAM2 values
Sounds interesting. Could you give more details on the circumstances in which this could be beneficial for
performance (as long as we are not talking about an ORDER BY which could benefit from a corresponding order
in an index)? I've never seen descending indexes being used for query tuning so far, so I would be
interested in which cases and how it could help performance.
> (but these are only creatable at DB level, although I never understood why, because the corresponding
> DDIC table from SAP has a tag for using a descending field).
I guess that not all DB platforms supported by SAP support descending index columns (or didn't
support them in the past).
Kind regards,
Hermann
‎2010 Oct 13 7:20 PM
Hi Hermann,
I do not like to hijack Matt's thread for a detailed description, so just the key points:
The basic idea is to make a range scan on the index leaf blocks shorter,
because you hit the "interesting" records earlier and can abort the range scan
after the "interesting" records, instead of first and uselessly "overreading" "uninteresting" records
in an ascending scan before you reach the "interesting" ones.
It is rarely beneficial in direct table access (because Oracle knows "range scan descending" on standard indexes),
but it can be quite OK if your ">" matches only a small number of rows (i.e. the first of 300 possible ones)
and the optimizer does not consider the descending scan because more than one field has the ">" condition.
And it can be a real booster in a nested loop join, because you get this benefit for each record joined.
Hope this gives some kind of insight
Volker
‎2010 Oct 13 8:41 PM
I love you hijacking my thread - I find all this stuff really interesting. If my moderator colleagues agree, I'll reference this thread in the intros and memorable discussions sticky thread.
‎2010 Oct 14 8:15 AM
Hi Volker,
thanks. My missing piece was this one:
> It is rarely beneficial in direct table access (because Oracle knows "range scan descending" on standard indexes),
> but it can be quite OK if your ">" matches only a small number of rows (i.e. the first of 300 possible ones)
> and the optimizer does not consider the descending scan because more than one field has the ">" condition.
My first thought was: since we have "range scan descending" and doubly linked lists in the leaf blocks (we can go back and forth), I was wondering why we would want to store a column in descending order for performance reasons; it sounded somehow superfluous. The missing piece was: if the optimizer does not use the descending scan and sticks to the ascending one, the effort of scanning the leaf blocks can indeed be smaller if the interesting records come first. So thanks for that hint.
I have seen people use "sorting tables" (reorganize or IOT) to get better clustering of the interesting rows, but never descending indexes so far... but it fits this scenario as well.
@Matt, in case you try it, could you let us know your results please?
Kind regards,
Hermann
‎2010 Oct 14 3:19 PM
Hi all,
I have to admit that our findings happened on Oracle 9.2.0.early (with "early" being 2, 3 or 4),
and due to their success the indexes are still in place on the upgraded version 10 after quite a long time.
I tried today to compile a small test case on Oracle 10 and was not successful.
I mean, the descending index still works and performs OK, but I did not manage to set up
a small case where the descending index outruns the standard index (as we measured on 9.2).
I'll try again on the weekend, but it might be that this trick is no longer valid on version 10.
(Which need not be bad, because I could get rid of a couple of non-DDIC indexes.)
Volker
‎2010 Oct 14 4:57 PM
OK,
I have something to play with from my toy Oracle 11 DB
(it is just a minimal CREATE DATABASE with catalog and catproc in place and nothing else).
It shows that, if the optimizer has a free choice, it favors the descending index for a "greater than" condition.
So the optimizer must see some benefit in doing it that way.
Here you go
create tablespace data datafile 'C:\ORACLE\TST\SAPDATA1\DATA.DATA1' size 100 m;
create table zztest tablespace data as select * from dba_extents;
-- This might not be comparable to SAP, because all columns may be NULL
insert into zztest select * from dba_extents;
insert into zztest select * from dba_extents;
commit;
create index zztest_asc on zztest (SEGMENT_NAME, EXTENT_ID, FILE_ID, BLOCK_ID) tablespace data;
create index zztest_desc on zztest (SEGMENT_NAME, EXTENT_ID desc ,FILE_ID desc ,BLOCK_ID desc) tablespace data;
analyze table zztest compute statistics;
analyze index zztest_asc compute statistics;
analyze index zztest_desc compute statistics;
set linesize 130 pagesize 0 autotrace on heading off
--ZZTEST has Extent_IDs 0-16 in my case
-- WITHOUT HINTS and BOTH indexes in place,
-- the optimizer favors the DESC index with the "greater than" Clause
-- the optimizer favors the ASC index with the "less than" Clause
In all cases I get 3 consistent gets for the selects below, so nothing really outruns the other.
The real number of rows is 12 in all cases and there is no access to table blocks.
select extent_id, file_id, block_id from zztest where segment_name='OBJ$' and extent_id >12;
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 96 | 2 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ZZTEST_DESC | 4 | 96 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
select extent_id, file_id, block_id from zztest where segment_name='OBJ$' and extent_id < 4; -- value being fair to "> 12"
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | 2 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ZZTEST_ASC | 1 | 24 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
But it also shows that you can use the oppositely sorted index for the access as well
(this might be what is different now compared to 9.2):
drop index zztest_asc;
select extent_id, file_id, block_id from zztest where segment_name='OBJ$' and extent_id < 4;
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | 2 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ZZTEST_DESC | 1 | 24 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
create index zztest_asc on zztest (SEGMENT_NAME, EXTENT_ID, FILE_ID, BLOCK_ID) tablespace data;
analyze index zztest_asc compute statistics;
drop index zztest_desc;
select extent_id, file_id, block_id from zztest where segment_name='OBJ$' and extent_id >12;
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 96 | 2 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ZZTEST_ASC | 4 | 96 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Funny: if you
analyze table zztest compute statistics for all columns;
the descending index is not considered any more.
I might need to do additional stuff with dbms_stats...
As said, I'll play with this some more on the weekend.
Volker
‎2010 Oct 18 10:04 AM
Hi again,
ok, it looks like I need to revise my thoughts about these descending indexes.
I now think it has nothing to do with Oracle doing different kinds of ACCESS on these indexes, as I thought before.
I now think, as Hermann suggested earlier, that the success I had with my indexes came
accidentally from getting a different/better clustering due to the different sort order of the columns.
Using the testcase above and this query:
select /*+ use_nl (a b) */
a.segment_name,a.extent_id, a.file_id, a.block_id
from zztest a, zztest b
where a.segment_name='TESTCASE'
AND b.segment_name=a.segment_name
and a.bytes=8192
and a.extent_id > 50
and a.file_id > 50
and a.block_id > 50
and b.extent_id > 50
and b.file_id > 50
and b.block_id > 50
;
with different sets of data distribution of 60 records like
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,27,9999,8192 );
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,28,9999,8192 );
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,29,9999,8192 );
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,30,9999,8192 );
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,9999,14,8192 );
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,9999,13,8192 );
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,9999,9999,8192 );
I toggled between both indexes being in place to watch the number of GETs.
In 90% of the data distributions I chose, the result was 7 GETs for both index access paths.
I could compile only one set where the ASC index had 7 GETs and the DESC index had 9.
Now that was not the result I wanted to show (it would have been better the other way round for my thesis),
but it showed that the data distribution can lead to the data being arranged differently in the index,
so that there is a difference.
I can give you the set of original columns from the table involved. They were like:
/BIC/AO_VER_VF /BIC/AO_SYS_VF /BIC/AO_VAL_VF /BIC/AO_VER_VT /BIC/AO_SYS_VT /BIC/AO_VAL_VT
150 ... 400 rows of this type
30.07.2010 201008131339597017430 01.01.1000 31.07.2010 201008301248551283450 30.07.2010
30.07.2010 201008131339597017430 30.07.2010 31.07.2010 201008301248551283450 31.12.9999
30.07.2010 201008301248551283450 01.01.1000 31.07.2010 999912310000000000000 30.07.2010
30.07.2010 201008301248551283450 30.07.2010 31.07.2010 999912310000000000000 31.12.9999
31.07.2010 201008020916377367780 01.01.1000 31.12.9999 999912310000000000000 31.07.2010
31.07.2010 201008020916377367780 31.07.2010 31.12.9999 999912310000000000000 31.12.9999 <<< This is the guy we like to have
typical join (BW-ODS)
WHERE
:
: some key equi stuff here
:
AND T_00 . "/BIC/AO_VER_VF" <= '20100930'
AND T_00 . "/BIC/AO_SYS_VF" <= '201010130811010000000'
AND T_00 . "/BIC/AO_VAL_VF" <= '20100930'
AND T_00 . "/BIC/AO_VER_VT" > '20100930'
AND T_00 . "/BIC/AO_SYS_VT" > '201010130811010000000'
AND T_00 . "/BIC/AO_VAL_VT" > '20100930'
:
: Some T_00 Filters here
:
AND T_02 . "/BIC/AO_VER_VF" <= '20100930'
AND T_02 . "/BIC/AO_SYS_VF" <= '201010130811010000000'
AND T_02 . "/BIC/AO_VAL_VF" <= '20100930'
AND T_02 . "/BIC/AO_VER_VT" > '20100930'
AND T_02 . "/BIC/AO_SYS_VT" > '201010130811010000000'
AND T_02 . "/BIC/AO_VAL_VT" > '20100930'
AND T_02 . "/BIC/AO_PACKNO" BETWEEN '108981' AND '109021'
:
: Some T_02 Filters here
:
AND T_01 . "/BIC/AO_VER_VF" <= '20100930'
AND T_01 . "/BIC/AO_SYS_VF" <= '201010130811010000000'
AND T_01 . "/BIC/AO_VER_VT" > '20100930'
AND T_01 . "/BIC/AO_SYS_VT" > '201010130811010000000'
:
: Some T_01 Filters here
: See, T_01 has no VAL_xx fields
:
AND T_03 . "/BIC/AO_VER_VF" <= '20100930'
AND T_03 . "/BIC/AO_SYS_VF" <= '201010130811010000000'
AND T_03 . "/BIC/AO_VAL_VF" <= '20100930'
AND T_03 . "/BIC/AO_VER_VT" > '20100930'
AND T_03 . "/BIC/AO_SYS_VT" > '201010130811010000000'
AND T_03 . "/BIC/AO_VAL_VT" > '20100930'
:
: Some T_03 Filters here
So: 6 CHAR date fields, with two of them (the _SYS_ ones) being extremely long, which could cause quite a difference in clustering.
So unfortunately no proof.
Volker
‎2010 Oct 18 1:24 PM
Hi Volker,
>
> So unfortunately no proof
> Volker
thank you very much for your effort anyway.
The conversation here made at least some of us think about the details of
how data is accessed, which is imho a good thing.
I can imagine that you got good effects either by accident (clustering/sorting) or
in an earlier release. Things change so quickly in the database world (the database itself
and the default configuration from SAP): once you make a statement about a detail, it will
soon no longer be true, or no longer precise enough, or... . I try to stay up to date with the
details that are important for me on the database platforms that SAP supports, and it happens
regularly that I find out that the "things I know"... have changed over time.
Kind regards,
Hermann