‎2010 Oct 07 7:45 PM
With the indexes, the program runs much more slowly than when the indexes aren't there. However, the explain output indicates to me that with the indexes it should be much better.
The explain when the indexes are present is:
SQL Statement
------------------------------------------------------------------------
SELECT
T_01 . "MANDT" , T_01 . "VBELN" , T_01 . "VFDAT" , T_02 . "BUKRS" ,
T_00 . "VKORG" , T_02 . "VKORG" , T_00 . "VBELN" , T_03 . "BUKRS"
FROM
"LIPS" T_01 INNER JOIN "LIKP" T_00 ON T_01 . "MANDT" = :A0 AND T_00 . "VBELN" = T_01 . "VBELN"
INNER JOIN "TVKO" T_02 ON T_02 . "MANDT" = :A1 AND T_00 . "VKORG" = T_02 . "VKORG" INNER JOIN "T9F
ATLAS_CCO" T_03 ON T_03 . "MANDT" = :A2 AND T_02 . "BUKRS" = T_03 . "BUKRS"
WHERE
T_00 . "MANDT" = :A3 AND ( T_01 . "ERDAT" <= :A4 OR T_01 . "AEDAT" <= :A5 ) AND ( ( T_01 .
"ERDAT" >= :A6 OR T_01 . "AEDAT" <= :A7 ) AND ( ( T_01 . "ERDAT" <= :A8
OR T_01 . "AEDAT" >= :A9 ) AND ( T_01 . "ERDAT" >= :A10 OR T_01 .
"AEDAT" >= :A11 ) ) )
Execution Plan
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4632 | 1248K| 73711 (3)|
| 1 | CONCATENATION | | | | |
| 2 | NESTED LOOPS | | 2375 | 640K| 41855 (3)|
|* 3 | HASH JOIN | | 2375 | 619K| 41854 (3)|
|* 4 | TABLE ACCESS FULL | TVKO | 59 | 826 | 2 (0)|
| 5 | NESTED LOOPS | | 2558 | 632K| 41851 (3)|
|* 6 | TABLE ACCESS BY INDEX ROWID| LIPS | 2558 | 582K| 40814 (3)|
|* 7 | INDEX RANGE SCAN | LIPS~Z04 | 496K| | 492 (7)|
| 8 | TABLE ACCESS BY INDEX ROWID| LIKP | 1 | 20 | 0 (0)|
|* 9 | INDEX UNIQUE SCAN | LIKP~0 | 1 | | 0 (0)|
|* 10 | INDEX UNIQUE SCAN | T9FATLAS_CCO~0 | 1 | 9 | 0 (0)|
| 11 | NESTED LOOPS | | 2257 | 608K| 31856 (4)|
|* 12 | HASH JOIN | | 2257 | 588K| 31856 (4)|
|* 13 | TABLE ACCESS FULL | TVKO | 59 | 826 | 2 (0)|
| 14 | NESTED LOOPS | | 2430 | 600K| 31853 (4)|
|* 15 | TABLE ACCESS BY INDEX ROWID| LIPS | 2430 | 552K| 30868 (4)|
|* 16 | INDEX RANGE SCAN | LIPS~Z03 | 496K| | 495 (7)|
| 17 | TABLE ACCESS BY INDEX ROWID| LIKP | 1 | 20 | 0 (0)|
|* 18 | INDEX UNIQUE SCAN | LIKP~0 | 1 | | 0 (0)|
|* 19 | INDEX UNIQUE SCAN | T9FATLAS_CCO~0 | 1 | 9 | 0 (0)|
-----------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T_00"."VKORG"="T_02"."VKORG")
4 - filter("T_02"."MANDT"=:A1)
6 - filter(("T_01"."ERDAT">=:A6 OR "T_01"."AEDAT"<=:A7) AND
("T_01"."ERDAT"<=:A8 OR "T_01"."AEDAT">=:A9) AND ("T_01"."ERDAT">=:A10 OR
"T_01"."AEDAT">=:A11))
7 - access("T_01"."MANDT"=:A0 AND "T_01"."AEDAT"<=:A5)
9 - access("T_00"."MANDT"=:A3 AND "T_00"."VBELN"="T_01"."VBELN")
10 - access("T_03"."MANDT"=:A2 AND "T_02"."BUKRS"="T_03"."BUKRS")
12 - access("T_00"."VKORG"="T_02"."VKORG")
13 - filter("T_02"."MANDT"=:A1)
15 - filter(("T_01"."ERDAT">=:A10 OR "T_01"."AEDAT">=:A11) AND
("T_01"."ERDAT"<=:A8 OR "T_01"."AEDAT">=:A9) AND ("T_01"."ERDAT">=:A6 OR
"T_01"."AEDAT"<=:A7) AND LNNVL("T_01"."AEDAT"<=:A5))
16 - access("T_01"."MANDT"=:A0 AND "T_01"."ERDAT"<=:A4)
18 - access("T_00"."MANDT"=:A3 AND "T_00"."VBELN"="T_01"."VBELN")
19 - access("T_03"."MANDT"=:A2 AND "T_02"."BUKRS"="T_03"."BUKRS")
‎2010 Oct 07 7:46 PM
Without
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5168 | 1392K| 1019K (27)|
|* 1 | HASH JOIN | | 5168 | 1392K| 1019K (27)|
|* 2 | INDEX RANGE SCAN | T9FATLAS_CCO~0 | 146 | 1314 | 0 (0)|
|* 3 | HASH JOIN | | 5168 | 1347K| 1019K (27)|
|* 4 | TABLE ACCESS FULL | TVKO | 57 | 798 | 2 (0)|
| 5 | NESTED LOOPS | | 5622 | 1389K| 1019K (27)|
|* 6 | TABLE ACCESS FULL | LIPS | 5622 | 1279K| 1016K (27)|
| 7 | TABLE ACCESS BY INDEX ROWID| LIKP | 1 | 20 | 0 (0)|
|* 8 | INDEX UNIQUE SCAN | LIKP~0 | 1 | | 0 (0)|
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T_02"."BUKRS"="T_03"."BUKRS")
2 - access("T_03"."MANDT"=:A2)
3 - access("T_00"."VKORG"="T_02"."VKORG")
4 - filter("T_02"."MANDT"=:A1)
6 - filter(("T_01"."ERDAT"<=:A4 OR "T_01"."AEDAT"<=:A5) AND
("T_01"."ERDAT">=:A6 OR "T_01"."AEDAT"<=:A7) AND ("T_01"."ERDAT"<=:A8 OR
"T_01"."AEDAT">=:A9) AND ("T_01"."ERDAT">=:A10 OR "T_01"."AEDAT">=:A11) AND
"T_01"."MANDT"=:A0)
8 - access("T_00"."MANDT"=:A3 AND "T_00"."VBELN"="T_01"."VBELN")
I've tried using the explain with the indexes present via ST04, with the hints "NO_INDEX" and "FULL(LIPS)", but the indexes get used anyway.
The ABAP is generated...
My question is: am I right in thinking that, according to the EXPLAIN, the program should be running faster with the indexes?
matt
‎2010 Oct 08 11:00 AM
Hi Matt,
>
> I've tried using the explain with the indexes present via ST04, with the hints "NO_INDEX" and "FULL(LIPS)", but the indexes get used anyway.
>
> My question is: am I right in thinking that, according to the EXPLAIN, the program should be running faster with the indexes?
>
> matt
For the hints in ST04 you must use the alias, so for example FULL (T_01). In ABAP you would use &TABLE 2&.
Oracle tries to estimate the cost, but of course this is only an estimate. For example, for AEDAT <= :A5 it will probably estimate that 5% of the records are returned (you can confirm this by checking how many rows are in LIPS and how many rows it estimates for that step). But if AEDAT is a recent date, a much higher percentage of rows will be returned, and then a FULL scan actually performs better, because the table can be read directly in an optimized way, instead of reading the index and then the table repeatedly for every row returned from the index.
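This effect can be sketched with a toy cost model (a minimal sketch with made-up numbers, not Oracle's real costing formulas): index access pays roughly one table-block visit per returned row, while a full scan reads every block once with multiblock I/O, so the cheaper plan flips once the actual row count is far above the estimate.

```python
# Toy cost model, NOT Oracle's real formulas -- just to show why a plan
# chosen for an underestimated row count can lose badly at run time.

def index_access_cost(rows_returned, index_blocks=500):
    # range scan of the index plus roughly one table-block get per row
    return index_blocks + rows_returned

def full_scan_cost(table_blocks=120_000, multiblock=8):
    # every table block read once, several blocks per I/O request
    return table_blocks / multiblock          # 15000.0 here

estimated_rows = 2_500     # what the optimizer predicts for the filters
actual_rows = 500_000      # what a "recent" AEDAT value really returns

print(index_access_cost(estimated_rows) < full_scan_cost())  # True
print(index_access_cost(actual_rows) < full_scan_cost())     # False
```

With the optimistic estimate the index plan looks cheaper; with the real row count the full scan wins, which matches Matt's observation.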
Hope this helps in any way,
Rui Dantas
‎2010 Oct 08 5:21 PM
Hi Matt,
as Rui already explained almost correctly: for AEDAT <= :A5 the optimizer estimates a filter factor of MAX(0.009, 1 / NUM_DISTINCT(column)). With these estimations, LIKE, <, >, BETWEEN, correlations and other things can be a problem.
With a complex query like this, I would use gather_plan_statistics:
check this site or google for "gather_plan_statistics"
http://kerryosborne.oracle-guy.com/2010/02/gather_plan_statistics/
Use the SQL command editor to run the statement with the hint and the actual parameters, and then the statements for getting the plan statistics.
For each step in the plan you then get:
estimated rows (e-rows)
actual rows (a-rows)
actual time (a-time)
actual buffer gets
With this information you can see where the optimizer's estimations are close to or far away from reality. It should bring you directly to the step in the plan that causes the long run time....
Hope this helps
Kind regards,
Hermann
‎2010 Oct 11 10:58 AM
>
> as Rui already explained almost correctly (the AEDAT <= :A5 is MAX(0.009, 1 / NUM_DISTINCT(column) % ... ) ...
> Hermann
Hi Hermann,
I didn't mention the 1 / NUM_DISTINCT part because it did not seem relevant for a field like AEDAT, but I was curious about your 0.009 figure. Are these magic numbers configurable?
In my system I get 5%:
LIPS: 5,425,703 Rows
SELECT * FROM lips WHERE aedat <= :a1
SELECT STATEMENT ( Estimated Costs = 126,053 , Estimated #Rows = 271,285 )
1 TABLE ACCESS FULL LIPS
( Estim. Costs = 126,053 , Estim. #Rows = 271,285 )
Estim. CPU-Costs = 16,885,467,271 Estim. IO-Costs = 123,329
Edited by: Rob Burbank on Oct 13, 2010 4:11 PM
‎2010 Oct 11 12:12 PM
Hi Rui,
you are right:
The estimation for rows returned for AEDAT <= :A5 is 5% (this cannot be changed).
The estimation for the filter factor in cost estimations, however, is FF = max(0.009, 1 / NDV(column)),
as described in note 750631. The 0.009 cannot be changed either (as far as I know).
But no matter which one we choose, the estimations (cost or rows) for <=, >=, or BETWEEN ... AND (<= or >=)
are very often too optimistic and can lead to wrong decisions quite often.
BTW: that's why they called it the OPTImizer, because it's OPTImistic and calculates with OPTImal circumstances.
If it were PESSImistic... they would have named it the PESSImizer.
Things get worse when we have ABAP developers who try to specify all fields in an index; there we often see
WHERE conditions like
...
AND field BETWEEN <minval> AND <maxval>
...
or
...
AND field >= <minval>
Here the optimizer is really off with its estimations (a small number of rows and low costs are estimated, but everything is selected).
Suppressing these conditions can lead to better estimations and more efficient plans. We have seen this quite often.
Coming back to Matt's problem:
looking at the filters, each <= or >= gets an OPTImistic estimation for the rows and an even more OPTImistic estimation
for the cost (filter factors for index accesses). These estimations are combined, e.g. for the rows:
0.5 * 0.5 * 0.5 ...
6 - filter(("T_01"."ERDAT"<=:A4 OR "T_01"."AEDAT"<=:A5)
AND ("T_01"."ERDAT">=:A6 OR "T_01"."AEDAT"<=:A7)
AND ("T_01"."ERDAT"<=:A8 OR "T_01"."AEDAT">=:A9)
AND ("T_01"."ERDAT">=:A10 OR "T_01"."AEDAT">=:A11)
In the end we have a very optimistic estimation for rows and cost.
If the actual content is not as selective as the optimizer estimates (which is often the case),
we may get a plan that would be efficient for very small result sets but at the same time
is inefficient for bigger result sets.
The execution plan with the indexes would be optimal if the actual number of rows
matched the optimizer's expectations.
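The rules quoted here can be put into a few lines of Python (a sketch under stated assumptions: a 5% row selectivity per range predicate, the textbook OR/AND combination rules with independence between predicates, and a LIPS row count in the millions as quoted elsewhere in the thread):

```python
# Sketch of the estimation rules discussed here; the 0.009 floor is the
# constant from note 750631, the combination rules are the standard ones
# and assume independent predicates (which ERDAT and AEDAT are not).

def filter_factor(ndv):
    # cost-side filter factor for a range predicate: max(0.009, 1/NDV)
    return max(0.009, 1.0 / ndv)

def or_selectivity(s1, s2):
    # s(A OR B) = s(A) + s(B) - s(A)*s(B), assuming independence
    return s1 + s2 - s1 * s2

pair = or_selectivity(0.05, 0.05)   # one (ERDAT ... OR AEDAT ...) group
combined = pair ** 4                # four such groups ANDed together

lips_rows = 5_400_000               # order of magnitude quoted by Rui
print(filter_factor(1190))          # 0.009: the floor kicks in
print(round(lips_rows * combined))  # only a few hundred rows predicted
```

Multiplying four optimistic factors shrinks millions of rows down to a few hundred, which is exactly how an index plan can look attractive on paper.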
Kind regards,
Hermann
‎2010 Oct 11 12:44 PM
small example:
Table T100 has 918007 rows in my system.
The Query:
select * from t100 where msgnr >= :a0
MSGNR is a field with Message Numbers for Application Areas and Languages.
It's a 3 character field and we have many '001's and '002's and so on and some
'999's and even some real characters (in my system).
With the default installation we have 2 indexes:
Primary Key:
SPRSL, ARBGB, MSGNR
and
Index 001:
ARBGB, MSGNR
The distinct key counts are:
SPRSL 9
ARBGB 5703
MSGNR 1190
An index skip scan is no alternative here, so we get a full table scan.
The optimizer estimates 5% of the rows being returned.
If I were now to create an index on MSGNR, the optimizer would pick it
up, since we have a selective field (1190 distinct keys and a 5% row
estimation is pretty selective, isn't it?). The cost estimation for that
index range scan would be even better!
If I run the query with a value that returns 5% or less of the data,
e.g. where msgnr >= '850' or something like that,
the execution is technically efficient and quite fast....
BUT
what if I run it with:
where msgnr >= '000' or where msgnr >= '200'
or
where msgnr >= '000' and msgnr <= '999' (between)
We would still use the index (same estimations), but read almost all index blocks
AND almost all table blocks, many of them several times (depending on the clustering factor).
The query would be much slower in this case, although the execution
plan and cost estimation look nice. One could assume that it should
run faster, but in cases where the estimation of the optimizer does not
match the real world (the optimizer assumes good selectivity for these ranges,
but they are not that selective), it doesn't.
@ MATT
So, yes, in some cases creating an index can make execution plans look better and slow
down the response time.
You have to check the selectivity of your >= and <= predicates and compare the estimations of
the optimizer with reality, either manually or automated with gather_plan_statistics.
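Hermann's T100 example can be mimicked in a few lines of Python. The message-number distribution below is invented (skewed toward low values) purely to show how far a fixed 5% guess for a bind variable can be from reality depending on the value bound at run time:

```python
import random

# Invented, skewed message numbers: many low values, few high ones.
random.seed(1)
msgnr = [f"{random.randint(1, 199):03d}" for _ in range(9_500)] + \
        [f"{random.randint(200, 999):03d}" for _ in range(500)]

def actual_fraction(bound):
    # fraction of rows satisfying msgnr >= bound (string comparison,
    # like the 3-character CHAR field in T100)
    return sum(m >= bound for m in msgnr) / len(msgnr)

ESTIMATED = 0.05   # the optimizer's fixed guess for ">= :a0" with a bind

print(actual_fraction('850') < ESTIMATED)  # True: index access would be fine
print(actual_fraction('000'))              # 1.0: every row qualifies
```

The same plan serves both bound values, but for `>= '000'` it touches essentially the whole table through the index.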
Kind regards,
Hermann
‎2010 Oct 11 2:36 PM
Hi Hermann,
> The estimation for rows returned for AEDAT <= :A5 is 5 % (can not be changed).
> The estimation for the filter factor in cost estimations however is FF = max(0.009, 1 / NDV(column))
> as described in note 750631. The 0.009 can not be changed as well (as far as i know).
> Hermann
Great, thanks. Note 750631 explains well the difference between the filter factor (FF, used for the cost estimation) and the row filter factor (RFF, used for estimating the number of rows), and in which special occasions they differ (one being this <= case). I confess I mostly look at the number of rows, because it is a "real" number and easier to understand.
> BTW: that's why they called it OPTImizer, because it's OPTImistic and calculates with OPTImal circumstances.
> If it would be PESSImisitic... they would have named it PESSImizer.
I like this definition...
but with these < or > cases I find that it often works as a PESSImizer. Where I work I get to approve what is installed in production, so those >= minvalue would not be installed in the first place. But you often have things like "all records created in the last day", so you get >= almost_the_max_value, and in that case 5% ends up being too pessimistic a choice.
Matt,
Sorry if this has gone into a somewhat different discussion but from what I saw from you in the past I would guess you wouldn't mind.
The conditions in your where clause seem quite confusing to me without knowing which ABAP variables are behind those placeholders. Maybe you could share the ABAP code, or I suppose you, better than anyone, will know if (or which of) the conditions are really selective and which are things like >= minvalue.
Regards,
Rui Dantas
‎2010 Oct 11 4:34 PM
Hi Rui,
I agree that the estimated rows are easier to understand, and sometimes they are even
more important than the estimated costs, e.g. when we talk about joins (how often does
the inner table have to be accessed?).
Regarding the "all records created in the last day" and >= almost_the_max_value:
is that almost_the_max_value included in the statistics? Or is the max value in the
statistics less than the queried >= almost_the_max_value? In a case where
the max value in the stats is, let's say, 100 and we look for >= 109 while the real max
value is 110 (but the statistics are not up to date), the predicate is out of bound or out of range,
and the optimizer should use a different calculation. I don't have the details, only
this quote from the net:
"The optimizer prorates the selectivity based on the distance between the
predicate value and the maximum value (assuming the value is higher than the max),
that is, the farther the value is from the maximum value, the lower the selectivity will be."
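The prorating in that quote might look roughly like the following sketch. The formula is a hypothetical simplification of the idea (linear fade-out over the column's statistical range), not Oracle's actual implementation:

```python
# Hypothetical proration: fade the base selectivity out linearly as the
# predicate value moves past the max recorded in the (stale) statistics.

def prorated_selectivity(value, stat_min, stat_max, base=0.05):
    if value <= stat_max:
        return base                      # in range: normal estimate
    width = stat_max - stat_min
    overshoot = value - stat_max
    return max(0.0, base * (1 - overshoot / width))

# the stats believe the column spans 1..100; the real max is 110
print(prorated_selectivity(50, 1, 100))   # 0.05: in range
print(prorated_selectivity(109, 1, 100))  # scaled down below 0.05
print(prorated_selectivity(300, 1, 100))  # 0.0: far out of range
```

So a predicate just past the stale maximum gets a slightly lower estimate, and one far past it drops toward zero, which matches the quoted behaviour.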
Or let me ask differently: did you get wrong decisions because the estimation was
too pessimistic? I have rarely seen such cases so far. The wrong decisions I have seen
were usually based on too optimistic estimates.
Kind regards,
Hermann
‎2010 Oct 11 7:16 PM
Hi Matt,
would you mind giving me the following information:
1) how many blocks for table LIPS
2) how many rows for table LIPS
3) how many rows for LIPS and your selection criteria:
WHERE
T_00 . "MANDT" = :A3
AND ( T_01 ."ERDAT" <= :A4 OR T_01 . "AEDAT" <= :A5 )
AND ( ( T_01 ."ERDAT" >= :A6 OR T_01 . "AEDAT" <= :A7 )
AND ( ( T_01 ."ERDAT" <= :A8 OR T_01 . "AEDAT" >= :A9 )
AND ( T_01 ."ERDAT" >= :A10 OR T_01 . "AEDAT" >= :A11 )
)
)
What fields did you use in your index?
PS: Oh, and can you give the field values as well, please?
Especially if there are duplicates, like :A7 equalling :A5 in the ABAP source.
Volker
Edited by: Volker Borowski on Oct 11, 2010 8:22 PM
‎2010 Oct 12 9:45 AM
The original ABAP sql (generated by ACTA) is:
WHERE ( ( ( LIPS9~ERDAT <= $PARAM2 ) OR ( LIPS9~AEDAT <= $PARAM2 ) )
AND ( ( ( LIPS9~ERDAT >= $PARAM1 ) OR ( LIPS9~AEDAT <= $PARAM2 ) )
AND ( ( ( LIPS9~ERDAT <= $PARAM2 ) OR ( LIPS9~AEDAT >= $PARAM1 ) )
AND ( ( LIPS9~ERDAT >= $PARAM1 ) OR ( LIPS9~AEDAT >= $PARAM1 ) ) ) ) ).
Which simplifies to
WHERE ( LIPS9~ERDAT <= $PARAM2 OR LIPS9~AEDAT <= $PARAM2 )
AND ( LIPS9~ERDAT >= $PARAM1 OR LIPS9~AEDAT <= $PARAM2 )
AND ( LIPS9~ERDAT <= $PARAM2 OR LIPS9~AEDAT >= $PARAM1 )
AND ( LIPS9~ERDAT >= $PARAM1 OR LIPS9~AEDAT >= $PARAM1 ).
Now, $PARAM1 <= $PARAM2, and I think it's safe to say that ERDAT <= AEDAT. So the selection is looking for any documents that were created or changed between $PARAM1 and $PARAM2. And that simplifies to:
LIPS9~ERDAT <= $PARAM2 AND ( LIPS9~AEDAT <= $PARAM2 OR LIPS9~ERDAT >= $PARAM1 ) AND LIPS9~AEDAT >= $PARAM1.
I've shoved that through the EXPLAIN and get encouraging results. I've now passed the info on to the ACTA team, who are going to look at redoing their selection to generate this simpler WHERE clause.
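The simplification can be checked mechanically. A brute-force sweep over a small value grid (Python standing in for the ABAP/SQL, dates reduced to small integers) confirms that the original ACTA predicate and the simplified one agree on every combination satisfying $PARAM1 <= $PARAM2 and ERDAT <= AEDAT:

```python
from itertools import product

def original(e, a, p1, p2):
    # boolean structure of the generated ACTA WHERE clause
    return ((e <= p2 or a <= p2) and
            (e >= p1 or a <= p2) and
            (e <= p2 or a >= p1) and
            (e >= p1 or a >= p1))

def simplified(e, a, p1, p2):
    # Matt's simplified form
    return e <= p2 and (a <= p2 or e >= p1) and a >= p1

checked = 0
for e, a, p1, p2 in product(range(5), repeat=4):
    if e <= a and p1 <= p2:               # the two assumptions above
        assert original(e, a, p1, p2) == simplified(e, a, p1, p2)
        checked += 1
print(f"equivalent in all {checked} admissible cases")  # 225 cases
```

Without the two assumptions the predicates are not equivalent, so the simplification really does depend on ERDAT <= AEDAT holding in the data.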
‎2010 Oct 12 10:35 AM
Hi Matt,
Just out of curiosity, what does the explain look like now?
If I understand correctly you have an index Z03 on ERDAT and a Z04 on AEDAT.
I would try:
( erdat between $PARAM1 and $PARAM2 ) or ( aedat between $PARAM1 and $PARAM2 )
That should produce two index scans (one using Z03 and another using Z04) and then concatenate the results. If the interval (the difference between $PARAM1 and $PARAM2) is small enough, I would say that is your best bet.
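Under the two assumptions Matt stated ($PARAM1 <= $PARAM2 and ERDAT <= AEDAT), this OR-of-two-BETWEENs form can be brute-force checked against Matt's simplified predicate; both select exactly the same rows, so a concatenation of the two index range scans would return the correct result:

```python
from itertools import product

def simplified(e, a, p1, p2):
    # Matt's simplified form of the generated WHERE clause
    return e <= p2 and (a <= p2 or e >= p1) and a >= p1

def two_betweens(e, a, p1, p2):
    # (erdat between p1 and p2) or (aedat between p1 and p2)
    return (p1 <= e <= p2) or (p1 <= a <= p2)

for e, a, p1, p2 in product(range(5), repeat=4):
    if e <= a and p1 <= p2:   # assuming p1 <= p2 and erdat <= aedat
        assert simplified(e, a, p1, p2) == two_betweens(e, a, p1, p2)
print("both predicates select the same rows")
```

The rewritten form also has the shape the optimizer can split into two separate index accesses, one per BETWEEN.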
Regards,
Rui
‎2010 Oct 12 10:47 AM
I don't have the full details on this PC, but I can tell you it only uses Z03. ( The indexes were created for another process, but were interfering with this one ).
matt
‎2010 Oct 12 10:57 AM
>
> I don't have the full details on this PC, but I can tell you it only uses Z03. ( The indexes were created for another process, but were interfering with this one ).
>
> matt
Hi.
You said the final goal is "changed or were created between $PARAM1 and $PARAM2".
Something that was changed (AEDAT) between $PARAM1 and $PARAM2 might have been created at any point in time, so an access by ERDAT alone is never enough (you'd need a FULL scan for the rest, or maybe another index access). That's why I think that, since you have both indexes, both should be used and the results then concatenated.
Rui
‎2010 Oct 12 5:57 PM
Hi Matt,
I have no system to check right now, and I am not familiar with this LIPS table, but what happens with "several" changes?
Might you need to go for change documents?
I.e., what would this situation look like:
ERDAT 20090103 AEDAT 20090107
P1 20090104 P2 20090105
with a change on 20090105 AND on 20090107?
Assuming AEDAT holds the date of the "last" change,
in this situation the expression evaluates to "false" although there was a change on 20090105.
Anyway, if the logical expression works out, it might be worth checking the value of one descending index field,
depending on which field covers the bigger/lower range.
So
MANDT, ERDAT, AEDAT looks like the obvious choice (and is creatable in the DDIC), but
MANDT, ERDAT, AEDAT DESC
or
MANDT, AEDAT, ERDAT DESC
might be real boosters depending on the data distribution and average PARAM1 / PARAM2 values
(but these are only creatable at DB level, although I never understood why, because the corresponding
DDIC table from SAP has a tag for using a descending field).
Best regards
Volker
‎2010 Oct 13 9:23 AM
Hi Volker,
> Anyway, if the logical expression works out, it might be worth checking the value of one descending index field,
> depending on which field covers the bigger/lower range.
>
> So
> MANDT, ERDAT, AEDAT looks like the obvious choice (and is creatable in the DDIC), but
> MANDT, ERDAT, AEDAT DESC
> or
> MANDT, AEDAT, ERDAT DESC
> might be real boosters depending on the data distribution and average PARAM1 / PARAM2 values
Sounds interesting. Could you give more details on the circumstances in which this could be beneficial for
performance (as long as we are not talking about an ORDER BY which could benefit from a corresponding order
in an index)? I've never seen descending indexes being used for query tuning so far, so I would be
interested in which cases and how it could help performance.
> (but these are only creatable at DB level, although I never understood why, because the corresponding
> DDIC table from SAP has a tag for using a descending field).
I guess that not all DB platforms supported by SAP support descending index columns (or didn't
support them in the past).
Kind regards,
Hermann
‎2010 Oct 13 7:20 PM
Hi Hermann,
I do not like to hijack Matt's thread for a detailed description, so just the key points:
The basic idea is to make a range scan on the index leaf blocks shorter,
because you hit the "interesting" records earlier and can abort the range scan
after the "interesting" records, instead of first and uselessly "overreading" "uninteresting" records
in an ascending scan before you reach the "interesting" ones.
It is rarely beneficial in direct table access (because Oracle knows "range scan descending" on standard indexes),
but it can be quite OK if your ">" matches only a small number of rows (i.e. the first of 300 possible ones)
and the optimizer does not consider the descending scan because more than one field has the ">" condition.
And it can be a real booster in a nested loop join, because you get this benefit for each record joined.
Hope this gives some kind of insight
Volker
‎2010 Oct 13 8:41 PM
I love you hijacking my thread - I find all this stuff really interesting. If my moderator colleagues agree, I'll reference this thread in the intros and memorable discussions sticky thread.
‎2010 Oct 14 8:15 AM
Hi Volker,
thanks. My missing piece was this one:
> It is rarely beneficial in direct table access (because Oracle knows "range scan descending" on standard indexes),
> but it can be quite OK if your ">" matches only a small number of rows (i.e. the first of 300 possible ones)
> and the optimizer does not consider the descending scan because more than one field has the ">" condition.
My first thought was: since we have "range scan descending" and doubly linked lists in the leaf blocks (we can go back and forth), I was wondering why we would want to store a column in descending order for performance reasons; it sounded somehow superfluous. The missing piece was: if the optimizer does not use the descending scan and sticks to the ascending one, the effort of scanning the leaf blocks can indeed be smaller if the interesting records come first. So thanks for that hint.
I have seen people use "sorting tables" (reorganize or IOT) to get better clustering of the interesting rows, but never descending indexes so far... but it fits this scenario as well.
@Matt, in case you try it, could you let us know your results please?
Kind regards,
Hermann
‎2010 Oct 14 3:19 PM
Hi all,
I have to admit that our findings happened on Oracle 9.2.0.early (with "early" being 2, 3 or 4),
and due to their success the indexes are still in place on the upgraded version 10 after quite a long time.
I tried today to compile a small test case on Oracle 10 and was not successful.
I mean, the descending index still works and performs OK, but I did not manage to set up
a small case where the descending index outruns the standard index (as we measured on 9.2).
I'll try again on the weekend, but it might be that this trick is no longer valid on version 10.
(Which need not be bad, because I could get rid of a couple of non-DDIC indexes.)
Volker
‎2010 Oct 14 4:57 PM
OK,
I have something to play with from my toy Oracle 11 DB
(it is just a minimal CREATE DATABASE with catalog and catproc in place and nothing else).
It shows that, if the optimizer has a free choice, it favors the descending index for a "greater than" condition.
So the optimizer must see some benefit in doing it that way.
Here you go
create tablespace data datafile 'C:\ORACLE\TST\SAPDATA1\DATA.DATA1' size 100 m;
create table zztest tablespace data as select * from dba_extents;
-- This might not be comparable to SAP, because all columns may be NULL
insert into zztest select * from dba_extents;
insert into zztest select * from dba_extents;
commit;
create index zztest_asc on zztest (SEGMENT_NAME, EXTENT_ID, FILE_ID, BLOCK_ID) tablespace data;
create index zztest_desc on zztest (SEGMENT_NAME, EXTENT_ID desc ,FILE_ID desc ,BLOCK_ID desc) tablespace data;
analyze table zztest compute statistics;
analyze index zztest_asc compute statistics;
analyze index zztest_desc compute statistics;
set linesize 130 pagesize 0 autotrace on heading off
--ZZTEST has Extent_IDs 0-16 in my case
-- WITHOUT HINTS and BOTH indexes in place,
-- the optimizer favors the DESC index with the "greater than" Clause
-- the optimizer favors the ASC index with the "less than" Clause
In all cases I get 3 consistent gets for the selects below, so nothing really outruns the other.
The real number of rows is 12 in all cases and there is no access to table blocks.
select extent_id, file_id, block_id from zztest where segment_name='OBJ$' and extent_id >12;
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 96 | 2 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ZZTEST_DESC | 4 | 96 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
select extent_id, file_id, block_id from zztest where segment_name='OBJ$' and extent_id < 4; -- value being fair to "> 12"
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | 2 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ZZTEST_ASC | 1 | 24 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
But it also shows that you can use the oppositely sorted index for the access as well
(this might be what is different now compared to 9.2):
drop index zztest_asc;
select extent_id, file_id, block_id from zztest where segment_name='OBJ$' and extent_id < 4;
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | 2 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ZZTEST_DESC | 1 | 24 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
create index zztest_asc on zztest (SEGMENT_NAME, EXTENT_ID, FILE_ID, BLOCK_ID) tablespace data;
analyze index zztest_asc compute statistics;
drop index zztest_desc;
select extent_id, file_id, block_id from zztest where segment_name='OBJ$' and extent_id >12;
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 96 | 2 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| ZZTEST_ASC | 4 | 96 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Funny: if you
analyze table zztest compute statistics for all columns;
the descending index is not considered any more.
I might need to do additional stuff with dbms_stats...
As said, I'll play with this some more on the weekend.
Volker
‎2010 Oct 18 10:04 AM
Hi again,
ok, it looks like I need to revise my thoughts about these descending indexes.
I now think it has nothing to do with Oracle doing different kinds of ACCESS on these indexes, as I thought before.
I now think, as Hermann suggested earlier, that the success I had with my indexes came
accidentally from getting a different/better clustering due to the different sort order of the columns.
Using the testcase above and this query:
select /*+ use_nl (a b) */
a.segment_name,a.extent_id, a.file_id, a.block_id
from zztest a, zztest b
where a.segment_name='TESTCASE'
AND b.segment_name=a.segment_name
and a.bytes=8192
and a.extent_id > 50
and a.file_id > 50
and a.block_id > 50
and b.extent_id > 50
and b.file_id > 50
and b.block_id > 50
;
with different sets of data distribution of 60 records like
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,27,9999,8192 );
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,28,9999,8192 );
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,29,9999,8192 );
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,30,9999,8192 );
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,9999,14,8192 );
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,9999,13,8192 );
insert into zztest (SEGMENT_NAME,EXTENT_ID,FILE_ID,BLOCK_ID,BYTES) values ( 'TESTCASE',9999,9999,9999,8192 );
I toggled between both indexes being in place to watch the number of GETs.
In 90% of the data distributions I chose, the result was 7 GETs for both index access paths.
I could compile only one set where the ASC index had 7 GETs and the DESC index had 9.
Now that was not the result I wanted to show (it would have been better the other way round for my thesis),
but it showed that the data distribution can lead to the data being arranged differently in the index,
so that there is a difference.
I can give you the set of original columns from the table involved. They were like:
/BIC/AO_VER_VF /BIC/AO_SYS_VF /BIC/AO_VAL_VF /BIC/AO_VER_VT /BIC/AO_SYS_VT /BIC/AO_VAL_VT
150 ... 400 rows of this type
30.07.2010 201008131339597017430 01.01.1000 31.07.2010 201008301248551283450 30.07.2010
30.07.2010 201008131339597017430 30.07.2010 31.07.2010 201008301248551283450 31.12.9999
30.07.2010 201008301248551283450 01.01.1000 31.07.2010 999912310000000000000 30.07.2010
30.07.2010 201008301248551283450 30.07.2010 31.07.2010 999912310000000000000 31.12.9999
31.07.2010 201008020916377367780 01.01.1000 31.12.9999 999912310000000000000 31.07.2010
31.07.2010 201008020916377367780 31.07.2010 31.12.9999 999912310000000000000 31.12.9999 <<< This is the guy we like to have
typical join (BW-ODS)
WHERE
:
: some key equi stuff here
:
AND T_00 . "/BIC/AO_VER_VF" <= '20100930'
AND T_00 . "/BIC/AO_SYS_VF" <= '201010130811010000000'
AND T_00 . "/BIC/AO_VAL_VF" <= '20100930'
AND T_00 . "/BIC/AO_VER_VT" > '20100930'
AND T_00 . "/BIC/AO_SYS_VT" > '201010130811010000000'
AND T_00 . "/BIC/AO_VAL_VT" > '20100930'
:
: Some T_00 Filters here
:
AND T_02 . "/BIC/AO_VER_VF" <= '20100930'
AND T_02 . "/BIC/AO_SYS_VF" <= '201010130811010000000'
AND T_02 . "/BIC/AO_VAL_VF" <= '20100930'
AND T_02 . "/BIC/AO_VER_VT" > '20100930'
AND T_02 . "/BIC/AO_SYS_VT" > '201010130811010000000'
AND T_02 . "/BIC/AO_VAL_VT" > '20100930'
AND T_02 . "/BIC/AO_PACKNO" BETWEEN '108981' AND '109021'
:
: Some T_02 Filters here
:
AND T_01 . "/BIC/AO_VER_VF" <= '20100930'
AND T_01 . "/BIC/AO_SYS_VF" <= '201010130811010000000'
AND T_01 . "/BIC/AO_VER_VT" > '20100930'
AND T_01 . "/BIC/AO_SYS_VT" > '201010130811010000000'
:
: Some T_01 Filters here
: See, T_01 has no VAL_xx fields
:
AND T_03 . "/BIC/AO_VER_VF" <= '20100930'
AND T_03 . "/BIC/AO_SYS_VF" <= '201010130811010000000'
AND T_03 . "/BIC/AO_VAL_VF" <= '20100930'
AND T_03 . "/BIC/AO_VER_VT" > '20100930'
AND T_03 . "/BIC/AO_SYS_VT" > '201010130811010000000'
AND T_03 . "/BIC/AO_VAL_VT" > '20100930'
:
: Some T_03 Filters here
So: 6 CHAR date fields, with two of them (the _SYS_ ones) being extremely long, which could cause quite a difference in clustering.
So unfortunately no proof.
Volker
‎2010 Oct 18 1:24 PM
Hi Volker,
>
> So unfortunately no proof
> Volker
thank you very much for your effort anyway.
The conversation here made at least some of us think about the details of
how data is accessed, which is imho a good thing.
I can imagine that you got good effects either by accident (clustering/sorting) or
in an earlier release. Things change so quickly in the database world (the database itself
and the default configuration from SAP): once you make a statement about a detail, it will
soon no longer be true, or no longer precise enough, or... . I try to stay up to date with the
details that are important for me on the database platforms that SAP supports, and it happens
regularly that I find out that the "things I know"... have changed over time.
Kind regards,
Hermann