cancel
Showing results for 
Search instead for 
Did you mean: 

How can I help the optimizer estimating better?

MCMartin
Participant
0 Kudos
2,046

I have a query involving 4 tables. 3 have a foreign key relation. The optimizer estimates differ for all tables from the actual values.

          Estimate      Actual Rows returned 
Table A       2.8              1
Table B      4600           135000
Table C        1            135000
Table D     17700             255

The tables are joined from top to bottom. My desired plan would be the otherway round because I know, that only few rows in Table D will match.

A similar query for the same 4 tables results in:

           Estimate      Actual Rows returned 
Table D       490              290
Table C        3              1091
Table B        0.2             272
Table A       2.8              1

Both statements have the same number of search criteria one on Table A and one Table D. It only depends on the change of the value which is searched for in Table D. Difference in runtime between plan 1 and plan 2 is 3.5 to 0.01 seconds.

After some more tests it seems, that as soon as the estimate for matching rows in Table D falls below 1000 the better plan is used. A user estimate on the criteria for Table D is ignored by the optimizer, I wonder why? For the other criteria it is accepted.

What are your suggestions how to help the optimizer in such a situation?

Accepted Solutions (0)

Answers (1)

Answers (1)

MarkCulp
Participant

Normally the database server maintains the selectivity estimates for the data/columns in each table automatically but they can get out of sync under some scenarios. When this happens running CREATE STATISTICS for each table/column may help.

Example:

create statistics A ( colA );
create statistics B ( colB );
create statistics C ( colC );
create statistics D ( colD );

where colA, colB, colC, colD are the columns from the respective tables that you are wanting to improve the selectivity estimates.

Read more about CREATE STATISTICS in the docs.

MCMartin
Participant
0 Kudos

Any recommendation how regular this should be done? Should it be part of a reqular maintainance?

MarkCulp
Participant

You shouldn't have to do it anytime, but some users do sequence(s) of operations that cause the statistics to get out of sync with reality. Most users never need to regenerate the statistics; at least one user has to do it every day (due to their particular insert/update/delete loads that cause the stats to be skewed). Improvements have also been made over the years to the code that keeps the histogram statistics.... so there isn't a "one answer fits all" answer. You need to make the determination if you need to recreate statistics, and if you do which columns stats need to be adjusted and what frequency it needs to be done. FWIW: You can use the dbhist tool to check the existing statistics for a column.