on 2011 Jul 22 12:40 PM
I have a query involving 4 tables. 3 have a foreign key relation. The optimizer estimates differ for all tables from the actual values.
Estimate Actual Rows returned Table A 2.8 1 Table B 4600 135000 Table C 1 135000 Table D 17700 255
The tables are joined from top to bottom. My desired plan would be the otherway round because I know, that only few rows in Table D will match.
A similar query for the same 4 tables results in:
Estimate Actual Rows returned Table D 490 290 Table C 3 1091 Table B 0.2 272 Table A 2.8 1
Both statements have the same number of search criteria one on Table A and one Table D. It only depends on the change of the value which is searched for in Table D. Difference in runtime between plan 1 and plan 2 is 3.5 to 0.01 seconds.
After some more tests it seems, that as soon as the estimate for matching rows in Table D falls below 1000 the better plan is used. A user estimate on the criteria for Table D is ignored by the optimizer, I wonder why? For the other criteria it is accepted.
What are your suggestions how to help the optimizer in such a situation?
Request clarification before answering.
Normally the database server maintains the selectivity estimates for the data/columns in each table automatically but they can get out of sync under some scenarios. When this happens running CREATE STATISTICS for each table/column may help.
Example:
create statistics A ( colA ); create statistics B ( colB ); create statistics C ( colC ); create statistics D ( colD );
where colA, colB, colC, colD are the columns from the respective tables that you are wanting to improve the selectivity estimates.
Read more about CREATE STATISTICS in the docs.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
You shouldn't have to do it anytime, but some users do sequence(s) of operations that cause the statistics to get out of sync with reality. Most users never need to regenerate the statistics; at least one user has to do it every day (due to their particular insert/update/delete loads that cause the stats to be skewed). Improvements have also been made over the years to the code that keeps the histogram statistics.... so there isn't a "one answer fits all" answer. You need to make the determination if you need to recreate statistics, and if you do which columns stats need to be adjusted and what frequency it needs to be done. FWIW: You can use the dbhist tool to check the existing statistics for a column.
User | Count |
---|---|
46 | |
6 | |
6 | |
5 | |
4 | |
4 | |
3 | |
3 | |
3 | |
3 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.