on 2012 Oct 05 10:26 AM
Are the statistics that are generated in the table syscolstat really needed by the database?
I'm having performance problems that seem related to the statistics. If I periodically drop the statistics, performance improves significantly. Is there any way to not generate them?
Thanks, Leonardo.
Are the statistics that are generated in the table syscolstat really needed by the database?
Yes, very much so. Collecting a statistical distribution of values gives the query optimizer a (rough) estimate of the amount of work needed to scan an index for sargable predicates, and it also feeds the optimizer's join-strategy estimates.
Is there any way to not generate them?
You can use ALTER STATISTICS to control whether statistics are collected on particular columns or tables. Generally, turning this feature off is not recommended.
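As a sketch (the table and column names here are hypothetical, and the exact AUTO_UPDATE syntax should be checked against the documentation for your SQL Anywhere version):

```sql
-- Stop automatic statistics maintenance for one column of a
-- hypothetical table 'orders' (not generally recommended)
ALTER STATISTICS orders ( order_date ) AUTO_UPDATE DISABLE;

-- Re-enable automatic maintenance later
ALTER STATISTICS orders ( order_date ) AUTO_UPDATE ENABLE;
```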
Statistics are automatically maintained by the database engine as SELECT/INSERT/UPDATE/DELETE operations are executed. Generally, statistics can easily become 'incorrect' over time due to two major reasons:
The 'buckets/bins/discrete ranges' picked for the histogram no longer reflect a balanced distribution of values across the table. With an unbalanced histogram, many or most values end up concentrated in one or a few buckets, so the optimizer has to scan through more values in the "full" bucket, which shows up as low index selectivity in the optimizer's plan.
The 'selectivity' values stored in the ranges are incorrect (i.e. the number of values that really appear in the table does not reflect the percentage of values stored in the histogram bucket range).
For older versions of SQL Anywhere, the only way to correct this information is to manually issue the CREATE STATISTICS command when you know the information is incorrect. (Either slowly over time, or perhaps after large data operations).
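For example (table and column names here are hypothetical), rebuilding the histograms after a large bulk load might look like:

```sql
-- Recompute histograms for every column of the table 'orders'
CREATE STATISTICS orders;

-- Or target a single column
CREATE STATISTICS orders ( customer_id );

-- If the statistics are badly skewed, they can also be dropped;
-- the server then begins gathering fresh ones as queries execute
DROP STATISTICS orders;
```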
You can check what the current values are in the histogram data for your table by using the dbhist utility or the sa_get_histogram() stored procedure.
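For instance, assuming a hypothetical column 'order_date' on table 'orders', the stored procedure can be called from dbisql like this (each result row describes one histogram bucket and its estimated selectivity):

```sql
-- Inspect the stored histogram for one column; the dbhist
-- command-line utility produces equivalent output
CALL sa_get_histogram( 'order_date', 'orders' );
```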
The new statistics governor in SQL Anywhere 12 is intended to help automate the check for these two issues: when it detects a problem, it will issue a CREATE STATISTICS statement for you, as needed. Since this feature did not exist in older versions, you may need to review the statistical data as part of your migration strategy.
I'm not 100% sure about your question, but if you're implying that the above described behaviour is possibly a "bug", I would instead suggest this is more of a known product limitation that we're always trying to improve upon.
As much as we try to mathematically keep track of accurate information, as I mentioned, over time this can become "skewed" (due to rounding issues of the actual percentage values themselves, plus the fact that we do not automatically re-divide up the histogram ranges without CREATE STATISTICS being run).
Since we don't know that the statistical information is really different than what appears for all of the values stored in the table (until a manual SQL request comes in to re-evaluate the statistics), the database doesn't have a chance to correct itself.
The newer versions of SQL Anywhere try to limit this behaviour by dynamically monitoring the statistics for you, rather than assuming (as older versions did) that the user will constantly monitor the statistics and adjust them manually.
Aside: "Normalizing a database" generally implies something about the way entities are arranged in a schema (from a theoretical perspective) and is not related to database statistics (used for optimization at runtime).
Hi, @Jeff Albion. I didn't mean to imply anything; I just had a small doubt about whether "normalizing a database" could influence performance via the statistics. Now it's clearer. Sorry, and thank you!
Leonardo,
Please post the plan files (or send them to me directly at anisoara.nica@sap.com). In dbisql:
1. Call sa_flush_cache() before each execution.
2. In the Plan Viewer, set the option 'Detailed and node statistics'.
3. Click 'Get Plan'.
4. Save the plan to a file using 'Save As...'.
5. Repeat steps 1-4 for each combination: {without drop stats, drop stats} X {with order by, without order by}.
Thanks,
Ani
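The same plan capture can also be scripted rather than done through the Plan Viewer GUI, e.g. via the GRAPHICAL_PLAN function. This is only a sketch: the query and table name are hypothetical, and the statistics-level argument value should be verified against your version's documentation.

```sql
-- Flush the cache so timings are not skewed by warm pages
CALL sa_flush_cache();

-- Capture the plan as XML; the second argument selects the
-- statistics level (a detailed level is assumed here), and the
-- result can be saved to a file for analysis
SELECT GRAPHICAL_PLAN( 'SELECT * FROM orders ORDER BY order_date', 2 );
```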
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
These are the results:
Before drop statistics without order by clause
Before drop statistics with order by clause
After drop statistics without order by clause
After drop statistics with order by clause
In my previous tests I was not running call sa_flush_cache(), which is why the results varied so much. I still do not understand why the SELECT returns so slowly.
Can you please email or attach the query plan files as per the instructions above? The screen captures and PDFs do not contain all of the information, nor do they provide an easy way to analyze the issue. To get the plans we require, please ensure that sa_flush_cache() is called before each iteration, as noted in Ani's previous post.