
Low performance with wrong statistics

Former Member

Are the statistics generated in the SYSCOLSTAT table really needed by the database?

I'm having performance problems that seem to be caused by the statistics. When I periodically drop the statistics, performance improves significantly. Is there any way to avoid generating them?
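Roughly, the workaround I run today is something like the following (the table name is just an example; DROP STATISTICS is the documented way to discard a table's column statistics):

    -- Discard the stored column statistics for one table
    -- (hypothetical table name)
    DROP STATISTICS ON Orders;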

Thanks, Leonardo.

Accepted Solutions (1)

jeff_albion
Product and Topic Expert

Are the statistics generated in the SYSCOLSTAT table really needed by the database?

Yes, very much so. Collecting a statistical distribution of values gives the database optimizer a (rough) estimate of the amount of work it will need to perform to scan for sargable predicates in an index, and it also helps with join strategy estimates inside the optimizer.

Is there any way to avoid generating them?

You can use ALTER STATISTICS to control the collection of statistics on particular columns/tables. Generally it is not recommended to turn this feature off.
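A minimal sketch of this control, assuming the ALTER STATISTICS syntax and the update_statistics database option are available in your SQL Anywhere version (the table and column names are hypothetical):

    -- Stop automatic statistics maintenance for one column
    -- (hypothetical table/column names)
    ALTER STATISTICS ON Customers ( City ) AUTO_UPDATE DISABLE;

    -- Assumption: the update_statistics option, where supported, turns off
    -- statistics collection during query execution for the whole database
    SET OPTION PUBLIC.update_statistics = 'Off';

Again, turning either of these off is generally not recommended.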

Statistics are automatically maintained by the database engine as SELECT/INSERT/UPDATE/DELETE operations are executed. Generally, statistics can easily become 'incorrect' over time for two major reasons:

  1. The 'buckets' (bins/discrete ranges) picked for the histogram no longer reflect a balanced division of values across the table. As a result of this unbalanced division, values are no longer spread evenly across buckets (many or most values fall into one or very few buckets), and the optimizer has to scan through many more values in the 'full' bucket (resulting in low index selectivity in the optimizer plan).

  2. The 'selectivity' values stored in the ranges are incorrect (i.e. the percentage recorded for a histogram bucket no longer reflects the number of values that actually appear in the table).

For older versions of SQL Anywhere, the only way to correct this information is to manually issue the CREATE STATISTICS statement when you know the information has become incorrect (either gradually over time, or after large data operations).
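For example (a hedged sketch; the table and column names are made up):

    -- Recompute the histogram for a column after a large data load
    CREATE STATISTICS Orders ( OrderDate );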

You can check the current values in the histogram data for your table by using the dbhist utility or the sa_get_histogram() stored procedure.
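A quick sketch of the procedure-based approach (hypothetical column/table names; to my recollection the column name is passed first):

    -- Inspect the stored histogram buckets for Orders.OrderDate
    SELECT * FROM sa_get_histogram( 'OrderDate', 'Orders' );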


The new statistics governor in SQL Anywhere 12 is intended to help "automate" the check for these two issues: when it detects a problem, it will issue a CREATE STATISTICS statement for you as needed. Since this feature did not exist in older versions, you may need to review the statistical data yourself as part of your migration strategy.


Jeff Albion, if the generated statistics exist only to improve database performance, why does this misbehaviour happen? Could it be caused by incorrect normalization of the database?

jeff_albion
Product and Topic Expert

I'm not 100% sure about your question, but if you're implying that the behaviour described above is possibly a "bug", I would instead suggest it is more of a known product limitation that we're always trying to improve upon.

As much as we try to keep track of mathematically accurate information, as I mentioned, over time this can become "skewed" (due to rounding of the actual percentage values themselves, plus the fact that we do not automatically re-divide the histogram ranges unless CREATE STATISTICS is run).

Since we don't know whether the statistical information really differs from the actual values stored in the table (until a manual SQL request comes in to re-evaluate the statistics), the database doesn't have a chance to correct itself.

Newer versions of SQL Anywhere try to limit this behaviour by dynamically monitoring the statistics for you, rather than assuming that the user will constantly monitor the statistics and adjust them manually as necessary, as older SQL Anywhere versions did.


Aside: "Normalizing a database" generally implies something about the way entities are arranged in a schema (from a theoretical perspective) and is not related to database statistics (used for optimization at runtime).


Hi @Jeff Albion, I didn't mean to imply anything; I just had a small doubt about whether the normalization of a database could influence its performance through statistics. Now it's clearer. Sorry, and thank you!

Answers (2)

Former Member

Leonardo,

Please post the plan files (or send them to me directly at anisoara.nica@sap.com). In dbisql (see the sketch after these steps):

  1. Call sa_flush_cache() before each execution.
  2. In the 'Plan Viewer', set the option 'Detailed and node statistics'.
  3. Click 'Get Plan'.
  4. Save the plan to a file using 'Save As...'.
  5. Repeat steps 1-4 for each combination: {without drop stats, drop stats} X {with order by, without order by}.
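Each iteration in dbisql then looks roughly like this (the SELECT is only a hypothetical stand-in for the actual query under test):

    -- Step 1: flush the database cache so each plan capture starts cold
    CALL sa_flush_cache();
    -- Steps 2-4: capture the plan for the query under test and save it,
    -- e.g. for a hypothetical query such as:
    SELECT * FROM Orders ORDER BY OrderDate;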

Thanks Ani

Former Member

These are the results:

Before drop statistics without order by clause

Before drop statistics with order by clause

After drop statistics without order by clause

After drop statistics with order by clause

In my previous tests I was not running call sa_flush_cache(), which is why the results varied so much. I still do not understand why the SELECT returns so slowly.

chris_keating
Product and Topic Expert

Can you please email or attach the query plans as per the instructions? The screen captures and PDFs do not contain all of the information, nor do they provide an easy mechanism to analyze the issue. To get the plan that we require:

  1. Go to Tools | Plan Viewer or press Shift+F5.
  2. Set the Statistics Level drop-down to "Detailed and node statistics".
  3. Click the Get Plan button.
  4. Click the Save As... button and select a filename that identifies the plan iteration.
  5. Post or email the .saplan file that is generated for each iteration.

Please ensure that sa_flush_cache() is called for each iteration as noted in Ani's previous post.