on 2023 Aug 21 1:28 PM
Hi, we talk about SAP - Sybase ASE, I have discussed this topic with many people around the world and I have not found anyone who knows how to give me a short and concise explanation on this topic. Let's see if there is an expert out there who can give me a hand.
I am aware that running "update index statistics" can:
1) Fill the tempdb database being used.
2) Actually, what this command does from behind is "update statistics <table> <field>" for each field of the index, doing sorts, locks and scans at the index level and at the field level on the table.
3) Execution time.
Now, I understand that if I execute "update index statistics" a histogram is generated for each field of the index which helps to define the cost of the index with most precision and the optimizer to choose it. But what happens if, for example, of three fields that have an index, the middle field is not informed as a predicate of the "where" clause?, I understand that the optimizer must do the necessary operations first to identify the uniqueness, if it exists , define the cost and then to see if it is the most optimal index. That is, using "update index statistics" seems more efficient than using "update statistics", but if this is the case, why doesn't the "update statistics" go away?.
Honestly, I'm lost, is there anyone out there to help me see the light at the end of the tunnel, I have a feeling this is the same as "which came first the chicken or the egg?"
Regards and thanks in advance
Javi.-
If you said "in some cases there will be no improvement in performance by running update index statistics compared to update statistics, but this is highly dependent on a specific combination of circumstances unlikely to apply in most cases to most databases", I think we can finally agree.
That's why both SAP and I say it is good practice to always run update index statistics instead of update statistics.
If there is not a large enough operational window to run update index statistics then running update statistics instead is probably better than running nothing at all... but so much can be done to make update statistics (all versions) run faster and with fewer resources that it is rare (maybe impossible?) that it simply cannot be run at all.
But as you say, if you are happy with everything as it is, and your SQL doesn't ever change, and your data doesn't ever change, you don't need to change what you're doing. As soon as anything changes that may no longer be true. I remind you again that this behaviour is not linear - your data growing or changing by 1% may be enough to flip a good query plan into a bad query plan that runs 10x, 100x, or maybe even 1000x more slowly. I think very few sites can guarantee that nothing ever changes.
For everyone else: update index statistics is the way to bet.
I think you are trying to find a way to justify not running it? If you don't want to run it, don't run it, and hope that no-one cares if performance suddenly gets worse one day.
The reason why everyone writing about this says update index statistics is better is because it is better. 🙂
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Joe, that's where I wanted to get to, if "update index statistics" is better than "update statistics" why doesn't the latter disappear? This is what I mentioned at the beginning of this chapter.
In this URL there are many comments from people who work at SAP - Sybase, also throughout www, but no one gets involved, curious fact. Let's see if at some point someone gets a document signed by SAP - Sybase to talk about one vs the other, I'm afraid this is not possible.
I'm not looking for a way or excuse not to run "update index statistics", I'm looking for a reason to stop using "update statistics".
Regards
No, that's not what I'm saying.
There is only one time when "update index statistics" will generate identical statistics to "update statistics": when all indexes index only one column at a time.
In all other cases "update index statistics" will generate more information than "update statistics".
ASE will always consider all available statistics when making optimiser decisions about table scan vs. index scan, join type, join order, parallel vs serial, LRU vs MRU replacement startegy, how to handle OR clauses, how to handle UNIONs, how to handle aggregates, and many other decisions besides.
If statistics are not available then ASE uses hard-coded magic numbers as mentioned previously, where it assumes SARGs (search arguments) match 10%, 25%, or 33% of the total rows in a table.
So put all of that together: the only time when it does not matter if we run "update index statistics" vs. "update statistics" is if the number of rows matching any SARG are in fact exactly 10%, 25%, or 33% of the total rows in the table.
This has nothing to do with "if the queries on a table vary their execution plan depending on the arguments passed to them". I never mentioned that anywhere. I don't know where you have pulled that wrong but it's not true and it's not what I said.
To be very clear: you could have queries whose execution plans do not vary depending on the arguments passed to them, but do vary depending on whether you have run "update statistics" or "update index statistics".
The quality and amount of statistics is one of the single most important components to good ASE performance. Don't pretend otherwise. So long as it can fit in operational windows it is always best practice to run "update index statistics" vs. "update statistics". This is in fact a question I ask when I am interviewing people for technical Sybase ASE roles, and anyone who thinks there is no difference is not someone I would ever hire.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Joe, maybe I haven't explained myself well. Let's see if it is possible now..
I focus on indexes that have several fields, those that have a single field do not apply, as you mention.
We agree that using "update index statistics" generates more information than using "update statistics", but this does not mean that we have better performance in an application.
Always generalizing, if we assume that one executes "update statistics" normally, because your application has good performance, and a query appears on a specific table, which does not have the required plan, you have the following alternatives: "update index statitics", modify histogram, modify densities, of course everything requires study and testing.
I know that you have not said that "the queries on a table vary their execution plan depending on the arguments passed to them", I have said it, and this dependent on cardinality and depending the SARG arguments: =, >, <, >=, <=, BETWEEN y some time LIKE.
"update index statistics" is better than "update statistics", if I have a good performance in my application I don't see the need to use the first one. That it is better in some specific case, correct.
Regards
If I run "update index statistics" it will not change the execution plan, what do I gain if I run it?
If it doesn't change the query plan then nothing is gained.
I'm saying that it often (usually) does change the query plan, because performance effects are not linear.
Adding 1% more rows to a table might not mean that performance is 1% worse but that performance is 300% worse.
Adding 1% more unique values to a column might not mean that performance is 1% worse but that performance is 100x worse.
I've seen both of those in real production systems. These aren't made-up examples.
I can run "update index statistics" on a table that depending on the data / queries,
the execution plan varies, but if it does not vary, what does its execution bring us?
If nothing changes in the execution, then nothing is gained by the better statistics.
If you wear a seat belt and you don't have a crash, does that mean that nothing was gained by wearing a seal belt? Does that mean you don't need to wear a seat belt any more?
Here is the thing I think you are missing. You seem to me to be asking for certainty, but this is not an area where things are certain. This is an area where the computer must constantly make guesses, and the cost of getting a guess wrong is not small. A bad query plan is not usually a little bit worse. It is usually ten times, maybe one hundred times worse.
For most people the incremental costs of better statistics is small over the cost of basic statistics. This is again not a matter of certainty. It is a gamble. Will I pay 20% more time for my update stats to reduce the chance of a query running 100x or even just 3x slower? Yes, most times I think that is a good gamble, especially as there are many tricks I can use to reduce the costs of update statistics.
If you're saying your application only ever runs exactly the same SQL against exactly the same data, nothing ever changes, and you don't need to run update index statistics - congratulations! Don't run update index statistics.
But your doubt seems to me be to question whether better stats always means better performance. I have already confirmed, more than once, that no, it is not 100% certain to lead to better performance. I instead ask you: how much will you pay as insurance to have the best possible chance of the best possible performance? In production? In systems live in front of customers, or used to meet regulatory requirements?
I think I am willing to pay quite a bit for insurance, because when you need it is when you really want it... and is too late to go buy it. If I am having a performance problem with bad query plans due to bad or missing stats, I probably don't have time to run my update stats now... and I will be kicking myself that I didn't run them last weekend.
I'm no longer sure what you're looking for... it's like someone told you once "update index stats is always better than update stats", and you think "that cannot be 100% true and I want someone to admit it". I have admitted it. So what? Go run update index stats anyway. 🙂
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Joe, with all the information we have exchanged, it is clear to me. "update index statistics", generalizing, is one more option, but it is not better than "update statistics", as long as the execution plans do not change by executing the latter. If you can run "update index statistics" then all the better, but it is not necessary for what I am saying.
Perhaps it has led me to confusion that in many places, documents on the internet of SAP - Sybase and not SAP - Sybase, I have read that you have to run "update index statistics" and I have never understood why. Now I see that they should put the label "if the queries on a table vary their execution plan depending on the arguments passed to them" and taking into mind that there is also the possibility to modify the number of steps of the histogram or modify the statistics, always research the best option.
If you don't mind, I'll keep your contact in case we can talk about any other subject.
Best regards and thanks.
Also for bonus points, note that we could have achieved almost the same stats with just one table scan instead of three index scans + four table scans by running
update statistics tbl (A, B, C, D)
I say "almost" because we would be missing some table-level information. We could add that by adding
update table statistics tbl
This requires a second table scan, but two table scans is better than four, and better than four plus three index scans.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Ah but the different forms of "update statistics" do indeed effectively call each other.
As an example let's have a table with five columns (A, B, C, D, E), and three indexes:
idx1 (A, B)
idx2 (A, C)
idx3 (B, C, D)
If we run the basic "update statistics" on the table with no arguments this is the same as running
update statistics tbl (A)
update statistics tbl (A)
update statistics tbl (B)
In other words, we run column-level statistics for every column that is the leading column of an index. Note that we run it twice for column A!
If we instead run "update index statistics" on the table with no arguments this is the same as running
update statistics tbl (A)
update statistics tbl (B)
update statistics tbl (A)
update statistics tbl (C)
update statistics tbl (B)
update statistics tbl (C)
update statistics tbl (D)
In other words we run column-level update statistics on every indexed column. Note that we repeat this for columns A, B, and C.
It's actually slightly more complicated than this, but this is enough to make the point that every other version of update statistics at its heart makes one or more calls to the "simpler" form of update statistics.
Your main question seems to be doubt that fewer stats always means worse performance, and better/more stats always means better performance. And of course no there are no guarantees. To better understand this remember what's going on under the hood: the ASE optimiser is trying to guess which of every possible query plan will be the fastest, which almost always means which needs the fewest I/Os.
So how does ASE know how many I/Os will be used for each decision it makes in a query plan? (Which index or table scan; what join order; what join type; and many other decisions.) Every estimate by the optimiser depends on statistics.
If we simply don't have statistics on a particular column - perhaps because we have only ever run regular update statistics and not update index statistics, and this is an indexed column but not the leading column - then ASE uses some hardcoded numbers instead of stats:
- assume any "=" WHERE clause matches 10% of the rows in the table
- assume any ">" or "<" WHERE clause matches 25% of the rows in the table
- assume any ">=" or "<=" WHERE clause matches 33% of the rows in the table
These hardcoded magic numbers might be radically different to the true numbers, but without any stats that's all the ASE can use. Using those numbers might not result in a worse query plan with worse performance: if for example there are no indexes at all and only one table in the query then it will make no difference because there is only one possible join order, no join type, and it will table scan. But for most queries it matters a lot.
Fortunately you don't have to take my word for it. Find a reasonably complex query and test this, first by running "delete statistics" followed by "update statistics", then by running "delete statistics" followed by "update index statistics". It's easy to construct scenarios where this doesn't matter... but it's also easy to show many and maybe even most queries will do better.
You can cross-check this by enabling "show missing stats", where the optimiser will tell you plainly that it might have been able to do better if it had had more stats than you gave it.
You seem like you want a generalised answer, and this is it: most of the time "update index stats" gives more and better information to the ASE optimiser, which increases the chances of it finding a better query plan. On very rare occasions we might hit an optimiser bug where the extra information lead to a worse query plan. I have seen this a handful of times in 20+ years of working with ASE, but I have seen "update index stats" improving application performance probably over 3000 times. I think those are pretty good odds.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Joe, reading you I think you understand my doubts.
allows me to give a twist to the topic. For example, a pure OLTP, a CRM, we have a correct performance with "update statistics". If I run "update index statistics" it will not change the execution plan, what do I gain if I run it?
I can run "update index statistics" on a table that depending on the data / queries, the execution plan varies, but if it does not vary, what does its execution bring us?
I work with OLTPs and DSSs and many times I have had problems with some queries, because their execution plan was not the correct one, not being able to modify the queries I have achieved what I wanted by modifying the statistics and/or the number of steps of the histogram, but never executing the statistics on all the fields of the index, it has not been necessary for me to get to that.
Many times we have to deal with the application provider's indexes, sometimes they do not let us touch them, but it is true that having the correct indexes, or close to being correct, is a great advantage.
Regards and thank you very much for your time
The question as asked was
That is, using "update index statistics" seems more efficient than using "update statistics", but if this is the case, why doesn't the "update statistics" go away?.
I can't speak for why SAP Product Management have not deprecated the plain form of "update statistics", but here are my guesses. They are only guesses.
1) It came first. Commands and features are rarely taken away just because newer forms are introduced. This is like saying why do "reorg reclaim_space", "reorg forwarded_rows", and "reorg compact" still exist when "reorg rebuild" is available? For that matter why still have "reorg rebuild" now that "reorg defrag" is available? Because it's rare to take old versions away.
2) Maybe some people don't want to run the newer form. Maybe someone doesn't need "update index stats" and don't want the extra overhead. Maybe someone is being clever and running a single "update stats [table] (columnA, columnB, columnC, columnD)" rather than "update index stats" which as you say under the hood runs multiple column scans.
3) Maybe an application vendor strictly instructs their users to run only "update statistics". They might be wrong, but users might not wish to lose vendor support.
I think instead we should say it's great that we have so many different forms of these commands to run, so we can have very fine control over what it is we do at our sites, rather than SAP telling us it must be one way only.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Good morning Joe, it is correct that some reorg commands do the work of other reorg commands including their own, but each reorg command is intended for certain scenarios. This is not the case with statistics, or at least that's how I interpret it.
But my doubt goes a little further, I have no problem in executing "update statistics" or "update index statistics", but apart from being more complete the second one I would like to know if even being more complete, the performance of an application, generalizing, is better or is the same, that's where my dilemma lies.
Imagine that a SAP - Sybase performance engineer, the one who programs the code, says in this thread: indeed, running "update index statistics" is more complete and bla bla bla, but it doesn't make the application perform better overall. Or imagine I have a system where I run "update statistics" and I have good performance, but I run "update index statístics" and the thing changes. This is where I want to get to.
Possibly the thing is as simple as you say, but I need more arguments to understand it.
Regards and thanks a lot for your interest and participation.
Hi Javi,
The basic idea behind wanting better ASE statistics is to give the ASE optimizer the best possible chance to generate the fastest possible query plan.
The choice of which access method (table scan vs index) and which join order depend on knowing how many I/Os will ASE need to do if it picks those rather than any of the other options. The stats are what helps ASE estimate how many rows and pages will qualify for every predicate in a query.
If the stats are stale then ASE doesn't have access to the best information. If the stats are missing then ASE can only guess, and the hardcoded magic numbers it uses for guesses can sometimes produce terrible query plans.
So it's actually nothing to do with whether the indexes on an object are correct or not. It's about ASE having the information it needs about how many rows and pages will match each of the decisions it must choose between.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
As I understand it:
For non-leading index columns which need data distribution stats, if the ordinality is low (number of unique values is low), hashing can use a lot less space than sorting (which might speed things up). You can also using the sampling option to reduce the amount of data read, and therefore reduce the amount of sort/hashing required.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Ben, thanks for you comments.
Let's see if someone can give us a hand and help us. In many sites I read, use "update index statistics" but it doesn't say anything about "update statistics", neither I read in any site that the applications have better performance using "update index statistics". According to what I say in my statement, if it is correct, I could understand that the optimizer has gained something by using "update index statistics" but that does not mean that the application has better performance. We always start from the premise that the objects have correct indexes.
Regards
Javi.-
User | Count |
---|---|
67 | |
8 | |
8 | |
6 | |
6 | |
6 | |
6 | |
6 | |
5 | |
5 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.