on 2011 Jul 22 6:41 AM
Let's say I have a giant SELECT buried inside an application, and I just want to do a quick dbisql timing test without all the application complexity. Here's the pseudo code for an example
SELECT blah blah blah FROM blah blah blah WHERE blah blah blah AND SO ON;
All else being equal (same predicates, same host variable values, same everything else except the SELECT list), can I rely on the following being just as slow as the original?
Let's further assume there is no funkiness in the SELECT list, no user-defined function calls that do RECURSIVE CROSS JOIN UPDATEs between DBA.Mars and DBA.TheMoon, none of that stuff, just ordinary columns.
SELECT COUNT(*) FROM blah blah blah WHERE blah blah blah AND SO ON;
Request clarification before answering.
There are a number of reasons the COUNT(*)
variant can be faster than the original (I wouldn't expect it to be slower). The query does not need all of the columns that the original query did, so the optimizer can eliminate tables (Elimination of unnecessary inner and outer joins). Further, table columns are not needed in the select list and this may allow a table to be accessed using index-only retrieval where this might not have been possible in the original query. Even if neither of these changes occur, any materializing operators may have a reduced data size because they don't need to carry up columns that are used only in the select list.
I know that you suggested there are no UDFs in the SELECT list (those would not need to be evaluated in the COUNT(*)
variant), but even builtin functions can take time to evaluate.
There are a couple of techniques that I use (with their own limitations). You can use a row-limitation clause such as "TOP 1 START AT 1000000000"
. This won't return any rows to the client, but it does have the same intermediate data characteristics (materializing operators will copy base table columns needed in the original). However, this approach does not evaluate the SELECT list expressions, and it also may alter the plan selection. Another approach is to rely on the GRAPHICAL_PLAN()
builtin function with a statistics-level of 3 (or use dbisql's plan viewer and "Detailed statistics". This adds a small amount of per-row cost at the root of the query, but it does not change the plan. All SELECT list expressions are evaluated and the plan is unmodified from the original. Care is needed, though, to match the characteristics of the original query (cursor type, etc.) and this approach does not work with host variables in the statement (they can be replaced by connection-level variables if necessary).
Another approach I have recently become fond of is to use the sa_copy_cursor_to_temp_table()
system procedure, setting first-row to 9223372036854775807 so that no rows are actually copied. The advantage of this approach is that it can be used with any statement where you can open a cursor, and it builds the cursor "in-situ".
Comparing the techniques, the TOP-n/START AT approach is closest to the COUNT(*)
one. You don't need to turn the statement into a string (escaping ') and it can work with host variables. It is probably closer in performance to the original than the COUNT(*)
approach. The GRAPHICAL_PLAN() is probably more accurate, but it may require more setup (convert to string, convert host variables to connection variables). The sa_copy_cursor_to_temp_table()
approach works well for timing in-situ but may not be the simple approach you want to reach for when first analysing a query for performance.
All of these approaches intentionally eliminate the network time needed to send the result back. In the cases where that is pertinent, the fetchtst tool is a great way to test the performance of a query; it allows you to control the cursor type, it supports host variables, and it can simulate different ways of fetching the result set. It can also be combined with the above described techniques.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
You should write a book. I know, you have real work to do, but still, you should do it.
Thank you @Breck and @Volker; I appreciate your kind comments (and Breck's earlier suggestion that I start a blog). I do have a blog (actually two) that are both for limited consumption. I haven't found that I update either with any appropriate frequency. My non-work blog was last updated six years ago, my "work internal" blog over 2 years ago. If I could achieve a better frequency, I'd consider having a public work-related blog.
User | Count |
---|---|
52 | |
10 | |
9 | |
8 | |
5 | |
5 | |
5 | |
5 | |
5 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.