on 2010 Sep 25 11:29 AM
For performance purposes (index usage) I am considering changing several queries from
SELECT ... ORDER BY table_name.column_name_1;
to
SELECT ... ORDER BY table_name.column_name_2;
The columnn_name_1 and 2 values in each result set are unique so the ORDER BY clauses are deterministic.
My first question is, "Are the two result sets returned in the same order?"
If the answer is "no" then the next question is, "Which rows are returned in different orders?"
The answer to the second question may help me figure out if the data can be "fixed" so the answer to question 1 becomes "yes", and I can make the change without affecting application correctness.
Bottom line: How do I write queries to answer questions 1 and 2?
Request clarification before answering.
Here is a variation on the WINDOW ... ROWS BETWEEN solution where the rows are "partitioned" by a third column (column0 in the code below).
In this variation, rows need to be ordered within each partition, not globally. There are two partitions: column0 = 1 and 2.
CREATE TABLE tab ( column0 INTEGER NOT NULL, column1 INTEGER NOT NULL PRIMARY KEY, column2 DATE NOT NULL ); INSERT tab VALUES ( 1, 1, DATEADD ( DAY, 1, CURRENT DATE ) ); INSERT tab VALUES ( 2, 2, DATEADD ( DAY, 2, CURRENT DATE ) ); INSERT tab VALUES ( 1, 3, DATEADD ( DAY, 3, CURRENT DATE ) ); INSERT tab VALUES ( 2, 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 1, 5, DATEADD ( DAY, 5, CURRENT DATE ) ); INSERT tab VALUES ( 2, 6, DATEADD ( DAY, 6, CURRENT DATE ) ); INSERT tab VALUES ( 1, 7, DATEADD ( DAY, 7, CURRENT DATE ) ); INSERT tab VALUES ( 2, 8, DATEADD ( DAY, 6, CURRENT DATE ) ); -- ok ( really, it's ok now :) INSERT tab VALUES ( 1, 9, DATEADD ( DAY, 9, CURRENT DATE ) ); COMMIT; -- Solution 2 partitions by column0... WITH row_pair_view AS ( SELECT FIRST_VALUE ( column1 ) OVER row_pair_window AS this_column1, LAST_VALUE ( column1 ) OVER row_pair_window AS next_column1, FIRST_VALUE ( column2 ) OVER row_pair_window AS this_column2, LAST_VALUE ( column2 ) OVER row_pair_window AS next_column2, DATEDIFF ( DAY, this_column2, next_column2 ) AS day_difference FROM tab WINDOW row_pair_window AS ( PARTITION BY column0 ORDER BY column1 ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) ) SELECT TOP 100 * FROM row_pair_view WHERE this_column2 > next_column2 ORDER BY day_difference ASC; this_column1,next_column1,this_column2,next_column2,day_difference 2,4,'2010-09-28','2010-09-27',-1
The previous solution doesn't look at column0 so it gives a false "hit":
WITH row_pair_view AS ( SELECT FIRST_VALUE ( column1 ) OVER row_pair_window AS this_column1, LAST_VALUE ( column1 ) OVER row_pair_window AS next_column1, FIRST_VALUE ( column2 ) OVER row_pair_window AS this_column2, LAST_VALUE ( column2 ) OVER row_pair_window AS next_column2, DATEDIFF ( DAY, this_column2, next_column2 ) AS day_difference FROM tab WINDOW row_pair_window AS ( ORDER BY column1 ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) ) SELECT TOP 100 * FROM row_pair_view WHERE this_column2 > next_column2 ORDER BY day_difference ASC; this_column1,next_column1,this_column2,next_column2,day_difference 3,4,'2010-09-29','2010-09-27',-2 7,8,'2010-10-03','2010-10-02',-1
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Here's a weird query that answers the question, "Show me the top 100 consecutive row pairs of table tab, ordered by column1, where the values of column2 in the row pair are out of order, and order the result set by the out-of-orderedness (the magnitude of the error)."
At least, it seems to work...
CREATE TABLE tab ( column1 INTEGER NOT NULL PRIMARY KEY, column2 DATE NOT NULL ); INSERT tab VALUES ( 1, DATEADD ( DAY, 1, CURRENT DATE ) ); INSERT tab VALUES ( 2, DATEADD ( DAY, 2, CURRENT DATE ) ); INSERT tab VALUES ( 3, DATEADD ( DAY, 3, CURRENT DATE ) ); INSERT tab VALUES ( 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 5, DATEADD ( DAY, 5, CURRENT DATE ) ); INSERT tab VALUES ( 6, DATEADD ( DAY, 6, CURRENT DATE ) ); INSERT tab VALUES ( 7, DATEADD ( DAY, 7, CURRENT DATE ) ); INSERT tab VALUES ( 8, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 9, DATEADD ( DAY, 9, CURRENT DATE ) ); COMMIT; WITH row_pair_view AS ( SELECT FIRST_VALUE ( column1 ) OVER row_pair_window AS this_column1, LAST_VALUE ( column1 ) OVER row_pair_window AS next_column1, FIRST_VALUE ( column2 ) OVER row_pair_window AS this_column2, LAST_VALUE ( column2 ) OVER row_pair_window AS next_column2, DATEDIFF ( DAY, this_column2, next_column2 ) AS day_difference FROM tab WINDOW row_pair_window AS ( ORDER BY column1 ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) ) SELECT TOP 100 * FROM row_pair_view WHERE this_column2 > next_column2 ORDER BY day_difference ASC; this_column1,next_column1,this_column2,next_column2,day_difference 7,8,'2010-10-03','2010-09-27',-6 3,4,'2010-09-29','2010-09-27',-2
Alas, in the real world table, there are LOTS of row pairs that are out-of-order on column 2.
Sigh.
The good news is, that WINDOW query ran in 39 seconds on a 1.2 million row table in an already-heavily-loaded SQL Anywhere 11 database... sometimes, it just takes your breath away, that a query so WEIRD can run so quickly.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
While I agree with Graeme that I somewhat don't get the point:
If you want to find out if both sort orders are identical then how about using the row_number function and then use an EXCEPT query to find out if there are any rows with different row_number values. That would mean they are sorted differently.
SELECT col1, col2, ..., coln, row_number() over (order by col1) as sortorder1
FROM table1
EXCEPT
SELECT col1, col2, ..., coln, row_number() over (order by col2) as sortorder2
FROM table1
ORDER BY WhatYouLike
If that query returns an empty resultset, the sort order is identical (well, not necessarily in general, but for this particular result set).
It might also be possible to use two row_number functions in one single select and then do the comparison in the WHERE clause, such as
[RE-EDITED BY VOLKER as a derived query]
SELECT * FROM
(SELECT col1, col2, ..., coln,
row_number() over (order by col1) as sortorder1,
row_number() over (order by col2) as sortorder2
FROM table1) S
WHERE sortorder1 <> sortorder2
Please note: I haven't tested any of these queries and can't say if they are efficient in any way, but it might lead you on the right track:)
CREATE TABLE tab ( column1 INTEGER NOT NULL PRIMARY KEY, column2 DATE NOT NULL ); INSERT tab VALUES ( 1, DATEADD ( DAY, 1, CURRENT DATE ) ); INSERT tab VALUES ( 2, DATEADD ( DAY, 2, CURRENT DATE ) ); INSERT tab VALUES ( 3, DATEADD ( DAY, 3, CURRENT DATE ) ); INSERT tab VALUES ( 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 5, DATEADD ( DAY, 5, CURRENT DATE ) ); INSERT tab VALUES ( 6, DATEADD ( DAY, 6, CURRENT DATE ) ); INSERT tab VALUES ( 7, DATEADD ( DAY, 7, CURRENT DATE ) ); INSERT tab VALUES ( 8, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 9, DATEADD ( DAY, 9, CURRENT DATE ) ); COMMIT; SELECT column1, column2, row_number() over (order by column1) as sortorder1 FROM tab EXCEPT SELECT column1, column2, row_number() over (order by column2) as sortorder2 FROM tab ORDER BY column1; column1,column2,sortorder1 2,'2010-09-28',2 3,'2010-09-29',3 4,'2010-09-27',4 5,'2010-10-01',5 6,'2010-10-02',6 7,'2010-10-03',7 8,'2010-09-27',8 SELECT column1, column2, row_number() over (order by column1) as sortorder1, row_number() over (order by column2) as sortorder2 FROM tab WHERE sortorder1 sortorder2 Could not execute statement. Window function used in predicate SQLCODE=-964, ODBC 3 State="HY000" Line 1, column 1
-- 2nd sample as corrected by Volker
SELECT * FROM
(SELECT column1, column2,
row_number() over (order by column1) as sortorder1,
row_number() over (order by column2) as sortorder2
FROM tab) S
WHERE sortorder1 <> sortorder2
ORDER BY column1
Leads to the same (and therefore verbose) output than the SELECT ... EXCEPT query.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
@Breck: However, according to a recent article on Glenn's blog, the query optimizer is able to check any index's "CLUSTERED" characteristics (defined or not) and make use of "nearly clustered" indizes, too. Cf. http://iablog.sybase.com/paulley/2010/09/analyzing-clustered-indexes/
@Breck: Have no SA engine ready available at the moment, but I guess my 2nd example should run when turned inside a derived query, cf. http://sqlanywhere-forum.sap.com/questions/1015
User | Count |
---|---|
76 | |
30 | |
10 | |
8 | |
8 | |
7 | |
7 | |
5 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.