cancel
Showing results for 
Search instead for 
Did you mean: 

How do I determine if two different ORDER BY clauses yield rows in the same order?

Breck_Carter
Participant
3,972

For performance purposes (index usage) I am considering changing several queries from

SELECT ... ORDER BY table_name.column_name_1;

to

SELECT ... ORDER BY table_name.column_name_2;

The columnn_name_1 and 2 values in each result set are unique so the ORDER BY clauses are deterministic.

My first question is, "Are the two result sets returned in the same order?"

If the answer is "no" then the next question is, "Which rows are returned in different orders?"

The answer to the second question may help me figure out if the data can be "fixed" so the answer to question 1 becomes "yes", and I can make the change without affecting application correctness.

Bottom line: How do I write queries to answer questions 1 and 2?

Accepted Solutions (0)

Answers (3)

Answers (3)

Breck_Carter
Participant

Here is a variation on the WINDOW ... ROWS BETWEEN solution where the rows are "partitioned" by a third column (column0 in the code below).

In this variation, rows need to be ordered within each partition, not globally. There are two partitions: column0 = 1 and 2.

CREATE TABLE tab (
   column0  INTEGER NOT NULL,
   column1  INTEGER NOT NULL PRIMARY KEY,
   column2  DATE NOT NULL );

INSERT tab VALUES ( 1, 1, DATEADD ( DAY, 1, CURRENT DATE ) );
INSERT tab VALUES ( 2, 2, DATEADD ( DAY, 2, CURRENT DATE ) );
INSERT tab VALUES ( 1, 3, DATEADD ( DAY, 3, CURRENT DATE ) );
INSERT tab VALUES ( 2, 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order
INSERT tab VALUES ( 1, 5, DATEADD ( DAY, 5, CURRENT DATE ) );
INSERT tab VALUES ( 2, 6, DATEADD ( DAY, 6, CURRENT DATE ) );
INSERT tab VALUES ( 1, 7, DATEADD ( DAY, 7, CURRENT DATE ) );
INSERT tab VALUES ( 2, 8, DATEADD ( DAY, 6, CURRENT DATE ) ); -- ok ( really, it's ok now :)
INSERT tab VALUES ( 1, 9, DATEADD ( DAY, 9, CURRENT DATE ) );
COMMIT;

-- Solution 2 partitions by column0...

WITH row_pair_view AS
   ( SELECT FIRST_VALUE ( column1  ) OVER row_pair_window  AS this_column1,
            LAST_VALUE  ( column1  ) OVER row_pair_window  AS next_column1,
            FIRST_VALUE ( column2 )  OVER row_pair_window  AS this_column2,
            LAST_VALUE  ( column2 )  OVER row_pair_window  AS next_column2,
            DATEDIFF ( DAY, 
                       this_column2, 
                       next_column2 ) AS day_difference
       FROM tab
     WINDOW row_pair_window 
         AS ( PARTITION BY column0
              ORDER BY column1 
              ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) )
SELECT TOP 100 *
  FROM row_pair_view
 WHERE this_column2 > next_column2
 ORDER BY day_difference ASC;

this_column1,next_column1,this_column2,next_column2,day_difference
2,4,'2010-09-28','2010-09-27',-1

The previous solution doesn't look at column0 so it gives a false "hit":

WITH row_pair_view AS
   ( SELECT FIRST_VALUE ( column1  ) OVER row_pair_window  AS this_column1,
            LAST_VALUE  ( column1  ) OVER row_pair_window  AS next_column1,
            FIRST_VALUE ( column2 )  OVER row_pair_window  AS this_column2,
            LAST_VALUE  ( column2 )  OVER row_pair_window  AS next_column2,
            DATEDIFF ( DAY, 
                       this_column2, 
                       next_column2 ) AS day_difference
       FROM tab
     WINDOW row_pair_window 
         AS ( ORDER BY column1 
              ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) )
SELECT TOP 100 *
  FROM row_pair_view
 WHERE this_column2 > next_column2
 ORDER BY day_difference ASC;

this_column1,next_column1,this_column2,next_column2,day_difference
3,4,'2010-09-29','2010-09-27',-2
7,8,'2010-10-03','2010-10-02',-1
Breck_Carter
Participant

Here's a weird query that answers the question, "Show me the top 100 consecutive row pairs of table tab, ordered by column1, where the values of column2 in the row pair are out of order, and order the result set by the out-of-orderedness (the magnitude of the error)."

At least, it seems to work...

CREATE TABLE tab (
   column1  INTEGER NOT NULL PRIMARY KEY,
   column2  DATE NOT NULL );

INSERT tab VALUES ( 1, DATEADD ( DAY, 1, CURRENT DATE ) );
INSERT tab VALUES ( 2, DATEADD ( DAY, 2, CURRENT DATE ) );
INSERT tab VALUES ( 3, DATEADD ( DAY, 3, CURRENT DATE ) );
INSERT tab VALUES ( 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order
INSERT tab VALUES ( 5, DATEADD ( DAY, 5, CURRENT DATE ) );
INSERT tab VALUES ( 6, DATEADD ( DAY, 6, CURRENT DATE ) );
INSERT tab VALUES ( 7, DATEADD ( DAY, 7, CURRENT DATE ) );
INSERT tab VALUES ( 8, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order
INSERT tab VALUES ( 9, DATEADD ( DAY, 9, CURRENT DATE ) );
COMMIT;

WITH row_pair_view AS
   ( SELECT FIRST_VALUE ( column1  ) OVER row_pair_window  AS this_column1,
            LAST_VALUE  ( column1  ) OVER row_pair_window  AS next_column1,
            FIRST_VALUE ( column2 )  OVER row_pair_window  AS this_column2,
            LAST_VALUE  ( column2 )  OVER row_pair_window  AS next_column2,
            DATEDIFF ( DAY, 
                       this_column2, 
                       next_column2 ) AS day_difference
       FROM tab
     WINDOW row_pair_window 
         AS ( ORDER BY column1 
              ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) )
SELECT TOP 100 *
  FROM row_pair_view
 WHERE this_column2 > next_column2
 ORDER BY day_difference ASC;

this_column1,next_column1,this_column2,next_column2,day_difference
7,8,'2010-10-03','2010-09-27',-6
3,4,'2010-09-29','2010-09-27',-2

Alas, in the real world table, there are LOTS of row pairs that are out-of-order on column 2.

Sigh.

The good news is, that WINDOW query ran in 39 seconds on a 1.2 million row table in an already-heavily-loaded SQL Anywhere 11 database... sometimes, it just takes your breath away, that a query so WEIRD can run so quickly.

Breck_Carter
Participant
0 Kudos

Ha, ha... in the real world, a (probable) hardware error caused out-of-order CURRENT TIMESTAMP values to be recorded. In other words, column2 is occasionally out of order but it should not be... it's data cleanup problem... otherwise the ORDER BY results are the same! Woohoo!

VolkerBarth
Contributor
0 Kudos

While I agree with Graeme that I somewhat don't get the point:

If you want to find out if both sort orders are identical then how about using the row_number function and then use an EXCEPT query to find out if there are any rows with different row_number values. That would mean they are sorted differently.

SELECT col1, col2, ..., coln, row_number() over (order by col1) as sortorder1
FROM table1
EXCEPT
SELECT col1, col2, ..., coln, row_number() over (order by col2) as sortorder2
FROM table1
ORDER BY WhatYouLike

If that query returns an empty resultset, the sort order is identical (well, not necessarily in general, but for this particular result set).

It might also be possible to use two row_number functions in one single select and then do the comparison in the WHERE clause, such as

[RE-EDITED BY VOLKER as a derived query]

SELECT * FROM
  (SELECT col1, col2, ..., coln,
    row_number() over (order by col1) as sortorder1,
    row_number() over (order by col2) as sortorder2
    FROM table1) S
WHERE sortorder1 <> sortorder2

Please note: I haven't tested any of these queries and can't say if they are efficient in any way, but it might lead you on the right track:)


From Breck: A good start (answers the first question), but alas, either too much information or none at all...

CREATE TABLE tab (
   column1  INTEGER NOT NULL PRIMARY KEY,
   column2  DATE NOT NULL );

INSERT tab VALUES ( 1, DATEADD ( DAY, 1, CURRENT DATE ) );
INSERT tab VALUES ( 2, DATEADD ( DAY, 2, CURRENT DATE ) );
INSERT tab VALUES ( 3, DATEADD ( DAY, 3, CURRENT DATE ) );
INSERT tab VALUES ( 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order
INSERT tab VALUES ( 5, DATEADD ( DAY, 5, CURRENT DATE ) );
INSERT tab VALUES ( 6, DATEADD ( DAY, 6, CURRENT DATE ) );
INSERT tab VALUES ( 7, DATEADD ( DAY, 7, CURRENT DATE ) );
INSERT tab VALUES ( 8, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order
INSERT tab VALUES ( 9, DATEADD ( DAY, 9, CURRENT DATE ) );
COMMIT;

SELECT column1, column2, row_number() over (order by column1) as sortorder1
FROM tab
EXCEPT
SELECT column1, column2, row_number() over (order by column2) as sortorder2
FROM tab
ORDER BY column1;

column1,column2,sortorder1
2,'2010-09-28',2
3,'2010-09-29',3
4,'2010-09-27',4
5,'2010-10-01',5
6,'2010-10-02',6
7,'2010-10-03',7
8,'2010-09-27',8

SELECT column1, column2,
row_number() over (order by column1) as sortorder1,
row_number() over (order by column2) as sortorder2
FROM tab
WHERE sortorder1  sortorder2

Could not execute statement.
Window function used in predicate
SQLCODE=-964, ODBC 3 State="HY000"
Line 1, column 1

-- 2nd sample as corrected by Volker
SELECT * FROM
  (SELECT column1, column2,
    row_number() over (order by column1) as sortorder1,
    row_number() over (order by column2) as sortorder2
    FROM tab) S
WHERE sortorder1 <> sortorder2
ORDER BY column1

Leads to the same (and therefore verbose) output than the SELECT ... EXCEPT query.

Breck_Carter
Participant
0 Kudos

See my comment to Graeme about "the point"... it's an unnecessary restriction, IMO, only allowing one CLUSTERED index per table 🙂

VolkerBarth
Contributor
0 Kudos

@Breck: However, according to a recent article on Glenn's blog, the query optimizer is able to check any index's "CLUSTERED" characteristics (defined or not) and make use of "nearly clustered" indizes, too. Cf. http://iablog.sybase.com/paulley/2010/09/analyzing-clustered-indexes/

VolkerBarth
Contributor
0 Kudos

@Breck: Have no SA engine ready available at the moment, but I guess my 2nd example should run when turned inside a derived query, cf. http://sqlanywhere-forum.sap.com/questions/1015