on 2016 Jan 27 9:49 AM
The 16.0.0.2127 query in question takes over 2 seconds to return the expected result of zero rows, but with an additional redundant BETWEEN predicate the response time is zero seconds to return zero rows.
Adding the BETWEEN predicate was equivalent to waving a dead chicken over the keyboard. There was no apparent reason to expect it to help, but since the BETWEEN predicate was an assertion of a true condition there was no reason to expect it to hurt.
There was no CLUSTERED index that might indicate BETWEEN would help.
The fast plan shows that sequential table scans were used for both tables, unlike the slow plan that shows indexes were used for everything.
In fact, the fast plan shows that no indexes were even considered, whereas the slow plan shows that SQL Anywhere worked very hard to pick the right indexes.
So... there's some magic afoot; here are the plans:
In the absence of an actual explanation, I'm going to go with Dead Chicken: When nothing else seems to help, try adding a correct but redundant BETWEEN predicate! 🙂
------------------------------------------------------------------------- CREATE OR REPLACE VARIABLE @menu_down_count UNSIGNED BIGINT = 20; CREATE OR REPLACE VARIABLE @connection_id_string VARCHAR ( 50 ) = '1-33137-20160118152015-061'; CREATE OR REPLACE VARIABLE @locator_number UNSIGNED BIGINT = 227243; CREATE OR REPLACE VARIABLE @oldest_connection_history_sample_set_number UNSIGNED BIGINT = 227243; CALL sa_flush_cache(); ------------------------------------------------------------------------- -- 1. Fast query (0 sec) contains a redundant BETWEEN CLAUSE. SELECT TOP @menu_down_count rroad_group_2_property_pivot.sample_set_number AS locator_number, rroad_group_1_property_pivot.sample_recorded_at AS locator_timestamp, CAST ( 1 AS INTEGER ) AS artificial_row_type FROM rroad_group_2_property_pivot INNER JOIN rroad_group_1_property_pivot ON rroad_group_1_property_pivot.sample_set_number = rroad_group_2_property_pivot.sample_set_number WHERE rroad_group_2_property_pivot.connection_id_string = @connection_id_string AND rroad_group_2_property_pivot.sample_set_number < @locator_number AND rroad_group_2_property_pivot.sample_set_number BETWEEN @oldest_connection_history_sample_set_number -- Performance: Optimization: This redundant predicate seems to help the "no rows" result case. AND @locator_number ORDER BY rroad_group_1_property_pivot.sample_set_number DESC; See plan: plan1_fast_with_redundant_BETWEEN.saplan ------------------------------------------------------------------------- -- 2. Slow query (2.7 sec) contains a redundant BETWEEN CLAUSE. SELECT TOP @menu_down_count rroad_group_2_property_pivot.sample_set_number AS locator_number, rroad_group_1_property_pivot.sample_recorded_at AS locator_timestamp, CAST ( 1 AS INTEGER ) AS artificial_row_type FROM rroad_group_2_property_pivot INNER JOIN rroad_group_1_property_pivot ON rroad_group_1_property_pivot.sample_set_number = rroad_group_2_property_pivot.sample_set_number WHERE rroad_group_2_property_pivot.connection_id_string = @connection_id_string AND rroad_group_2_property_pivot.sample_set_number < @locator_number ORDER BY rroad_group_1_property_pivot.sample_set_number DESC; See plan: plan2_slow_without_redundant_BETWEEN.saplan ------------------------------------------------------------------------- CREATE TABLE DBA.rroad_group_1_property_pivot ( -- 256,931 rows sampling_id /* X */ UNSIGNED INT NOT NULL, sample_set_number /* PK X */ UNSIGNED BIGINT NOT NULL, sample_lost /* X */ VARCHAR ( 1 ) NOT NULL, ... other columns omitted sample_recorded_at TIMESTAMP NOT NULL, ... other columns omitted CONSTRAINT ASA277 PRIMARY KEY ( sample_set_number ) ); -- Parents of DBA.rroad_group_1_property_pivot -- none -- -- Children -- none -- CREATE INDEX ix_id_lost ON DBA.rroad_group_1_property_pivot ( sampling_id, sample_lost ); CREATE INDEX ix_number_DESC_id_lost ON DBA.rroad_group_1_property_pivot ( sample_set_number DESC, sampling_id, sample_lost ); ------------------------------------------------------------------------- CREATE TABLE DBA.rroad_group_2_property_pivot ( -- 1,967,439 rows sampling_id UNSIGNED INT NOT NULL, sample_set_number /* PK X */ UNSIGNED BIGINT NOT NULL, connection_number /* PK */ BIGINT NOT NULL, connection_id_string /* X */ VARCHAR ( 50 ) NOT NULL COMPUTE ( "STRING" ( "sampling_id", '-', "connection_number", '-', "DATEFORMAT" ( "LoginTime",'YYYYMMDDHHNNSS-SSS' ) ) ), ... other columns omitted LoginTime LONG VARCHAR NOT NULL DEFAULT '1900-01-01', ... other columns omitted CONSTRAINT ASA278 PRIMARY KEY ( sample_set_number, connection_number ) ); -- Parents of DBA.rroad_group_2_property_pivot -- none -- -- Children -- none -- CREATE INDEX xconnection_id_string ON DBA.rroad_group_2_property_pivot ( connection_id_string, sample_set_number ); -------------------------------------------------------------------------
Request clarification before answering.
In the fast plan with the sequential scans, the sequential scans actually aren't being run at all. The prefilter node is filtering with the predicate "FALSE", which means no rows will be requested from the nodes below. So the query will always return no rows.
The BETWEEN statement is evaluating to sample_set_number >= x AND sample_set_number <= x
which gets reduced to sample_set_number = x
But the other predicate is sample_set_number < x
.
Combining these two predicates results in FALSE because there is clearly no number that satisfies both of them.
To clarify: In the first example, we can determine there will be no rows in the result set without having to read a single page in the database. The actual contents of the table is irrelevant, it's similar to adding WHERE 1=2 to the query. In the second example, we have to actually search the index for rows with sample_set_number < n
.
When there's a prefilter node in a plan, the engine is evaluating a predicate that has no relation to any table or view. Usually this is a contradiction (WHERE 1=2).
The optimizer governor probably stops further optimization of the plan because the estimated cost to execute it is so small.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Ahhh... that would explain the "PreFilter" node in the plan diagram! ( doh! 🙂
In theory, code could be added to the application to test for combinations of host variables that would guarantee contradictions, and skip the query in those cases.
In practice, "if you can think of extra predicates, add extra predicates" might be a way to get the optimizer to do the heavy thinking 🙂
...especially when the query is taking forever to run.
The PreFilter node disappears when the @locator_number value is changed to 227343 which is higher by 100 and results in 20 rows being returned.
The plan looks completely different from either of the previous two plans, and it runs like a rocket (effectively zero time).
Bottom line: It is a wonderful thing to have the plan dynamically constructed each time a query is executed!
Here is the third plan: Third plan
CREATE OR REPLACE VARIABLE @menu_down_count UNSIGNED BIGINT = 20; CREATE OR REPLACE VARIABLE @connection_id_string VARCHAR ( 50 ) = '1-33137-20160118152015-061'; CREATE OR REPLACE VARIABLE @locator_number UNSIGNED BIGINT = 227343; CREATE OR REPLACE VARIABLE @oldest_connection_history_sample_set_number UNSIGNED BIGINT = 227243; CALL sa_flush_cache(); SELECT TOP @menu_down_count rroad_group_2_property_pivot.sample_set_number AS locator_number, rroad_group_1_property_pivot.sample_recorded_at AS locator_timestamp, CAST ( 1 AS INTEGER ) AS artificial_row_type FROM rroad_group_2_property_pivot INNER JOIN rroad_group_1_property_pivot ON rroad_group_1_property_pivot.sample_set_number = rroad_group_2_property_pivot.sample_set_number WHERE rroad_group_2_property_pivot.connection_id_string = @connection_id_string AND rroad_group_2_property_pivot.sample_set_number < @locator_number AND rroad_group_2_property_pivot.sample_set_number BETWEEN @oldest_connection_history_sample_set_number -- Performance: Optimization: This redundant predicate seems to help the "no rows" result case. AND @locator_number ORDER BY rroad_group_1_property_pivot.sample_set_number DESC;
If I understand correctly, there are no rows in the table with sample_set_number < 227243 (and this is known by the application?).
In plan2 we are filtering on: sample_set_number < 227243
In plan3 we are filtering on: 227243 < sample_set_number < 227343
plan2 is (erroneously?) expecting to find 88% of the rows below 227243 on group_1, so the plan is constructed in a different order.
Perhaps recreating the statistics will speed up the original query without the between statement?
Yes, plans 1 and 2 correctly find no rows, except that plan 2 takes forever.
Recreating statistics is a non-starter because it is a high-throughput application with millions of rows regularly inserted and deleted.
I suppose the occasional Plan From Hell might be regarded as an aspect of The Dark Side Of Dynamic Plan Calculation, but these exceptions occur SO INFREQUENTLY that I am happy to deal with them... and "redundant predicate" is a welcome addition to The Refrigerator Of Solutions next to Dead Chicken 🙂
I guess the issue is the predicate isn't truly redundant from SQL Anywhere's perspective. Even if the lowest number in the column is higher then 227242, the only way we would know this is from the column statistics. And we can't actually use that for the search because column statistics are non transactional (and in this case, very wrong). Even if that wasn't a problem, we can't construct an access plan assuming there are no rows below 227243 because that could change before we execute it.
You have an extra 'hint' of information that's being passed by the BETWEEN clause.
Ideally, the column statistics would be kept a little more accurate (I wonder how they got so skewed?). Then the optimizer would probably use plan3 for the no rows case as well.
OK, so maybe the predicate is not redundant from the optimizer's point of view. Since mere mortals have no way of predicting the optimizer's behavior, it reinforces this rule of thumb: "If you have a predicate, use a predicate."
In other words, just because the predicate is not required for the purposes of correct execution, is no reason not to include it. SQL Anywhere needs all the information it can get to pick a plan that is fast as well as correct.
The goal is not to create an elegant query, but to create a fast correct query.
User | Count |
---|---|
74 | |
30 | |
9 | |
7 | |
7 | |
6 | |
6 | |
5 | |
5 | |
5 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.