on 2016 Sep 29 3:56 PM
I'm going to be a little cautious about calling this a bug after my previous spatial embarrassment, but I think something rather odd is happening here. It's a bit complicated but I'll try to explain in stages.
I am trying to run queries that identify which of various person records fall inside certain UK administrative areas. The link is based on the persons' postcodes. UK postcodes identify groups of usually 20-50 properties and the Ordnance Survey [OS] provide a OSGB grid reference for the centroid of each postcode area. The OS also provide polygons with the outlines of the administrative areas - in this case counties.
I have the person records in a table called Person, there are some 23k records in the table. The table includes various information including postcode; about half the records have a postcode. I have another table, PostcodeLocation, with the centroid of each postcode as an ST_Point. The table contains records for all 1.4m postcode in the UK. Most of the postcodes in the Person table have a matching value in PostcodeLocation but some will be out-of-date or incorrect.
A third table, Boundary-line-historic-counties_region, contains the (multi)polygons that represent the county boundaries.
When I run a query like:
select p.personid, town from person p join postcodelocation l on p.postcode = l.postcode where l.IQXPos.ST_Within((select Geometry from "Boundary-line-historic-counties_region" where "name"='Angus')) =1
I get the expected result set:
personid,town 'TI04201631032011000A','Arbroath' 'TI0SJJAN231020060005', 'TI161223240520110F97','Arbroath' 'TI215211160720026362','DUNDEE' 'TI322616240820110001','Arbroath' 'TI531716310320110001','Arbroath' 'TI32536','DUNDEE' 'TI35398','BY FORFAR'
ie 8 Person records have postcodes that are in Angus. However, as soon as I add any additional restraints to the WHERE clause eg:
select p.personid, town from person p join postcodelocation l on p.postcode = l.postcode where l.IQXPos.ST_Within((select Geometry from "Boundary-line-historic-counties_region" where "name"='Angus')) =1 and p.town like 'D%';
rather than getting a sub-set of my previous results, I get a completely different result:
personid,town 'TI1238192502201117E7','Deeside' 'TI1238192502201117E7','Deeside' 'TI1238192502201117E7','Deeside' 'TI1238192502201117E7','Deeside' 'TI1238192502201117E7','Deeside' 'TI1238192502201117E7','Deeside' 'TI1238192502201117E7','Deeside' .......
This continues for hundreds of thousands of rows. Things to note:
I've tried reproducing the problem with a much smaller Person table but I haven't yet been able to recreate the issue, so it seems that something about the Person table is involved in creating the problem. This is hard to isolate as it has many foreign keys and is referenced by many other tables.
Am I missing something very obvious here?
A full validation on the database showed no issues. An unload & reload did not make any difference. Version 16.0.0.2213
Update: I thought I would take a look at the execution plans. Interestingly the working query plan was returned in a few seconds but it took over 20 minutes for the problem query to return a plan to the plan viewer. The main difference is that the problem one uses Spatial Index Probes which the other doesn't. I'd upload the plans if I could work out how to 🙂 - I've seen them attached to other questions!
Further Update: Graphical Plans for both good and bad queries attached using the Volker-Dmitri bodge.
http://sqlanywhere-forum.sap.com/upfiles/WithoutExtraCondition(CorrectResultSet).saplan.jpg
http://sqlanywhere-forum.sap.com/upfiles/WithExtraCondition(WrongResultSet).saplan.jpg
To retrieve plans, right click on the hyper links above and choose "Save link as.." or equivalent depending on browser, then remove jpg extension from resulting downloaded file.
And another update: I've been able to work out what determines whether I get the right result set or the wrong one. It depends on whether or not the engine uses the index on person.postcode (if it does it gets the right answer).
I created a new table:
CREATE TABLE "pears"."Person3" ( "personid" CHAR(20) NOT NULL, "town" CHAR(30) NULL, "postcode" CHAR(30) NULL, PRIMARY KEY ( "personid" ASC ) ) IN "system";and populated it with a select statement from the original Person table. When I ran the query
select p.personid from person3 p inner join postcodelocation l on p.postcode = l.postcode where l.IQXPos.ST_Within((select Geometry from "Boundary-line-historic-counties_region" where "name"='Angus')) =1;I got the wrong result set. Adding an index on person.postcode and running the query again gave the correct results. To check, I dropped the index, re-ran the query and we went back to the wrong results.
Reported to SAP as incident ID:002075129400004746052016
Fixed by Engineering Case 805460
SAP SQL Anywhere Version 16.0.0 build 2428
SAP SQL Anywhere Version 17.0 build 2816
The problem only happens when a multi-column index that includes a geometry column is used in the plan.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
This is a WAG...
That whole business of using a "correlated parenthesized ( subquery ) as a function parameter in a WHERE predicate" may be OK because it appears in a few examples in the Help, but...
...those examples use the "comma join" operator in the outer SELECT, not an INNER JOIN with ON clause.
In other words, it's a kind of undocumented LATERAL join using the WHERE clause.
In practice, parenthesized ( subqueries ) are very dangerous things.
In theory they are allowed as function parameters but not as procedure CALL parameters (go figure!).
Personally, I limit parenthesized ( subquery ) usage to a few well-proven uses; e.g., SELECT lists (rare) and IF EXISTS statements (frequent).
Perhaps you can move the ST_Within predicate down into a derived table based on postcodelocation... or into a completely separate query to load a temporary table.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
It's interesting what you say about the dangers of parenthesised (subqueries), I've probably been rather cavalier in my approach to them. However in this case it doesn't seem to be the cause. I tried eliminating the look-up entirely:
CREATE VARIABLE @Blah ST_Geometry; select Geometry into @Blah from "Boundary-line-historic-counties_region" where "name"='Angus'; select p.personid, p.town from person p join postcodelocation l on p.postcode = l.postcode and l.IQXPos.ST_Within(@Blah) =1 and p.town like 'D%';but it doesn't change the outcome.
What I'm actually trying to achieve here is a mechanism that can be incorporated into a reporting tool where we won't have complete control over the query construction. I'd hoped we'd just be able to keep ANDing on additional conditions, so I'm trying to get to the bottom of what works and what doesn't.
UPDATE The behaviour is a bit different however with the polygon in a variable. While
select count(postcode) from postcodelocation l where l.IQXPos.ST_Within(@Blah) =1;works fine, BUT
select p.personid, p.town , p.postcode from person p join postcodelocation l on p.postcode = l.postcode and l.IQXPos.ST_Within(@Blah) =1ie without the town condition, returns rows that don't meet the join condition
so I'm trying to get to the bottom of what works and what doesn't.
The latter part being quite successful, methinks:(
In theory, I'd assume that your approach to simply add ANDed conditions should work - at least I would use that approach, too.
However, IMVHO, in practise the spatial function call seems to prevent the correct execution. If that were a STP, I'd suggest to try with a kind of CROSS APPLY / LATERAL call, i.e. to buiild a filter on the persons/towns and to "apply" the result to the spatial call...
Hm, another very wild and even vague guess...
Late to the thread (again) but a couple of thoughts . . .
I am a bit confused by all the focus on the sub-query part of this. Does the bad plan show that as being anything but a separate entry under the "Subquery:" drop down in the plan viewer? If not, I would not suspect the sub-Q to be a significant part of this.
That subquery is just a single-row query and should not (otherwise) significantly contribute to the final plans. Unless I am wrong, that should be able to be replaced with a[n] ST_GEOMETRY (or other spatial-type) variable set to that query and the behaviour differences should remain, pretty much, the same.
Of course having a way to investigate the details behind this might shed a brighter light on all of this issue. And since we are leaning towards a bug, it would be good to get someone working on that part.
"Of course": - the optimizer could do a number of different things here - "rewritten spatial operations" is entirely conceivable but ... I would have to work from the plans (and possibly a representative db) to be abl to bring much else to this conversation.
YAK: Yet Another Kibitzer
P.S. FWIW it might also be interesting to see if you get similar problems with ST_CoveredBy and ST_WithinFilter
Hi Nick, thanks for looking at this. The sub-query looks fine - just a sequential scan of a very small table and is the same in both cases. As I don't seem to be able upload plans to the forum, is it OK to email them to you? I could get you the database too if that's wanted - its not too big.
I think y'all are missing the point of my WAG.
The subquery is not the problem, its use inside the l.IQXPos.ST_Within() function call in the WHERE clause is the problem... it is a very very very very very sophisticated construction since it refers to a table in the FROM clause.
This is definitely NOT a case of a simple singleton-select subquery.
I don't understand how this stuff works, BUT it is clear that the examples in the help DO NOT USE INNER JOIN or ON clauses at all when the ST_things() appear in the WHERE clause.
The examples use old-fashioned "FROM tablename, tablename" clauses.
Try changing this
select p.personid, town from person p join postcodelocation l on p.postcode = l.postcode where l.IQXPos.ST_Within ( (select Geometry from "Boundary-line-historic-counties_region" where "name" = 'Angus' ) ) = 1 and p.town like 'D%';
to this
select p.personid, town from person p, postcodelocation l where l.IQXPos.ST_Within ( (select Geometry from "Boundary-line-historic-counties_region" where "name" = 'Angus' ) ) = 1 and p.town like 'D%' and p.postcode = l.postcode;
to make it look more like the examples in the Help that use the comma join instead of inner join:
SELECT c.PostalCode, c.City, z.CenterPoint FROM Massdata z, GROUPO.SpatialContacts c WHERE c.PostalCode = z.ZIP AND z.CenterPoint.ST_WithinDistance( ( SELECT CenterPoint FROM Massdata WHERE ZIP = '01775' ), 100, 'Statute mile' ) = 1 ORDER BY c.PostalCode;
I've tried
select p.personid, town from person p, postcodelocation l where l.IQXPos.ST_Within ( (select Geometry from "Boundary-line-historic-counties_region" where "name" = 'Angus' ) ) = 1 and p.town like 'D%' and p.postcode = l.postcode;and still get the wrong result set - exactly the same wrong results as in the question above. When I comment out the additional criteron eg
select p.personid, town from person p, postcodelocation l where l.IQXPos.ST_Within ( (select Geometry from "Boundary-line-historic-counties_region" where "name" = 'Angus' ) ) = 1 -- and p.town like 'D%' and p.postcode = l.postcode;then we go back to the correct results.
If we checkout the first PersonId returned in the "wrong" result set 'TI1238192502201117E7', we get this
select postcode from person where personid = 'TI1238192502201117E7'; -- returns 'CH5 4XB' select IQXPos from postcodelocation where postcode = 'CH5 4XB'; -- returns 'Point (328810 369953)' select NEW ST_Point(328810,369953,0,999999).ST_Within((select Geometry from "Boundary-line-historic-counties_region" where "name" = 'Angus' ) ); -- returns 0so the value is there even though it doesn't satisfy all the conditions in the WHERE clause.
Whats really odd is the if I use a different "extra" criteria, say and p.name like 'D%' we get a different "wrong" result set:
personid,town 'NZ0P0HKL16062011000D','Auckland' 'NZ0P0HKL16062011000D','Auckland' 'NZ0P0HKL16062011000D','Auckland' 'NZ0P0HKL16062011000D','Auckland'
checking out this value 'NZ0P0HKL16062011000D' we get:
select postcode from person where personid = 'NZ0P0HKL16062011000D'; -- returns '0604' select IQXPos from postcodelocation where postcode = '0604'; -- returns no record (unsurprisingly)which breaks a different condition.
OK, let's wave a dead chicken over the query:
select p.personid, town from ( SELECT * FROM person p WHERE p.town like 'D%' ) AS p, postcodelocation l where l.IQXPos.ST_Within ( (select Geometry from "Boundary-line-historic-counties_region" where "name" = 'Angus' ) ) = 1 and p.postcode = l.postcode;
OK - chicken has been dispatched and the waving commences:
select p.personid, town from ( SELECT * FROM person p WHERE p.town like 'D%' ) AS p, postcodelocation l where l.IQXPos.ST_Within ( (select Geometry from "Boundary-line-historic-counties_region" where "name" = 'Angus' ) ) = 1 and p.postcode = l.postcode;gives the wrong answer still, while removing the restriction on Person:
select p.personid, town from ( SELECT * FROM person p) AS p, postcodelocation l where l.IQXPos.ST_Within ( (select Geometry from "Boundary-line-historic-counties_region" where "name" = 'Angus' ) ) = 1 and p.postcode = l.postcode;makes it work just fine. Again it doesn't seem to matter what condition is specified, it goes strange.
I suppose I could try human sacrifice next - I have a little list ....
I'd upload the plans if I could work out how to 🙂 - I've seen them attached to other questions!
Coming back to your question: Is there a chance to post the plans as described in the workaround (i.e. as fake image files) in your according other question?
(Yes, it's a nasty workaround but I'd still prefer that to yur suggested next step...)
User | Count |
---|---|
66 | |
10 | |
10 | |
10 | |
10 | |
8 | |
6 | |
5 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.