cancel
Showing results for 
Search instead for 
Did you mean: 

Problem with a LEFT JOIN and further conditions for the preserved table

VolkerBarth
Contributor
4,329

It's monday morning, and I'm somewhat puzzled with a LEFT JOIN with a combined join condition...

Say, I would use the SQL Anywhere 12 demo database to list all customer pairs with identical postal code, and would also list those customers who do not have a "paired" customer. (Yes, this is a very constructed sample but I'm trying to do something similar but more useful in my real database...):

So, as a first step I would query the pairs (here additionally restricting C1.ID to limit the result set):

select C1.PostalCode, C1.ID, C1.Surname, C2.ID, C2.Surname
from Customers C1 inner join Customers C2
   on C1.PostalCode = C2.PostalCode and C1.ID < C2.ID and C1.ID < 130
order by C1.ID, C2.ID

This lists

PostalCode,ID,Surname,ID,Surname
46032,105,McCarthy,121,Elkins
27695-7209,107,Colburn,118,Sanford
R3C 3V6,123,Lin,143,Piper

Now, as second step, I simply turn the INNER JOIN into a LEFT JOIN to get those customers who don't have a "pair":

select C1.PostalCode, C1.ID, C1.Surname, C2.ID, C2.Surname
from Customers C1 left outer join Customers C2
   on C1.PostalCode = C2.PostalCode and C1.ID < C2.ID and C1.ID < 130
order by C1.ID, C2.ID

This lists

PostalCode,ID,Surname,ID,Surname
07070,101,Devlin,,
10154,102,Reiser,,
19301,103,Niedringhaus,,
37919,104,Mason,,
46032,105,McCarthy,121,Elkins
64579,106,Phillips,,
27695-7209,107,Colburn,118,Sanford
37421,108,Goforth,,
...
R3C 3V6,123,Lin,143,Piper
...
53141,130,Monella,,
11716,131,Sinnot,,
...
11700,665,Thompson,,

So this does add customers who do not have a pair w.r.t. postal code, but it also adds those with ID >= 130, although the join condition should prevent that in my understanding.

I have tested both with v12.0.1.3324 and 8.0.3 with identical results.

So it seems in order to filter on C1.ID, I have to put the condition C1.ID < 130 in the WHERE clause:

select C1.PostalCode, C1.ID, C1.Surname, C2.ID, C2.Surname
from Customers C1 left outer join Customers C2
   on C1.PostalCode = C2.PostalCode and C1.ID < C2.ID
where C1.ID < 130  -- re-organized filter on C1    
order by C1.ID, C2.ID

However, in order to filter on C2.ID, I still have to put the according condition C2.ID < 130 in the ON clause in order to retain the OUTER JOIN semantics (placing that in the WHERE clause is a common error that turns the join into an INNER JOIN automatically...):

select C1.PostalCode, C1.ID, C1.Surname, C2.ID, C2.Surname
from Customers C1 left outer join Customers C2
   on C1.PostalCode = C2.PostalCode and C1.ID < C2.ID and C2.ID < 130
order by C1.ID, C2.ID

Question:

Is this asymmetrical behaviour between further conditions for the preserved and the null-supplying side by design?

Accepted Solutions (1)

Accepted Solutions (1)

VolkerBarth
Contributor

Well, after asking my own question the answer seems straight-forward (possibly because it's noon in the meantime...):

Yes, there's this asymmetrical behaviour: When doing a OUTER JOIN with C1 as preserved table, you're asking

  • for all rows from table C1 that have a matching row in C2
  • and for all other rows from table C1.

So, if you want to filter C1 generally, you will have to apply that filter in the WHERE clause.

You may or may have to apply the filter in the ON condition, too, in case it has influence on whether C2 has matching rows or not. In the samples above, that's unnecessary.

MCMartin
Participant
0 Kudos

Maybe a warning, that a senseless condition is in the statement would help prevent such misunderstandings?

VolkerBarth
Contributor
0 Kudos

I'm not sure whether such a condition would appear as senseless - it depends on what your looking for...

Maybe we would need that "WITH (READ MY MIND)" hint:)

Answers (2)

Answers (2)

Breck_Carter
Participant

[This was originally a comment to Breck's other answer.]

What do you mean exactly, "asymmetrical behavior"? IMO you should only put join conditions involving both tables in the ON clause, never selection conditions involving only one table... put those in a derived table if you want them evaluated before the join, in the outer WHERE if you want them evaluated after the join.

VolkerBarth
Contributor
0 Kudos

Breck, I guess you have given a very useful rule of thumbs with the recommendation to use derived tables or WHERE clauses for the preserved table. I'm gonna try to remember this...

The reason for my original wrong expectation has to do with the particular left join being a self-join and the impression that I wanted to join particular "types of rows" from instance A with their "partners" from instance B - or without them if they had no "partner". As it is a self-join, I felt (and still feel) that it seems somewhat asymmetrical to put the "type filter" for A in a WHERE clause (or derived table) and the type filter for B in the JOIN clause.

But I do understand that this is the way left joins work:)

Breck_Carter
Participant
select C1.PostalCode, C1.ID, C1.Surname, C2.ID, C2.Surname
from ( select * from Customers where Customers.ID < 130 ) C1 left outer join Customers C2
   on C1.PostalCode = C2.PostalCode and C1.ID < C2.ID
order by C1.ID, C2.ID
VolkerBarth
Contributor
0 Kudos

Another alternative, for sure - and it shows that same asymmetrical behaviour...