cancel
Showing results for 
Search instead for 
Did you mean: 

left outer join not working as expected

t1950
Participant
14,154

this select doesn't return what i expect

select oh.customer_id, oh.order_number, oh.num_of_line_items, isNull( max( oli.line_number ), 0 ), oh.order_date  
from order_header oh left outer join order_line_items oli on oh.order_seq_num = oli.order_seq_num 
and oh.order_date >= '2013-10-01'   
group by oh.customer_id, oh.order_number, oh.num_of_line_items , oh.order_date

it returns orders dated before 2013-10-01

this select returns what i expect

select oh.customer_id, oh.order_number, oh.num_of_line_items, isNull( max( oli.line_number ), 0 ), oh.order_date  
from order_header oh left outer join order_line_items oli on oh.order_seq_num = oli.order_seq_num   
where oh.order_date >= '2013-10-01'  
group by oh.customer_id, oh.order_number, oh.num_of_line_items , oh.order_date

note the order date qualifier is in the WHERE clause

i thought i understood left outer joins, but now i'm confused

Accepted Solutions (1)

Accepted Solutions (1)

VolkerBarth
Contributor

That's expected behaviour:

The first query will return all rows from table order_header (independent of the order_date) and join those of them with an according order_date >= 2013-10-01 to the according order_line_items. So order headers before that date will show up without their items.

I guess the misunderstanding has to do with the following:

If you want to filter on the null-supplying side (here on order_line_items, say to restrict them to the first n items per order header or the like), then you have to specify that condition in the join's ON clause - otherwise (i.e. in the WHERE clause) you usually will turn the left join into an inner join (a common mistake, cf. Breck's list of characteristic errors no. 1).

But here you seem to want to filter on the preserved side, and then you can simply use the WHERE clause as usual.

(A further method to do so would be to use a derived table for order_header with the according WHERE clause and left join that derived table to the order_line_items table.)

VolkerBarth
Contributor
0 Kudos

FWIW: The "null-supplying side" problem is discussed here in this FAQ with a somewhat very similar title (and OP):

Note: This link is not a self-join self-link:)

Answers (1)

Answers (1)

Breck_Carter
Participant

In addition to Volker's answer...

I am as confused as you are about the "expected behavior" of outer joins, but there are a couple of tricks that can help.

First of all, AVOID any predicates in the ON clause that DO NOT have anything to do with the join. Here that means do not code "and oh.order_date >= '2013-10-01'" in the ON clause... sometimes these clauses do what you expect, other times (as Volker has explained) they do what other people expect.

Second, use derived tables instead of the WHERE clause when you want to limit the rows from individual tables. Logically speaking, derived table expressions are calculated before the join clause, so they are a WONDERFUL way to divide and conquer complex requirements.

CAVEAT EMPTOR: I have not tested the following code...

select oh.customer_id, oh.order_number, oh.num_of_line_items, isNull( max( oli.line_number ), 0 ), oh.order_date  
  from ( SELECT * FROM order_header WHERE order_date >= '2013-10-01' ) oh 
       left outer join order_line_items oli 
       on oh.order_seq_num = oli.order_seq_num 
 group by oh.customer_id, oh.order_number, oh.num_of_line_items , oh.order_date;

Note: The SELECT * doesn't make the query any less efficient, but it DOES make the derived table much easier to code: you don't have to list the columns you need, the optimizer can figure that out without your help 🙂

Some folks (like me) don't like using abbreviated alias names, so this suggestion is for you: You can use the base table name as an alias name for the derived table, something I often do when retrofitting derived tables into large existing queries...

select order_header.customer_id, 
       order_header.order_number, 
       order_header.num_of_line_items, 
       isNull( max( order_line_items.line_number ), 0 ), 
       order_header.order_date  
  from ( SELECT * 
           FROM order_header 
          WHERE order_date >= '2013-10-01' 
       ) AS order_header 
       left outer join order_line_items  
       on order_header.order_seq_num = order_line_items.order_seq_num 
 group by order_header.customer_id,  
       order_header.order_number,  
       order_header.num_of_line_items,  
       order_header.order_date;

...so the outer select still refers to "order_header" even though it is now a derived table name.

The query optimizer does a pretty good job of "flattening out" the queries where possible, when derived tables and views are involved (you can think of a derived table as a local view).

VolkerBarth
Contributor
0 Kudos

I would certainly suggest to use your first sample (well, I tend to use short alias names...) - that's exactly what I tried to hint at with my "(A further method to do so would be to use a derived table for..." suggestion.

I guess this method to "localize filters" via derived tables really helps to solve such problems.

t1950
Participant
0 Kudos

to Breck and Volker,

Volker's explanation of when to use a qualifier in the where clause vs. in the join was very helpful.

Breck's statement of "...AVOID any predicates in the ON clause that DO NOT have anything to do with the join.." makes a lot of sense and should make the conversion of *= much easier. I tested Brecks' example of a derived table, and it does work as expected. I also tested the select with a where clause (one of my original selects) to see which was faster. The where clause came back must faster but it could be a result of caching after I tested the derived table select. I will test both of these after the server "cools" down on different days.

Thanks for your help.

Tom Mangano

t1950
Participant
0 Kudos

BTW,

Both answers shed new light on outer joins. Is it possible to accept both answers and close the question and how do I accept answers to close the question?

Tom M

Breck_Carter
Participant
0 Kudos

CALL sa_flush_cache() will cool down the server... but the only guaranteed way to flush everything is to restart the executable.

Breck_Carter
Participant
0 Kudos

No, you have to pick... and Volker is the one who actually answered the question 🙂