on 2014 Feb 11 11:02 AM
this select doesn't return what i expect
select oh.customer_id, oh.order_number, oh.num_of_line_items, isNull( max( oli.line_number ), 0 ), oh.order_date from order_header oh left outer join order_line_items oli on oh.order_seq_num = oli.order_seq_num and oh.order_date >= '2013-10-01' group by oh.customer_id, oh.order_number, oh.num_of_line_items , oh.order_date
it returns orders dated before 2013-10-01
this select returns what i expect
select oh.customer_id, oh.order_number, oh.num_of_line_items, isNull( max( oli.line_number ), 0 ), oh.order_date from order_header oh left outer join order_line_items oli on oh.order_seq_num = oli.order_seq_num where oh.order_date >= '2013-10-01' group by oh.customer_id, oh.order_number, oh.num_of_line_items , oh.order_date
note the order date qualifier is in the WHERE clause
i thought i understood left outer joins, but now i'm confused
That's expected behaviour:
The first query will return all rows from table order_header (independent of the order_date) and join those of them with an according order_date >= 2013-10-01 to the according order_line_items. So order headers before that date will show up without their items.
I guess the misunderstanding has to do with the following:
If you want to filter on the null-supplying side (here on order_line_items, say to restrict them to the first n items per order header or the like), then you have to specify that condition in the join's ON clause - otherwise (i.e. in the WHERE clause) you usually will turn the left join into an inner join (a common mistake, cf. Breck's list of characteristic errors no. 1).
But here you seem to want to filter on the preserved side, and then you can simply use the WHERE clause as usual.
(A further method to do so would be to use a derived table for order_header with the according WHERE clause and left join that derived table to the order_line_items table.)
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
FWIW: The "null-supplying side" problem is discussed here in this FAQ with a somewhat very similar title (and OP):
Note: This link is not a self-join self-link:)
In addition to Volker's answer...
I am as confused as you are about the "expected behavior" of outer joins, but there are a couple of tricks that can help.
First of all, AVOID any predicates in the ON clause that DO NOT have anything to do with the join. Here that means do not code "and oh.order_date >= '2013-10-01'" in the ON clause... sometimes these clauses do what you expect, other times (as Volker has explained) they do what other people expect.
Second, use derived tables instead of the WHERE clause when you want to limit the rows from individual tables. Logically speaking, derived table expressions are calculated before the join clause, so they are a WONDERFUL way to divide and conquer complex requirements.
CAVEAT EMPTOR: I have not tested the following code...
select oh.customer_id, oh.order_number, oh.num_of_line_items, isNull( max( oli.line_number ), 0 ), oh.order_date from ( SELECT * FROM order_header WHERE order_date >= '2013-10-01' ) oh left outer join order_line_items oli on oh.order_seq_num = oli.order_seq_num group by oh.customer_id, oh.order_number, oh.num_of_line_items , oh.order_date;
Note: The SELECT * doesn't make the query any less efficient, but it DOES make the derived table much easier to code: you don't have to list the columns you need, the optimizer can figure that out without your help 🙂
Some folks (like me) don't like using abbreviated alias names, so this suggestion is for you: You can use the base table name as an alias name for the derived table, something I often do when retrofitting derived tables into large existing queries...
select order_header.customer_id, order_header.order_number, order_header.num_of_line_items, isNull( max( order_line_items.line_number ), 0 ), order_header.order_date from ( SELECT * FROM order_header WHERE order_date >= '2013-10-01' ) AS order_header left outer join order_line_items on order_header.order_seq_num = order_line_items.order_seq_num group by order_header.customer_id, order_header.order_number, order_header.num_of_line_items, order_header.order_date;
...so the outer select still refers to "order_header" even though it is now a derived table name.
The query optimizer does a pretty good job of "flattening out" the queries where possible, when derived tables and views are involved (you can think of a derived table as a local view).
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
I would certainly suggest to use your first sample (well, I tend to use short alias names...) - that's exactly what I tried to hint at with my "(A further method to do so would be to use a derived table for..." suggestion.
I guess this method to "localize filters" via derived tables really helps to solve such problems.
to Breck and Volker,
Volker's explanation of when to use a qualifier in the where clause vs. in the join was very helpful.
Breck's statement of "...AVOID any predicates in the ON clause that DO NOT have anything to do with the join.." makes a lot of sense and should make the conversion of *= much easier. I tested Brecks' example of a derived table, and it does work as expected. I also tested the select with a where clause (one of my original selects) to see which was faster. The where clause came back must faster but it could be a result of caching after I tested the derived table select. I will test both of these after the server "cools" down on different days.
Thanks for your help.
Tom Mangano
User | Count |
---|---|
71 | |
11 | |
11 | |
10 | |
9 | |
9 | |
7 | |
6 | |
5 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.