on 2010 Mar 05 10:53 PM
I have a customer table (8 million records or so) and I want to set each customer record with the value of the latest order date from the orders table.
I'm in a replication environment, so I don't want to update records needlessly by setting nulls to nulls and dates to the same date.
Here's the table layout (vastly simplified);
CREATE TABLE "rhiner"."customer" (
"Customer_id" integer NOT NULL DEFAULT global autoincrement,
"Last_order_date" date NULL,
PRIMARY KEY ( "Customer_id" )
);
CREATE TABLE "rhiner"."orders" (
"order_id" integer NOT NULL DEFAULT global autoincrement,
"order_date" date NULL,
"customer_id" integer NULL,
PRIMARY KEY ( "order_id" )
);
ALTER TABLE "rhiner"."orders"
ADD NOT NULL FOREIGN KEY "customer" ( "customer_id" )
REFERENCES "rhiner"."customer" ( "Customer_id" );
Here's what I want to to... Update only those customers that have an order on file that is later than the last recorded order date on the Customer table.
My attempt fails... well, it updates everything, but it updates EVERYTHING -- not just the records that need updating. (And it generates a bajillion replication messages needlessly.)
update rhiner.customer c
set last_order_date =
(select max(order_date)
from rhiner.orders
where orders.customer_id = customer.customer_id
)
I've tried various things to pull the max(order_date) out the subquery to use in a WHERE clause for the customer, but I've had no luck. But somehow, the results of the subquery need to be part of the where clause. Well, unless I scrap the subquery idea and do a JOIN.
Any help is appreciated!
Using SQLA 9.0.2.3850, replicating via dbremote
Try this:
update customer cust
set cust.last_order_date = dt.max_order_date
from ( select c.customer_id, max( o.order_date ) as max_order_date
from customer c join orders o on c.customer_id = o.customer_id
group by c.customer_id ) dt
where cust.customer_id = dt.customer_id
and ( cust.last_order_date is null
or cust.last_order_date < dt.max_order_date )
Here's what I did to get to this statement:
I started by computing the max order date for each customer:
select c.customer_id, max( o.order_date ) as max_order_date
from customer c
join orders o on c.customer_id = o.customer_id
group by c.customer_id
Then I used this query to compose the list of customer rows that actually needed to be updated by joining the derived table (dt) back to the customer table and restricted the rows to those that have a later date (or the customer date is null).
A similar strategy can typically be used for most update scenarios like this one.
Note that if you update a column in a table to its same value then the server will not actually update the column but it may (depending on your version and build) cause triggers to fire, including updating any DEFAULT CURRENT TIMESTAMP (and similar) columns. If you are seeing bajillion rows being updated, I will presume that you must have something like this in your full schema.
I'm not a query expert,... so the query gurus may have a better solution?
HTH
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hah - Mark beat me to it - I was going to post the identical answer (though Mark has a small typo in his answer).
The problem you have is that you've only specified the subquery in the UPDATE statement's SET clause, and not in the query's WHERE clause - consequently you'll update every single customer tuple (as you learned). You could write this using two subqueries:
However, as Mark has already posted, doing it over a join is easier and more efficient, since rather than nested-iteration semantics you'll compute the last-order date for every customer in one go.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
67 | |
10 | |
10 | |
10 | |
10 | |
8 | |
8 | |
7 | |
5 | |
5 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.