cancel
Showing results for 
Search instead for 
Did you mean: 

How can I use a subquery to limit my WHERE clause in an UPDATE?

Former Member
3,270

I have a customer table (8 million records or so) and I want to set each customer record with the value of the latest order date from the orders table.

I'm in a replication environment, so I don't want to update records needlessly by setting nulls to nulls and dates to the same date.

Here's the table layout (vastly simplified);

CREATE TABLE "rhiner"."customer" (
    "Customer_id" integer NOT NULL DEFAULT global autoincrement,
    "Last_order_date" date NULL,
    PRIMARY KEY ( "Customer_id" )
);

CREATE TABLE "rhiner"."orders" (
        "order_id" integer NOT NULL DEFAULT global autoincrement,
    "order_date" date NULL,
    "customer_id" integer NULL,
    PRIMARY KEY ( "order_id" )
 );

ALTER TABLE "rhiner"."orders" 
    ADD NOT NULL FOREIGN KEY "customer" ( "customer_id" ) 
    REFERENCES "rhiner"."customer" ( "Customer_id" );

Here's what I want to to... Update only those customers that have an order on file that is later than the last recorded order date on the Customer table.

My attempt fails... well, it updates everything, but it updates EVERYTHING -- not just the records that need updating. (And it generates a bajillion replication messages needlessly.)

update rhiner.customer c
set last_order_date = 
  (select max(order_date)  
  from rhiner.orders 
  where orders.customer_id = customer.customer_id
       )  

I've tried various things to pull the max(order_date) out the subquery to use in a WHERE clause for the customer, but I've had no luck. But somehow, the results of the subquery need to be part of the where clause. Well, unless I scrap the subquery idea and do a JOIN.

Any help is appreciated!

Using SQLA 9.0.2.3850, replicating via dbremote

VolkerBarth
Contributor

Well, that's one type of problem one might call a SQL Pattern - cf. http://sqlanywhere-forum.sap.com/questions/386/are-there-sql-design-patterns

Accepted Solutions (2)

Accepted Solutions (2)

MarkCulp
Participant

Try this:

update customer cust
   set cust.last_order_date = dt.max_order_date
  from ( select c.customer_id, max( o.order_date ) as max_order_date
           from customer c join orders o on c.customer_id = o.customer_id
          group by c.customer_id ) dt
 where cust.customer_id = dt.customer_id
   and ( cust.last_order_date is null 
         or cust.last_order_date < dt.max_order_date )

Here's what I did to get to this statement:

I started by computing the max order date for each customer:

select c.customer_id, max( o.order_date ) as max_order_date
  from customer c 
  join orders o on c.customer_id = o.customer_id
 group by c.customer_id

Then I used this query to compose the list of customer rows that actually needed to be updated by joining the derived table (dt) back to the customer table and restricted the rows to those that have a later date (or the customer date is null).

A similar strategy can typically be used for most update scenarios like this one.

Note that if you update a column in a table to its same value then the server will not actually update the column but it may (depending on your version and build) cause triggers to fire, including updating any DEFAULT CURRENT TIMESTAMP (and similar) columns. If you are seeing bajillion rows being updated, I will presume that you must have something like this in your full schema.

I'm not a query expert,... so the query gurus may have a better solution?

HTH

Former Member
0 Kudos

Awesome Mark! I'm not sure what typo Glenn was referring to... but I added some parens to the where clause. where cust.customer_id = dt.customer_id and ( ( cust.last_order_date is null ) or (cust.last_order_date < dt.max_order_date ) )

Former Member

Hah - Mark beat me to it - I was going to post the identical answer (though Mark has a small typo in his answer).

The problem you have is that you've only specified the subquery in the UPDATE statement's SET clause, and not in the query's WHERE clause - consequently you'll update every single customer tuple (as you learned). You could write this using two subqueries:

  • one in the SET clause, to give the last-order date to set the customer row to
  • a second, near-identical subquery to filter out those customers who do not need any changes.

However, as Mark has already posted, doing it over a join is easier and more efficient, since rather than nested-iteration semantics you'll compute the last-order date for every customer in one go.

Former Member
0 Kudos

thanks both Mark and Glenn! This is very cool. I never knew about this ability. The capabilities of this product continue to surprise me. And I've only been using it since it was called "Watcom SQL 2.0". I wonder if this could be used to simulate crosstabs.

Answers (0)