cancel
Showing results for 
Search instead for 
Did you mean: 

Invalid Use of Aggregate Function

16,117

The help gives some examples of what constitutes an inappropriate use of an aggregate function :

Probable cause

You attempted to use an aggregate function inappropriately. For example, a SELECT statement cannot contain an aggregate function within a predicate in the WHERE clause unless the aggregate function constitutes an outer reference. Other invalid uses include specifying an aggregate function in the SET clause of an UPDATE statement, or using an aggregate function in a CHECK constraint.

I can certainly think of situations where you would get undesirable results, but I'm hoping someone could provide a better explanation as to the "Why" these uses are inappropriate.

Accepted Solutions (1)

Accepted Solutions (1)

Former Member

The most straightforward example is to use an aggregate function in a WHERE clause, ie

SELECT FROM

WHERE COUNT() > 5 GROUP BY <table.column>

In SQL, a SELECT statement's semantics are (roughly) as follows:

  1. Compute the Cartesian product of all of the table expressions in the FROM clause
  2. Apply the conditions in the WHERE clause to filter those rows that fail to satisfy the search condition
  3. If a GROUP BY exists, group the resulting rows into groups determined by matching the grouping attributes, treating NULL as equivalent to NULL
  4. If a HAVING clause exists, filter those grouped rows so that only the desired grouped rows remain
  5. ORDER the result by the ORDER BY clause if it exists
  6. Project the result using the listed columns in the SELECT list, so that only those attributes are returned to the client.

I said "roughly" because grouping, semantically-wise, is actually a fairly complicated thing to describe completely and precisely, particularly due to the difference in results when the query has a GROUP BY clause, and when it does not. Chris Date, in his book An Introduction to Database Systems, has a number of really lovely examples where he takes those semantics to task, but SQL is what it is...

Anyway - from those semantics you can see that including COUNT() in a WHERE clause is semantically invalid - the value of COUNT(*) is computed during the grouping step, so trying to apply the predicate beforehand during the selection (filtering) process is incorrect. Instead, one should move the predicate to a HAVING clause, so that the condition is tested after the groups are formed:

SELECT FROM

GROUP BY <table.column> HAVING COUNT() > 5

When the language includes WINDOW functions (and SQL Anywhere does support WINDOW queries) then the semantics are even more difficult to define and explain, but they do make sense - if you don't try to think about it too hard - even though I've yet to see a formal definition (in the mathematical sense) of WINDOW.

Answers (1)

Answers (1)

Breck_Carter
Participant

UPDATE t SET SUM ( c ) = 0;

...how would you like that done? Set c to zero in every row? Set alternating rows to random positive/negative numbers so the sum remains zero?

CREATE TABLE t ( c INTEGER CHECK ( AVG ( c ) = 0 ) ...

...what is the aggregation over which the AVG is calculated? Is it recalculated and rechecked with each row insertion/update/deletion?


It is likely that each case where that error message appears is somewhat different. Some may fall into the category "Engineering hasn't figured out how to support this usage" but most are probably "nonsensical impossible to calculate".

Perhaps you have run into a situation where the message doesn't seem right; if so, please share it with us.

0 Kudos

I run into the situation infrequently and most of the time the message seems right and I understand the logic, but in some situations I think it would still work.

One example: I have a one to many relationship between two tables, such as a bill and its line items.

I might try to write a procedure that totals the line items for a field on the bill with a statement that says:

update Bill
left outer join BillItems on BillItems.BillID = Bill.BillID
set Bill.Total = sum(BillItems.Total)
where BillID = X

The way I read the above, it would update the Bill.Total with the total of the list from the join, doing it each time for each item in the list, but I would not care because it would be the same total and I would not need a second statement to pull the total into a variable and then update.

There is certainly the argument that two statements are better than N updates without knowing how many are in the list, but it seems like the above is writing what I want to happen.

Probably more of an exception that would behave correctly but I thought I would get some more information.

Former Member

The way that statement reads, it is understandable what the goal may be.

As far as why it doesn't work (is prohibited)...

First, consider what would happen if you executed...

SELECT SUM(BillItems.Total) FROM Bill LEFT OUTER JOIN BillItems ON 
    Bill.BillID = BillItems.BillID

...you'd get the total of all BillItems including those not on the current bill. Without a GROUP BY, the aggregate will hit all rows. There's no easily defined way to specify GROUP BY and integrate it into the UPDATE statement, so you need to get creative:

UPDATE Bill b
    LEFT OUTER JOIN
    (SELECT BillID, SUM(Total) FROM BillItems GROUP BY BillID) bi
        ON b.BillID = bi.BillID
    SET b.Total = bi.Total
    WHERE b.BillID = X
0 Kudos

I see. Thank you for that example. I will give that a try next time.

VolkerBarth
Contributor

An easier (though possibly less performant) way would be to use a correlated subquery, such as (not tested!):

UPDATE Bill b
SET Total =
   (SELECT SUM(Total) FROM BillItems bi WHERE bi.BillID = b.BillID)
WHERE BillID = X