Application Development and Automation Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.
cancel
Showing results for 
Search instead for 
Did you mean: 
Read only

How to aggregate data from a database table

Former Member
0 Likes
2,273

Hi all,

I need to aggregate data from a database table and I think about two alternatives:

1) Usage of SELECT SUM

2) Aggregate data in ABAP programming, i.e.: Read the data non-aggregated from the database into an internal table. Loop over the table and COLLECT the data record into an HASHED TABLE.

I know aggregatíng data in ABAP programming is generally considered not as best practice, but collecting into an hashed table should also be quite performant.

What do you think is the better alternative?

Best regards,

Markus

1 ACCEPTED SOLUTION
Read only

Former Member
0 Likes
1,131

> often hear an argument against doing things on the database side, but not sure if it's really valid.

yes, but this argument is simple in the case of the ORDER BY, save the ORDER BY and SORT on the application server even if the database is maybe even more clever when it sorts. (exception: the used index gives the ORDER for free).

However, if you save to tranfer records as in:

+ aggregates

+ DISTINCT

+ ORDER BY together with UP TO n ROWS

then you save also time on the db. Here, the rule of thumb says if you expect to save 50% of the records then do on the db.

I have no idea about complicted GROUP BY and HAVING rules, but I would assume, that they are hard to generalize.

Siegfried

6 REPLIES 6
Read only

HermannGahm
Product and Topic Expert
Product and Topic Expert
0 Likes
1,131

Hi Markus,

if we think about two extreme examples:

1st:

if the result set after aggregation is rather big compared to the result set that is read (result set before aggregation) you could try the two different options to check which one is faster

2nd:

if the result set after aggregation is rather small compared to the result set that is read (result set before aggregation) the dabase should be faster because you can save at the (network) transfer costs and memory consumption at the application server level.

If in doubt, just try and compare... and let us know your results with all the details

Kind regards,

Hermann

Read only

Former Member
0 Likes
1,131

@Herrman,

I do not understand your first option, aggregation should in the worst case leave the data untouched, i.e. nothing collected.

Maybe you have something different in mind than only the aggregate functions.

Original question, Please that the database read need per record 200micro to a few milliseconds, the internal table operations are in the range of a few microseconds (sorted quite similar to hashed, deviating ony at very large tables). So every record which is transferred to much is costly. Do the aggregation in the database is the standard recommendation.

Additionally be aware that DB operations and ABAP operations can give different results, especially in the case of the AVG becuase the db know a NULL value and ABAP not.

Siegfried

Read only

HermannGahm
Product and Topic Expert
Product and Topic Expert
0 Likes
1,131

Hi Siegfried,

i was thinking about the effort that comes with aggregates and groups (sort, group or make distinct) which is work that needs to be done even in the extreme case when we don't collect anything and leave the data untouched. On some database platforms i was rather surprised how big the effort for group by / distinct was. In such a scenario where the aggregated resultset is rather big (compared to the unaggregated one) i personally would give it a try to aggregate on the application server e.g. with collect think.

If the aggregation reduces the amount of data to be transferred a lot i would go for the database.

Thanks for adding the NULL topic which is very important too at least for AVG, MIN, MAX, COUNT... in case of SUM a NULL is ignored or treated as 0 (in standard SQL) which should be ok i think.

Kind regards,

Hermann

Read only

Former Member
0 Likes
1,131

Hello Hermann and Siegfried,

thanks a lot for your advice!

I checked both alternatives on the development system. Both show almost the same performance. But as the amount of data is much greater on productive system, I prefer altnerative 1) which should do the better job with a larger amount of data, because less records need to be transferred from the database server.

Best regards,

Markus

Read only

0 Likes
1,131

I often hear an argument against doing things on the database side, but not sure if it's really valid.

The argument is that you often have one database server but several application servers, so you don't want to put extra work on your only db server.

Read only

Former Member
0 Likes
1,132

> often hear an argument against doing things on the database side, but not sure if it's really valid.

yes, but this argument is simple in the case of the ORDER BY, save the ORDER BY and SORT on the application server even if the database is maybe even more clever when it sorts. (exception: the used index gives the ORDER for free).

However, if you save to tranfer records as in:

+ aggregates

+ DISTINCT

+ ORDER BY together with UP TO n ROWS

then you save also time on the db. Here, the rule of thumb says if you expect to save 50% of the records then do on the db.

I have no idea about complicted GROUP BY and HAVING rules, but I would assume, that they are hard to generalize.

Siegfried