Solved: Re: LOOP AT vs INNER JOIN

Former Member · ‎2012 Mar 23

Hello everyone.

I have been searching for hours in google about this question, but I have not found anything strictly related to this issue.

The programmers in my company uses "LOOP AT table_a" to match records in "Table_b", is better to use INNER JOIN to this proposes? why?

I think the INNER JOIN was made to match criteria at low level machine language, what represents an excellent performance, and LOOP AT was to made to print information, but I'm not much sure about this.

I appreciate your help, thank you.

abdul_hakim · ‎2012 Mar 26

Ivan,

It all depends on the context you are using both the approaches.

Making use of Join:

If you want to read logically related information from tables then it is advisable to use the JOIN over Nested SELECTs.

Making use of Nested LOOP:

If you want to optimize the performance of nested LOOP then you may go for Parallel Cursor.

As i mentioned above the best approach depends on the requirement and the context where it is being implemented in ABAP Code.

You can use the Runtime analysis and compare the performance.

Hope this helps.

Thanks

Abdul Hakim

Former Member · ‎2012 Mar 23

Hi,

the main problem I see with an INNER JOIN is that you need an extra structure to hold the sum of the fields of both tables. At times the data in table_b has been loaded from the database already so it wouldn't make sense to hit the database again. Then there is the cases where you look up table_b to check whether a record doesn't exist. Then there is the case where you have an function module that selects the data from the database but it's missing the description to the value of one field. In that case I'd load the descriptions separately and read them during a LOOP AT if needed.

On the other hand there are definitely use cases where and inner join, or better yet a view on the 2 database tables makes more sense. As always it is a matter context what makes more sense.

I usually try to use standard SAP function modules or methods to retrieve data from standard tables and then work with 2 internal tables, instead of creating a view or join on database tables that might change during patching and mess up my code.

Cheers
Adi

Former Member · ‎2012 Mar 24

Hi Ivan,

Try reading through this thread I started years go for some background information - http://scn.sap.com/thread/919187

I came to the conclusion that newer ABAP developers were favouring FOR ALL ENTRIES but none of them really knew why. It's good to see someone questioning this approach. I still maintain that an INNER JOIN is better in almost all scenarios, you just have to know how to write efficient SQL.

FOR ALL ENTRIES has the small benefit of making the code slightly easier to understand sometimes which may help with maintenance - I would argue that we should be up skilling our ABAP developers rather than dumbing down our code though.

Hope this helps,

Gareth.

Former Member · ‎2012 Mar 26

For performance critical sections, I tend to do both and use whichever is faster. It isn't always obvious in advance which that is going to be. It is entirely likely that my SQL skills are the reason why sometimes a join is unexpectedly slower. In some cases, though, the join really is slower.

In cases where performance isn't an issue, I'd argue for using whichever method produces the most understandable code.

UweFetzer_se38 · ‎2012 Mar 26

In most cases JOIN is faster, because we don't need the network traffic (for each select single), especially if you have different appl/db servers. Rule of thumb: "let do the DB all the work"

And btw: there is a third option:

SELECT * FROM SFLIGHT AS F INTO SFLIGHT_WA
    WHERE SEATSOCC < F~SEATSMAX
      AND EXISTS ( SELECT * FROM SPFLI
                     WHERE CARRID = F~CARRID
                       AND CONNID = F~CONNID
                       AND CITYFROM = 'FRANKFURT'
                       AND CITYTO = 'NEW YORK' )
      AND FLDATE BETWEEN '19990101' AND '19990331'.
ENDSELECT.

Best regards
(@se38)

SuhaSaha · ‎2012 Apr 02

Good one Uwe!

I'm sure the "self-proclaimed" performance-analysis champions in most of the companies will reject it due to the usage of Sub-query

Anyway I'll try to run a SQL trace & a runtime-analysis to compare the results of this sub-query against INNER JOIN.

BR,

Suhas

UweFetzer_se38 · ‎2012 Apr 02

Hi Suhas,

would be nice if you can publish your results here or in a separate blog. Never used sub-queries myself so far (not needed in the last 20 years of my ABAP career ).

The example was copied from the performance examples in SE80, so it was not my invention.

Best regards
(@se38)

Former Member · ‎2012 Apr 02

... maybe not as interesting as you might expect. The subquery can not replace the INNER JOIN, it suitable only in special cases as the one above, that is the reason why Uwe has never used it. Then the performance is slightly better but depends on the task.

Extensive performance measurments can be downloaded here:

www.dpunkt.de/abap_performance

see Beispiele V0 ... there is nearly everything what is discussed here again and again

Runs also in Englisch, description however only in German.

Siegfried

Former Member · ‎2012 Apr 02

o.k., load the DB exercise and execute the test case no '715' and '716' and you will get measurements (relaible but DB dependent)

Something like this, in some DB platforms the subquery is faster in others slower ... i..e not so interesting, THE FAE is slower!

{code}

Improve Nested Selects - Nested Selects 715 1.278.000

Improve Nested Selects - FOR ALL ENTRIES 716 415.000

Improve Nested Selects - Subquery 716 259.000

Improve Nested Selects - Join 716 269.000

Improve Nested Selects - View 716 273.000

{code}

Former Member · ‎2012 Mar 26

this is an endless discussion, I cones up at least once every second week in this forum and it was always answered.

+ Try a join if you need information from 2 or more tables.

+ Use FOR ALL ENTRIES if one internal table was created or selected at an earlier point in the programs execution and later additional data mus be read from another DB table. Use the nested loops to combine to 2 result tables and be ware that you must use an optimized READ or LOOP for the inner operation of the nested loop, i.e. sorted or hashed table or BINARY SEARCH.

* Do not use nested SELECTs or SELECTs inside LOOPs (exception: SELECT used single-record table buffer)

Using a FOR ALL NTRIES to replace a join is NOT recommended. It can make sense, if the join becomes very large, many tables with conditions which do not use primary keys. Then the DB optimizer can become confused. But then you will also have to invest some time for the optimization.

Be aware, that the JOIN can be flexible to some extent, if the WHERE-clause varies a lot, then the DB optimizer can choose differend access pathes, for example different order of table accesses.

Note, the above mentioned SUBSELECT is very special, it can not replace all joins. However, if applicable then it can be even faster than the join. Again everything is done inside the DB.

Background: The Join does more inside the DB and trys to avoid unnecessary data transfers and accesses.

Siegfried

abdul_hakim · ‎2012 Mar 26

Ivan,

It all depends on the context you are using both the approaches.

Making use of Join:

If you want to read logically related information from tables then it is advisable to use the JOIN over Nested SELECTs.

Making use of Nested LOOP:

If you want to optimize the performance of nested LOOP then you may go for Parallel Cursor.

As i mentioned above the best approach depends on the requirement and the context where it is being implemented in ABAP Code.

You can use the Runtime analysis and compare the performance.

Hope this helps.

Thanks

Abdul Hakim

Former Member · ‎2012 Mar 26

please do not recommend parallel cursor anymore.

It is possible to use parallel index with internal tables, but it is cumbersome and cann easily lead to functional bugs and is not necessary if you use an optimized inner access => sorted table, read my blogs on internal tables.

it is also possible to use parallel cursors on the database but this should also not done out of performance reasons but only if you need the OPEN CURSOr with HOLD (was also explained lots of times).

Siegfried

Former Member · ‎2012 Mar 26

Hello everyone, thanks for your answers.

I'm just trying to RETURN RELATED ROWS WHEN THERE IS A MATCH IN BOTH TABLES ONLY, in whatever SQL Language (Microsoft SQL Server, Oracle, MySQL, etc) it's advised the INNER JOIN for this proposes, but the company where I am working at, ABAP programmers uses to write a LOOP, what I think it's wrong, adding this takes 1 hour to return related rows, there are a lot of bad practices else in the company and I want to expose all of them, but I need strong arguments to explain it. Actually, I am a C# - Oracle developer, and I like the best practices even if I have to spent time to learn about them and to apply them. I have learned ABAP, I understand clearly its syntax, but I think, for this simple case, it's wrong use LOOP instead INNER JOIN.

Thank you for your help

abdul_hakim · ‎2012 Mar 26

Siegfried,

I have used Parallel cursors on many occasions and it has helped me in improving the performance when compared over nested LOOPs.

Thanks

Abdul Hakim

Former Member · ‎2012 Mar 26

@abdul,

yes, because you did use a standard table in the nested loop. The performance gain over a key access to a sorted table is marginal, it might be lost because you need to sort both tables for parallel cursor.

abdul_hakim · ‎2012 Mar 26

Siegfried, Yes i have used standard tables and the performane gain was marginal. So for no issues. Things are going well. I have read your blogs about performance tuning and they are really excellent. Keep up the good work.

Thanks,

Abdul Hakim

HermannGahm · ‎2012 Mar 26

Hi Ivan,

or should i say Don Quijote? (since i think you have a loooong fight)

Write small examples that prove your points (e.g. join vs. loop and selects). Ask

them to prove their points / practises (with examples). Slowly and surely you might

be able to convince them. Oh and never mind... some people simply can't be convinced

they simply don't want to change something or learn something new.

Kind regards,

Hermann

Former Member · ‎2012 Mar 26

Hello Hermann, it's very funny your analogue with Don Quijote hahahaha, perfect to me I say because you right, people don't want to change them mind.

I think it has not been clear the fact INNER JOIN is better than LOOP AT for return rows where there is a match in two tables.... a lot of people has talk about parallel cursors, performance tips, etc.... but, just for this strictly case, very few post are useful to answer my question, I don't want anything else than know its low level behavior, which is better for a simple SELECT.

Thank you Hermann.

HermannGahm · ‎2012 Mar 26

Hi,

i think the best thing is to write a small test porgram and prove your points. You can use DD02L (tables) and DD03L (fields) for that, these tables are available everywhere.

Kind regards,

Hermann

joao_sousa2 · ‎2012 Mar 26

Database engines are optimized for this kind of operation so anyone will tell you that it's more efficient to use the inner join. There are two problems with the inner join which are not really performance related (well sort of):

Inner join makes you create an specific structure and you have to map the fields you really want. ABAPers are lazy (full disclosure, I'm also guilty), so it's much more comfortable to write select * into table X, where X type standard table of Y. This is bad memory wise, but hey, deadlines are deadlines
You can have 50 application servers, but only one database server. The inner join may have superior performance, but if you have a application server dedicated to data crunching it doesn't really matter. The database is being shared, the application server is yours. This is very special situation, and makes little sense from a landscape point of view, but in the end the ABAPer doesn't control the landscape he has to work with what he's been given.

yuri_ziryukin · ‎2012 Mar 27

Joao Sousa wrote:
You can have 50 application servers, but only one database server. The inner join may have superior performance, but if you have a application server dedicated to data crunching it doesn't really matter. The database is being shared, the application server is yours. This is very special situation, and makes little sense from a landscape point of view, but in the end the ABAPer doesn't control the landscape he has to work with what he's been given.

Hello Joao,

the second point is not that much relevant actually. In the JOIN the database needs to do the following:

- read blocks for the first table+index from disk (time!) and buffer;

- loop over the records in these blocks and read relevant blocks for the second table+index from disk (time!) and buffer;

- combine the result set and retrieve it back.

In the case of for all entries/loop the first two parts need to be done in the same way, but you'll have more overhead from transferring data to the application server and back. More records will be moved over the network. Actually combining the result is not that critical to move it form DB to the app server.

Regards,
Yuri

Former Member · ‎2012 Mar 27

... Ivan maybe you should read the answers again, the answers which you defined useful, are wrong or misleading ... the others are the good ones.

Application server versus database server is completely misleading, the FOR ALL ENTRIES is slower on the database, the join is the fastest even on the database! The ABAP time is orders faster if done correctly.

joao_sousa2 · ‎2012 Mar 27

"In the case of for all entries/loop the first two parts need to be done in the same way".

I'm sorry, but it's not "in the same way". If you are doing a non-index search on the second table, retrieving the second table can be extremely time consuming. And I'm not saying it's more efficient, quite the opposite.

joao_sousa2 · ‎2012 Mar 27

I made it quite clear that the "solutions" I posted were NOT the most eficient, and just tried to explain the psychology behind ABAPer choice.

Sometimes its interesting to understand your ABAPers and their choices, beyond the technical performance issues.

Former Member · ‎2012 Mar 27

>retrieving the second table can be extremely time consuming

yes, but then it is done in both cases, in the join and also in the FOR ALL ENTRIES .... the problem reamains the same.

The advantage of the FOR ALl ENTRIES are the cases where the db optimizer decides for the join to use a slow access (first or second table). But this is a bug of the db-optimizer and rarer than you think.

HermannGahm · ‎2012 Mar 27

Hi Joao

Joao Sousa wrote:
I made it quite clear that the "solutions" I posted were NOT the most eficient, and just tried to explain the psychology behind ABAPer choice. 
Sometimes its interesting to understand your ABAPers and their choices, beyond the technical performance issues.

yes, i understand what you are saying. I used to be an ABAP developer on release 4.0B for several years and i see your points. Here in the forum we are fighting since years to get rid of recommendations like "FAE is better than a join" so that's why we don't like to see such things...

😉

Kind regards,

Hermann

HermannGahm · ‎2012 Mar 27

and bugs on the DB should be saved on the DB (hints, statistics, patches, parameters, ...) the last ressort should be a less efficient workaround in ABAP with FAE...

joao_sousa2 · ‎2012 Mar 27

I don't know what exactly you are replying to. Did I say that FOR ALL ENTRIES was good? No, I didn't.

Former Member · ‎2012 Mar 29

Hi Ivan

I agree that the practice of using nested loops into internal tables to select data from DB tables it is not the best that you can do. You can find some examples statements for performance by calling transaction SE38 under Menu Environment->Examples->Performance Examples. But don't forget its program has its own specification and sometimes it will better to use internal tables against JOIN. (Let's say if the SQL results of a JOIN it is a huge internal table probably you will have performance and ABAP/Errors problems. At this case it is better to work with data packages by using loop into internal table statements).

yuri_ziryukin · ‎2012 Mar 29

This should not be the reason to use internal tables!

And join can also be used with packages.

Former Member · ‎2012 Mar 29

Hi Yuri

It was just an example. a lot of times you need the total items of a header to calculate something if the items are hudge (Records for two months of a daily special ledger per account in a Bank instalation to calculate totals according to user selections. Trust me if you go by fetching you are in wrong way)

yuri_ziryukin · ‎2012 Mar 29

George,

you'll have the same problem with "for all entries" solution. In the case that you describe, I'd use aggregation possibilities of SQL. This will be faster.

Former Member · ‎2012 Mar 29

Hi Yuri

Probably you have right for the most of cases but always exceptions exist. When you use the SQL aggregation functions.(its an example only)

Does the package result haves the correct totals or you will need and another table to collect the correct totals? How many rows can have this new table? When a routine, for every new account, can be called? Think all of this in an application server file exporting program.

Best regards

Former Member · ‎2012 Mar 29

I read all replying and seems cannot find the end of the conclusion of this question.

Anyway, is there anyone could show/ provide the example code with huge data? I am curious to know based on number. .

Cheers.

yuri_ziryukin · ‎2012 Mar 29

Well, Dinivian, pretty simple. Just do all your best to avoid splitting DB joins into subsequent SELECT statements with For All Entries option. As simple as that. No proof is required. If the database is selecting the right access path, the JOIN will be faster.

By Category

Related Content

Activity Groups

Industry Groups

Influence and Feedback Groups

Interest Groups

Location Groups

Customer Only Groups

Forums

Related Resources

Products

Learning and Support

About

My SAP Profile

My SAP Profile

LOOP AT vs INNER JOIN