Application Development Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.
cancel
Showing results for 
Search instead for 
Did you mean: 

Inner Joins vs For All Entries - performance query

Former Member

Hi All,

I'm a bit confused here... I see lots and lots (and lots...) of postings from people asking how to get data from multiple tables.

To me the immediate answer is to use joins in my select statement to reduce the database load but more and more I see people suggesting FOR ALL ENTRIES is better from a performance perspective.

Now, simple question time, which is more efficient in the real world when doing something like the following:- (this is a basic example but I'm sure you know what I mean.)

Select  *
  into  table lt_sales_data
  from  vbap as vbap
 inner  join vbak as vbak
     on vbak~vbeln eq vbap~vbeln
  where  vbak~vbeln in so_vbeln.

or

Select  *
  into  table lt_vbak_data
  from  vbak
 where  vbeln in so_vbeln.

if lt_vbak_data[] is not initial.
select  *
  into  table lt_vbap_data
  from  vbap
  for all entries in lt_vbak_data
 where  vbeln eq lt_vbak_data-vbeln.
endif.

Basically I want to know whether joins or for all entries is better from a database performance perspective.

I want to know why as well so please don't just post links, random cut and paste answers or one liners. I'm convinced for all entries is slower but am willing to be persuaded otherwise if someone can show me proof.

Thanks,

Gareth.

1 ACCEPTED SOLUTION

Thomas8
Active Contributor
0 Kudos

In addition to what has been been posted already (and I strongly recommend studying any statements by performance meister Siegfried Boes in the quoted threads):

In your JOIN example you end up with all the data in one internal table, whereas in the FAE example your data is split in two internal tables. If your goal is to have all relevant data in one table (e.g. for ALV grid display), then JOIN is the better option most of the time, since it saves you a lot of additional internal table handling. Provided the JOIN uses full primary keys in the ON conditions, of course.

I have always done well using clean JOIN statements whenever I can, only using FAE when JOIN is not possible or recommended (e.g. cluster table or buffered tables involved) or the number of tables to be joined is just too large (about > 5 or 6).

Greetings

Thomas

19 REPLIES 19

GauthamV
Active Contributor
0 Kudos

hi,

goto se30 and click on tips and tricks.

just paste ur code and click on measure runtime.this will allow u to check the performance of both ur statements.

Former Member
0 Kudos

Hi Gareth,

please check these threads

These are really Good and Interesting Threads

/message/5571128#5571128 [original link is broken]

Best regards,

raam

former_member181995
Active Contributor
0 Kudos

Gareth,

The most important thing of that blog, in my modest opinion,I believe the result also depends on the DB tuning. There are several FOR ALL ENTRIES relevant profile parameters.

It's explained in the following OSS notes:

Note 48230 - Parameters for the SELECT ... FOR ALL ENTRIES statement

Note 652634 - FOR ALL ENTRIES performance with Microsoft SQL Server

Note 634263 - Selects with FOR ALL ENTRIES as of kernel 6.10

It's worth to check these parameters, before making any decision, which way to choose.

Impressive! I must not rely in SAP information, because it doesn't come all toguether (ok, I knew that). I must check my syntax and for some special cases, ask my basis guys (eek!) about the DB tuning... Then I moved to the next one

2- Performance - what will kill you and what will leave you with only a flesh wound, by Rob Burbank

Not bad. Some interesting facts, but again, one comment made me raise an eyebrow. The comment lead me to the next blog:

3- [https://www.sdn.sap.com/irj/sdn/weblogs?blog=/pub/wlg/4241] [original link is broken] [original link is broken] [original link is broken]; Anyone Got Some Real Benchmark Stats on "For all Entries"???,

by David Halitsky

Just the header made me smile. I believed from a long time we (coders) rely a lot in urban legends and/or "guru" comments. That guy doesn't? I love it!

But the main fact found there was the performance of fine tuned loop'ed SELECTs. It's just the thing I was thinking to do with my program... I though about it a bit, and I agree with that man.

The last blog, comes from the same person, and it's the second part of the number 3

Amit.

Former Member
0 Kudos

hiii

JOIN is better for performance wise.because we count performance on the basis of how many times data fetched from server.if we will use less time then performance will be better .in JOIN we only use one SELECT statement so in one go we will get data from server.but in FOR ALL ENTRIES we use server 2 times for getting that data.so server will be busy more.so JOIN will be better .but as if many tables are there JOIN condition become complex.so in that case FOR ALL ENTRIES is easy to use & understand.correct me if i m wrong.

thx

twinkal

Former Member
0 Kudos

Well the FOR ALL ENTRIES statement is known to be of bad performance since it forces a full table scan for the Table behind IN.

BUT you have to check it for every new requirement.

in your case here, you may be better of with the FOR ALL ENTRIES statement.

Why? Well still the staement forces a full table scan for lt_vbak_data.

But since lt_vbak_data is an internal table you can scan through it quite fast. about 1000 times faster than scanning through a nonbuffered DB table.

While in your join all operations are DB-operations. DB operations are the thing that tend to make the programs slow.

Thomas8
Active Contributor
0 Kudos

In addition to what has been been posted already (and I strongly recommend studying any statements by performance meister Siegfried Boes in the quoted threads):

In your JOIN example you end up with all the data in one internal table, whereas in the FAE example your data is split in two internal tables. If your goal is to have all relevant data in one table (e.g. for ALV grid display), then JOIN is the better option most of the time, since it saves you a lot of additional internal table handling. Provided the JOIN uses full primary keys in the ON conditions, of course.

I have always done well using clean JOIN statements whenever I can, only using FAE when JOIN is not possible or recommended (e.g. cluster table or buffered tables involved) or the number of tables to be joined is just too large (about > 5 or 6).

Greetings

Thomas

Former Member
0 Kudos

Thanks to all the opinions so far... You've backed up what I suspected although I maybe wasn't clear enough with my question and desired result. I was hoping someone could produce some hard and fast guidelines from SAP themselves dictating which is the better method. I know 10 years ago I was taught to use joins and to keep my database access to minimal levels, using keyed reads where applicable and using internal tables to filter data where I couldn't use key fields.

Gautham, I am aware of SM30 but that doesn't answer the general question I was asking. I've obviously used SM30 to run some comparisons but I was really hoping someone could give me a definitive answer based on hard facts or from attending some training at SAP recently. And no points for asking for them

Tamás, I agree with what you are saying about runing many comparisons - it appears to be dependant on any number of criteria which means each case may require different code. I've not managed to find a consistent comparison of the two methods that would lead me to use one method or the other...

Karan, thanks for your feelings but it doesn't really help me! Why/how does it retrieve the data faster than a join? Have you got testing/proof to back this up?

Raam, thanks for those links - they are interesting reads and seem to go through the same arguments I'm currently considering. Although I still don't have a definitie answer!

Amit, I understand exactly what yuo are saying about myths and urban legends That was my motivation for this post. To bo honest, it seems to me that a lot of the "newer" ABAP coders always go for FOR ALL ENTRIES but I wanted to know - is there a reason or do they all just cut and paste off SDN?! Are they all just scared of complex inner joins?! What would you all make of this Select statement for example? -

select  afvc~arbid
            afko~aufnr
            aufk~objnr
            afko~plnnr
            afko~plnal
            afko~aufpl
            afko~zaehl
            afpo~matnr
            makt~maktx
            afvc~vornr
            afvc~ltxa1
            afvu~aplzl
            afvu~usr10
            afvv~meinh
            afvv~bmsch
            afvv~vge02
            afvv~vgw02
            afvv~mgvrg
            afab~aplzl_vor
      into  table lt_recipe_orders
      from  afko as afko
     inner  join aufk as aufk
        on  aufk~aufnr eq afko~aufnr
     inner  join afpo as afpo
        on  afpo~aufnr eq afko~aufnr
     inner  join makt as makt
        on  makt~matnr eq afpo~matnr
     inner  join afvc as afvc
        on  afvc~aufpl eq afko~aufpl
     inner  join afvu as afvu
        on  afvu~aufpl eq afvc~aufpl
       and  afvu~aplzl eq afvc~aplzl
     inner  join afvv as afvv
        on  afvv~aufpl eq afvu~aufpl
       and  afvv~aplzl eq afvu~aplzl
      left  outer join afab as afab
        on  afab~aufpl_nch eq afvu~aufpl
       and  afab~aplzl_nch eq afvu~aplzl
       for  all entries in t_resources
     where  afko~gltrs ge v_start_date
       and  afko~gstrs le v_start_date
       and  afko~plnty eq gc_task_list_type_2
       and  afpo~dwerk eq v_werks
       and  makt~spras eq sy-langu
       and  afvc~arbid eq t_resources-objid.

Twinkal, I've always thought less DB hits is a better performing program too - the above example compares 2 db hits to 1... I don't have issues with complex joins because I've used them so long so can discount that problem but do agree that less DB hits is the ultimate goal. Providing of course the Selects you write are actually efficient in themselves.

Murthy, if you build your select statement correctly duplicate records can be avoided in most cases. How can you say a join statement will hit the database more when in my example there is 1 DB hit compared to 2 for a for all entries? And I'd love to know the reasoning behind never using a join on more than 2 tables?! Is that just an urban myth?!

Thomas, I've just been looking at some of Siegfried's posts and like what I am reading. As you say, using full keys via joins is essential.

Gareth.

0 Kudos

Hi!

Personally I like to "optimize" my programs the following way:

- there is a main SELECT with JOIN technique. This contains 2-3 tables, those table, which can provide the most restriction to the row numbers. So after the first main SELECT I'll have an internal table with relatively few entries. The typical is a header-items table join (f.e. EKKO-EKPO)

- I use this internal table with FOR ALL ENTRIES, to collect additional data (texts for vendor, customer, material number, statuses, etc)

These type programs work fine always.

I face sometimes with 1 huge problem: READ TABLE and LOOP statements could make the performance really low if the table is not SORTed correctly. SORT always your internal tables to those criteria, which you will use in the READ TABLE or LOOP statements. After 10000 entries this could really improve the performance.

Regards

Tamá

0 Kudos

Hi Tamás,

I follow a very similar approach but favour joins more and usually avoid for all entries.

Gareth.

Former Member
0 Kudos

Gareth - you've probably seen my blogs[JOINS vs. FOR ALL ENTRIES - Which Performs Better?|/people/rob.burbank/blog/2007/03/19/joins-vs-for-all-entries--which-performs-better] and [Performance - what will kill you and what will leave you with only a flesh wound|/people/rob.burbank/blog/2006/11/16/performance--what-will-kill-you-and-what-will-leave-you-with-only-a-flesh-wound], so I won't beat them to death.

The first one shows that in general, JOINS work better, but not in all cases. The program that I developed for that blog took longer using a JOIN when the amount of data was large. So I don't think it's possible to say that with certainty that one is better than the other even in a specific case.

The other one shows that there are far worse problems to worry about than this. If you SELECT from a large table without using a key effectively, it just doesn't matter which technique you use.

So, when deciding which to use, I don't use performance as a criteria. I think about ease of initial programming and maintenance instead.

In case you're wondering, I generally use FOR ALL ENTRIES.

Rob

0 Kudos

Hi.

I read this discussion at the weekend but didn't had my personal DBSL expert at hand... Now I'm back to the office and just talked with some IBM guy about this matter.

I think the most important statement is in Rob's blog: You have to compare. There is no simple answer like 'always use JOIN' or 'always use FOR ALL ENTRIES'.

FOR ALL ENTRIES is transfered to a long SQL statement with a lot of ORs in the WHERE-clause. If this statement gets too long it has to be split up into several statements. If this happens performance will go down. From the database perspective JOINS can be optimzed more easy than long OR/AND-statements. But here you have to notice that every database has it's own optimzer with it's own strengths and weaknesses. And the optimizer's results are depending on the actual statement. Buffering of tables is something else that should be taken into account: if a FOR ALL ENTRIES is just using tables in the buffer you can run it without accessing the DB at all. I'm not sure if that is true for JOINS, too.

In the performance discussion you should have a look on how you fill the table used by FOR ALL ENTRIES. If this table is used only for that purpose and is quite large you get a lot of DB traffic that a JOIN would not create...

To sum it up:

If in doubt or really performance critical-> measure.

I normally use JOIN if I don't have good reasons to switch to FOR ALL ENTRIES. But I'm not really a performance expert, so don't see this as an 'official' statement.

Best regards,

Jan Stallkamp

0 Kudos

Rob and Jan,

Thanks for the further input.

The varied answers to my question have backed up what I suspected and why I was motivated to post in the first place - people suggesting to use for all entries as it is faster than inner join aren't necessarily correct (and I strongly suspect only do so as they often don't fully understand inner joins); each approach has positives and negatives dependant on the situation; as with everything, experimentation and testing is crucial.

I'm glad to say I haven't learnt anything new or shocking and I don't have to change now I approach my developments and most importantly, I've not missed a massive announcement from SAP saying "inner joins bad, for all entries good"!

Thanks everyone

Gareth.

0 Kudos

Hi.

No, you haven't missed a big announcement. And I think if one of the alternatives would be bad in all cases we would mark it as deprecated or OpenSQL would transfer it into the other form under the hood.

And I think in this case we clearly see a problem here in the SDN community. Many people learn some rules and then use them all the time without understanding the background of this rules. A good set of rules might make your day easier but in the end as a good programmer you have to think yourself. You can't write efficient source code if you don't understand the problems. The second problem I have seen several times here is that people just belive something is a fact and then post it without real knowledge about the subject. And then there is no discussion of the results. I've seen questions where answers were given that were completly contrary. We have to be carefull that SDN is not a place where urban ledgends about programming are presented as facts.

I hope to see more threads like this one here!

/Jan

0 Kudos

Jan,

Thanks for this follow up - you've pretty much put into words what I've been thinking all along and why I was motivated to start this thread, I just hadn't said it so clearly!

Gareth.

0 Kudos

>

> Hi.

>

> I read this discussion at the weekend but didn't had my personal DBSL expert at hand... Now I'm back to the office and just talked with some IBM guy about this matter.

Hi Jan,

that's a good idea of you - sometimes I wonder if the main reason for failing in performance is the missing or (ignored) DB guru...

>

> I think the most important statement is in Rob's blog: You have to compare. There is no simple answer like 'always use JOIN' or 'always use FOR ALL ENTRIES'.

what's a little missing there is data volumes in the tables. You might have an idea how to process your tables but it may you are not always aware of the data volume to be retrieved - thats where optimizers can come in really handy. They can develop an efficeint plan if you give them the needed information. FAE tends to hide that (i.e. you split your statement in several statements).

But we agree: you have to compare and test both approaches.

>

> FOR ALL ENTRIES is transfered to a long SQL statement with a lot of ORs in the WHERE-clause. If this statement gets too long it has to be split up into several statements. If this happens performance will go

down.

sadly, that's true and often not recognized by FAE users.

I gave an example how many ADDITIONAL statements you produce if you take the naive approach:

"I will use FAE - the DB will take care of the rest (well, it's a black box , what do I have to care about?)".

From the database perspective JOINS can be optimzed more easy than long OR/AND-statements. But here you have to notice that every database has it's own optimzer with it's own strengths and weaknesses. And the optimizer's results are depending on the actual statement.

That's right - but MOSTLY they get it correct. Optimizer techniques are around since many,many years.

There will be always guys who try to be smarter than the optimizer and confronted with lot's of joins and different data volumes in the tables often turns out they are not that smart if they program around those bad,bad DB technique!

Sometimes (quite rarely) you have to correct the optimizer - but this would imply you have a good understanding for that optimizer for this database and the SQL used.

Buffering of tables is something else that should be taken into account: if a FOR ALL ENTRIES is just using tables in the buffer you can run it without accessing the DB at all. I'm not sure if that is true for JOINS, too.

Buffering is also done inside the database. Not every read goes to the disks...

> To sum it up:

>

> If in doubt or really performance critical-> measure.

>

> I normally use JOIN if I don't have good reasons to switch to FOR ALL ENTRIES. But I'm not really a performance expert, so don't see this as an 'official' statement.

keep on with your approach!

bye

yk

Former Member
0 Kudos

I have just written this code and used SE30 to measure it: -

Left hand screen put: -


types: begin of ty_data,
         vbeln type vbeln,
       end of ty_data.

data t_vbak type standard table of ty_data.

select vbak~vbeln
       into table t_vbak
       from vbak as vbak
       inner join vbap as vbap
       on vbap~vbeln eq vbak~vbeln.

right hand screen put


types: begin of ty_data,
         vbeln type vbeln,
       end of ty_data.

data t_vbak type standard table of ty_data.
data t_vbap type standard table of ty_data.

select vbeln
       into table t_vbak
       from vbak as vbak.

check not t_vbak[] is initial.

select vbeln 
       into table t_vbap
       from vbap
       for all entries in t_vbak
       where vbeln eq t_vbak-vbeln.

The measurements I got clearly state inner join is quick in this example.

Inner join was Runtime: 43,129 microseconds

For all entries was 141,932 microseconds

Also if you use for all entries you then have the data in more than one internal table, which also need extra processing. Bet you no-one thought of that bit.

0 Kudos

>

> Also if you use for all entries you then have the data in more than one internal table, which also need extra processing. Bet you no-one thought of that bit.

'Fraid they did - see Thomas Zloch's reply above

0 Kudos

Martin - a couple of points.

I'm surprised that there was such a large difference in the two SELECTs. did you do as SAP recommends: take multiple runs and use the minimum time for each for comparison?

And if you check the blog I mentioned, you'll see that it mentions that you get two tables

Rob

former_member194613
Active Contributor
0 Kudos

> I face sometimes with 1 huge problem: READ TABLE and LOOP statements could make the

> performance really low if the table is not SORTed correctly. SORT always your internal tables to those > criteria, which you will use in the READ TABLE or LOOP statements. After 10000 entries this could

> really improve the performance.

always try to use SORTED tables!

And with FAE from basis 6.10 onwards it should be possible to insert directly into the driver table.

Did anybody try that already?

Siegfried