Application Development Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.
cancel
Showing results for 
Search instead for 
Did you mean: 

inner join

Former Member
0 Kudos
454

Hi,

Will replacement of the inner join in the following code by "for all entries" make an effect on performance of the overall program or its really negligible ( the program is of 2586 lines ) .

SELECT DISTINCT partner INTO TABLE t_bp_addr

FROM adrc AS b INNER JOIN but020 AS a ON

aaddrnumber = baddrnumber

WHERE b~city1 = adrc-city1

AND b~post_code1 = adrc-post_code1

AND b~po_box = adrc-po_box

AND b~street = adrc-street

AND b~house_num1 = adrc-house_num1

AND b~house_num2 = adrc-house_num2

AND b~region = adrc-region

AND b~addr_group = 'BP'.

Please do reply.

Thanks,

Binay.

1 ACCEPTED SOLUTION

Former Member
0 Kudos
219

There seems to be a lot of mis-understanding of the difference between JOINs and "FOR ALL ENTRIES".

In another post I included the following example:

The transfer of data from the database to the application server is done in blocks whose size depends on the environment / database / etc. When you do a select then the database server runs the SQL and gets the matching records, and then passes them in blocks to the application server.

This can be seen in SQL trace for any large select:

select *

from trdir into table t_data up to p_cnt rows.

3.255 TRDIR REOPEN 0 SELECT WHERE ROWNUM <= 5000

5.572 TRDIR FETCH 721 0

5.149 TRDIR FETCH 721 0

5.184 TRDIR FETCH 721 0

5.275 TRDIR FETCH 721 0

5.616 TRDIR FETCH 721 0

5.338 TRDIR FETCH 721 0

4.709 TRDIR FETCH 674 0

In the above select 721 records are returned from TRDIR in each block using select *, with 674 in the last fetch

changing it to select a few fields:

SELECT name sqlx edtx varcl dbapl

FROM trdir INTO TABLE t_data UP TO p_cnt ROWS.

3.366 TRDIR REOPEN 0 SELECT WHERE ROWNUM <= 5000

1.796 TRDIR FETCH 2.645 0

3.936 TRDIR FETCH 2.355 0

Here 2645 records fetched first time, 2355 second fetch.

Less trips between the database and the application server will improve program performance, especially in production environments where the two are often separate physical servers. This can be seen by totalling the duration of the above two examples (first column) - 40,098 versus 9,098.

When a database JOIN is used, this is done entirely on the database server. If the join is based on full key fields or a full index then the database will efficiently get the required result set to be sent back to the application server in blocks as per the above examples.

The FOR ALL ENTRIES code is not implemented directly by the database - rather the Application server breaks the SQL into a statement of the form:

Select (fields) from (table) where keyfield IN (value1, value2, value3, .... )

How many such comparisons are done in each select depends on a system parameter set depending on the database being used - IT CAN BE AS LOW AS 10 RECORDS per select. In other words, FOR ALL ENTRIES ends up effectively doing lots of small selects from the table - this can be seen in the following SQL Trace:

SELECT name sqlx edtx varcl dbapl

FROM trdir INTO TABLE t_data2 FOR ALL ENTRIES IN t_data

WHERE name = t_data-name.

3.333 TRDIR REOPEN 0 SELECT WHERE ROWNUM <= 500

1.753 TRDIR FETCH 2.645 0

3.834 TRDIR FETCH 2.355 0

737 TRDIR PREPARE 0 SELECT WHERE "NAME" IN ( ?

3.992 TRDIR OPEN 0 SELECT WHERE "NAME" IN ( '

209 TRDIR FETCH 254 0

3.776 TRDIR REOPEN 0 SELECT WHERE "NAME" IN ( '

182 TRDIR FETCH 254 0

4.023 TRDIR REOPEN 0 SELECT WHERE "NAME" IN ( '

183 TRDIR FETCH 254 0

..... (lots more lines omitted) ...

For this example which reads the same 5000 records as the previous examples - the results come back 254 records per block and the total time was 4,895,781 - significantly higher than the other two.

FOR ALL ENTRIES is good for getting records from cluster tables like BSEG where a JOIN cannot be used, but it is not a generic way to improve performance.

The other trap to watch for with FOR ALL ENTRIES is that "Duplicate lines are automatically removed from the resulting set" - in other words it is the same as SELECT DISTINCT. If you don't allow for this with your field list you might miss some desired results.

Andrew

8 REPLIES 8

Former Member
0 Kudos
219

Hi Binay,

It will be negligible.

Regards,

Atish

0 Kudos
219

Hi,

One thing is for sure if there is any Buffering on these tables used in the JOIN the buffer is ignored as you are using JOIN.

It can improve, since JOIN forces the data analysis on the Database server which is a costly resource it can be useful if you use FOR ALL ENTRIES where the data analysis happens in the application server.

Advise: Performance issues aretricky as they depend a lot on the Data in the database and also the amount of data you are handeling, so it will be great if you can actually check both kinds of logic in your program and do Performance analsys in SE30.

Regards,

Sesh

former_member194613
Active Contributor
0 Kudos
219

Generally a 'for all entries' can not improve a join statement, as it splits one database access into two.

It can only help if the database accesses are very confused and database tneds to use a wrong index. A hint is then usually the better choice.

In your example, there are only conditions for table b (BUT020) in the where condition. This is in principle simple, but not in your case. Which index do you want to use? In my system I can see no suitable index... That might be your problem.

The join is simple, if you know b~addrnumber then the join uses the primary key of ADRC.

The comment above is the correct, and a good example where for all entries would help. But both of your tables are not buffered and can not be buffered.

Either rethink your where condition or check whether another index on BUT020 can be added, if there is not already one in your system which is not standard.

The number of lines in your program is irrelevant, all the rest can be fast, but a problem with a join on large tables can need 99% of the time.

Siegfried

Former Member
0 Kudos
219

hI

WHEN YOU TAKE THE INNER JOIN , IF IT HAD 2 TABLES OK IF YOU USE MORE THAN 2 TABLES THEN THE PERFORMANCE POINT OF VIEW FOR ALL ENTRIES IS BETTER

IF USE FOR ALL ENTRIES FOR ANY NUMBER OF TABLES

THE PERFORANCE IS VERY GOOD

REWARD IF USEFULL

Former Member
0 Kudos
220

There seems to be a lot of mis-understanding of the difference between JOINs and "FOR ALL ENTRIES".

In another post I included the following example:

The transfer of data from the database to the application server is done in blocks whose size depends on the environment / database / etc. When you do a select then the database server runs the SQL and gets the matching records, and then passes them in blocks to the application server.

This can be seen in SQL trace for any large select:

select *

from trdir into table t_data up to p_cnt rows.

3.255 TRDIR REOPEN 0 SELECT WHERE ROWNUM <= 5000

5.572 TRDIR FETCH 721 0

5.149 TRDIR FETCH 721 0

5.184 TRDIR FETCH 721 0

5.275 TRDIR FETCH 721 0

5.616 TRDIR FETCH 721 0

5.338 TRDIR FETCH 721 0

4.709 TRDIR FETCH 674 0

In the above select 721 records are returned from TRDIR in each block using select *, with 674 in the last fetch

changing it to select a few fields:

SELECT name sqlx edtx varcl dbapl

FROM trdir INTO TABLE t_data UP TO p_cnt ROWS.

3.366 TRDIR REOPEN 0 SELECT WHERE ROWNUM <= 5000

1.796 TRDIR FETCH 2.645 0

3.936 TRDIR FETCH 2.355 0

Here 2645 records fetched first time, 2355 second fetch.

Less trips between the database and the application server will improve program performance, especially in production environments where the two are often separate physical servers. This can be seen by totalling the duration of the above two examples (first column) - 40,098 versus 9,098.

When a database JOIN is used, this is done entirely on the database server. If the join is based on full key fields or a full index then the database will efficiently get the required result set to be sent back to the application server in blocks as per the above examples.

The FOR ALL ENTRIES code is not implemented directly by the database - rather the Application server breaks the SQL into a statement of the form:

Select (fields) from (table) where keyfield IN (value1, value2, value3, .... )

How many such comparisons are done in each select depends on a system parameter set depending on the database being used - IT CAN BE AS LOW AS 10 RECORDS per select. In other words, FOR ALL ENTRIES ends up effectively doing lots of small selects from the table - this can be seen in the following SQL Trace:

SELECT name sqlx edtx varcl dbapl

FROM trdir INTO TABLE t_data2 FOR ALL ENTRIES IN t_data

WHERE name = t_data-name.

3.333 TRDIR REOPEN 0 SELECT WHERE ROWNUM <= 500

1.753 TRDIR FETCH 2.645 0

3.834 TRDIR FETCH 2.355 0

737 TRDIR PREPARE 0 SELECT WHERE "NAME" IN ( ?

3.992 TRDIR OPEN 0 SELECT WHERE "NAME" IN ( '

209 TRDIR FETCH 254 0

3.776 TRDIR REOPEN 0 SELECT WHERE "NAME" IN ( '

182 TRDIR FETCH 254 0

4.023 TRDIR REOPEN 0 SELECT WHERE "NAME" IN ( '

183 TRDIR FETCH 254 0

..... (lots more lines omitted) ...

For this example which reads the same 5000 records as the previous examples - the results come back 254 records per block and the total time was 4,895,781 - significantly higher than the other two.

FOR ALL ENTRIES is good for getting records from cluster tables like BSEG where a JOIN cannot be used, but it is not a generic way to improve performance.

The other trap to watch for with FOR ALL ENTRIES is that "Duplicate lines are automatically removed from the resulting set" - in other words it is the same as SELECT DISTINCT. If you don't allow for this with your field list you might miss some desired results.

Andrew

Former Member
0 Kudos
219

Please read:

<a href="/people/rob.burbank/blog/2007/03/19/joins-vs-for-all-entries--which-performs-better">JOINS vs. FOR ALL ENTRIES - Which Performs Better?</a>

Rob

Former Member
0 Kudos
219

You use DISTINCT which turns off buffering in any case, so the system is not optimizing to begin with, even if you eliminate the join.

As another poster noted, FOR ALL ENTRIES also eliminates duplicates, but it may do so with buffering still on, I am not sure of that.

If that is the case, FOR ALL ENTRIES might be useful.

You have to check for an empty itab though with FOR ALL ENTRIES. If the itab is empty, the WHERE clause is ignored.

  IF i_adrc[] IS INITIAL.
    REFRESH t_bp_addr.
  ELSE.
    SELECT partner INTO TABLE t_bp_addr
      FROM but020
      FOR ALL ENTRIES IN i_adrc
      WHERE addrnumber = i_adrc-addrnumber
        AND city1 = i_adrc-city1
        AND post_code1 = i_adrc-post_code1
        AND po_box = i_adrc-po_box
        AND street = i_adrc-street
        AND house_num1 = i_adrc-house_num1
        AND house_num2 = i_adrc-house_num2
        AND region = i_adrc-region
        AND addr_group = 'BP'.
  ENDIF.

Note that the DISTINCT has been removed.

Good luck

Brian

Message was edited by:

Brian Sammond

former_member194613
Active Contributor
0 Kudos
219

sometimes I also comment on other comments:

Brian, your comment is not really helpful, because you completely omittsthe order

of execution. Hw should the select on ADRC be done, there is no condition for this table, you can not just assume that it is done, then whole table is selected.

The selection must start with with the BUT020!

The problem here is NOT related to the coding or the join or the usage of distinct.

It is only related to the selection on BUT020, in a standard system there is no

index which supports the above where condition.

So get the question on the right track.

The solution is a new index with the fields of the where condition, satring with mandt, and then ordered by decresing selectivity.

Siegfried