2007 Aug 06 10:07 AM
Hi,
Will replacement of the inner join in the following code by "for all entries" make an effect on performance of the overall program or its really negligible ( the program is of 2586 lines ) .
SELECT DISTINCT partner INTO TABLE t_bp_addr
FROM adrc AS b INNER JOIN but020 AS a ON
aaddrnumber = baddrnumber
WHERE b~city1 = adrc-city1
AND b~post_code1 = adrc-post_code1
AND b~po_box = adrc-po_box
AND b~street = adrc-street
AND b~house_num1 = adrc-house_num1
AND b~house_num2 = adrc-house_num2
AND b~region = adrc-region
AND b~addr_group = 'BP'.
Please do reply.
Thanks,
Binay.
2007 Aug 06 2:58 PM
There seems to be a lot of mis-understanding of the difference between JOINs and "FOR ALL ENTRIES".
In another post I included the following example:
The transfer of data from the database to the application server is done in blocks whose size depends on the environment / database / etc. When you do a select then the database server runs the SQL and gets the matching records, and then passes them in blocks to the application server.
This can be seen in SQL trace for any large select:
select *
from trdir into table t_data up to p_cnt rows.
3.255 TRDIR REOPEN 0 SELECT WHERE ROWNUM <= 5000
5.572 TRDIR FETCH 721 0
5.149 TRDIR FETCH 721 0
5.184 TRDIR FETCH 721 0
5.275 TRDIR FETCH 721 0
5.616 TRDIR FETCH 721 0
5.338 TRDIR FETCH 721 0
4.709 TRDIR FETCH 674 0
In the above select 721 records are returned from TRDIR in each block using select *, with 674 in the last fetch
changing it to select a few fields:
SELECT name sqlx edtx varcl dbapl
FROM trdir INTO TABLE t_data UP TO p_cnt ROWS.
3.366 TRDIR REOPEN 0 SELECT WHERE ROWNUM <= 5000
1.796 TRDIR FETCH 2.645 0
3.936 TRDIR FETCH 2.355 0
Here 2645 records fetched first time, 2355 second fetch.
Less trips between the database and the application server will improve program performance, especially in production environments where the two are often separate physical servers. This can be seen by totalling the duration of the above two examples (first column) - 40,098 versus 9,098.
When a database JOIN is used, this is done entirely on the database server. If the join is based on full key fields or a full index then the database will efficiently get the required result set to be sent back to the application server in blocks as per the above examples.
The FOR ALL ENTRIES code is not implemented directly by the database - rather the Application server breaks the SQL into a statement of the form:
Select (fields) from (table) where keyfield IN (value1, value2, value3, .... )
How many such comparisons are done in each select depends on a system parameter set depending on the database being used - IT CAN BE AS LOW AS 10 RECORDS per select. In other words, FOR ALL ENTRIES ends up effectively doing lots of small selects from the table - this can be seen in the following SQL Trace:
SELECT name sqlx edtx varcl dbapl
FROM trdir INTO TABLE t_data2 FOR ALL ENTRIES IN t_data
WHERE name = t_data-name.
3.333 TRDIR REOPEN 0 SELECT WHERE ROWNUM <= 500
1.753 TRDIR FETCH 2.645 0
3.834 TRDIR FETCH 2.355 0
737 TRDIR PREPARE 0 SELECT WHERE "NAME" IN ( ?
3.992 TRDIR OPEN 0 SELECT WHERE "NAME" IN ( '
209 TRDIR FETCH 254 0
3.776 TRDIR REOPEN 0 SELECT WHERE "NAME" IN ( '
182 TRDIR FETCH 254 0
4.023 TRDIR REOPEN 0 SELECT WHERE "NAME" IN ( '
183 TRDIR FETCH 254 0
..... (lots more lines omitted) ...
For this example which reads the same 5000 records as the previous examples - the results come back 254 records per block and the total time was 4,895,781 - significantly higher than the other two.
FOR ALL ENTRIES is good for getting records from cluster tables like BSEG where a JOIN cannot be used, but it is not a generic way to improve performance.
The other trap to watch for with FOR ALL ENTRIES is that "Duplicate lines are automatically removed from the resulting set" - in other words it is the same as SELECT DISTINCT. If you don't allow for this with your field list you might miss some desired results.
Andrew
2007 Aug 06 10:11 AM
2007 Aug 06 10:13 AM
Hi,
One thing is for sure if there is any Buffering on these tables used in the JOIN the buffer is ignored as you are using JOIN.
It can improve, since JOIN forces the data analysis on the Database server which is a costly resource it can be useful if you use FOR ALL ENTRIES where the data analysis happens in the application server.
Advise: Performance issues aretricky as they depend a lot on the Data in the database and also the amount of data you are handeling, so it will be great if you can actually check both kinds of logic in your program and do Performance analsys in SE30.
Regards,
Sesh
2007 Aug 06 10:28 AM
Generally a 'for all entries' can not improve a join statement, as it splits one database access into two.
It can only help if the database accesses are very confused and database tneds to use a wrong index. A hint is then usually the better choice.
In your example, there are only conditions for table b (BUT020) in the where condition. This is in principle simple, but not in your case. Which index do you want to use? In my system I can see no suitable index... That might be your problem.
The join is simple, if you know b~addrnumber then the join uses the primary key of ADRC.
The comment above is the correct, and a good example where for all entries would help. But both of your tables are not buffered and can not be buffered.
Either rethink your where condition or check whether another index on BUT020 can be added, if there is not already one in your system which is not standard.
The number of lines in your program is irrelevant, all the rest can be fast, but a problem with a join on large tables can need 99% of the time.
Siegfried
2007 Aug 06 1:46 PM
hI
WHEN YOU TAKE THE INNER JOIN , IF IT HAD 2 TABLES OK IF YOU USE MORE THAN 2 TABLES THEN THE PERFORMANCE POINT OF VIEW FOR ALL ENTRIES IS BETTER
IF USE FOR ALL ENTRIES FOR ANY NUMBER OF TABLES
THE PERFORANCE IS VERY GOOD
REWARD IF USEFULL
2007 Aug 06 2:58 PM
There seems to be a lot of mis-understanding of the difference between JOINs and "FOR ALL ENTRIES".
In another post I included the following example:
The transfer of data from the database to the application server is done in blocks whose size depends on the environment / database / etc. When you do a select then the database server runs the SQL and gets the matching records, and then passes them in blocks to the application server.
This can be seen in SQL trace for any large select:
select *
from trdir into table t_data up to p_cnt rows.
3.255 TRDIR REOPEN 0 SELECT WHERE ROWNUM <= 5000
5.572 TRDIR FETCH 721 0
5.149 TRDIR FETCH 721 0
5.184 TRDIR FETCH 721 0
5.275 TRDIR FETCH 721 0
5.616 TRDIR FETCH 721 0
5.338 TRDIR FETCH 721 0
4.709 TRDIR FETCH 674 0
In the above select 721 records are returned from TRDIR in each block using select *, with 674 in the last fetch
changing it to select a few fields:
SELECT name sqlx edtx varcl dbapl
FROM trdir INTO TABLE t_data UP TO p_cnt ROWS.
3.366 TRDIR REOPEN 0 SELECT WHERE ROWNUM <= 5000
1.796 TRDIR FETCH 2.645 0
3.936 TRDIR FETCH 2.355 0
Here 2645 records fetched first time, 2355 second fetch.
Less trips between the database and the application server will improve program performance, especially in production environments where the two are often separate physical servers. This can be seen by totalling the duration of the above two examples (first column) - 40,098 versus 9,098.
When a database JOIN is used, this is done entirely on the database server. If the join is based on full key fields or a full index then the database will efficiently get the required result set to be sent back to the application server in blocks as per the above examples.
The FOR ALL ENTRIES code is not implemented directly by the database - rather the Application server breaks the SQL into a statement of the form:
Select (fields) from (table) where keyfield IN (value1, value2, value3, .... )
How many such comparisons are done in each select depends on a system parameter set depending on the database being used - IT CAN BE AS LOW AS 10 RECORDS per select. In other words, FOR ALL ENTRIES ends up effectively doing lots of small selects from the table - this can be seen in the following SQL Trace:
SELECT name sqlx edtx varcl dbapl
FROM trdir INTO TABLE t_data2 FOR ALL ENTRIES IN t_data
WHERE name = t_data-name.
3.333 TRDIR REOPEN 0 SELECT WHERE ROWNUM <= 500
1.753 TRDIR FETCH 2.645 0
3.834 TRDIR FETCH 2.355 0
737 TRDIR PREPARE 0 SELECT WHERE "NAME" IN ( ?
3.992 TRDIR OPEN 0 SELECT WHERE "NAME" IN ( '
209 TRDIR FETCH 254 0
3.776 TRDIR REOPEN 0 SELECT WHERE "NAME" IN ( '
182 TRDIR FETCH 254 0
4.023 TRDIR REOPEN 0 SELECT WHERE "NAME" IN ( '
183 TRDIR FETCH 254 0
..... (lots more lines omitted) ...
For this example which reads the same 5000 records as the previous examples - the results come back 254 records per block and the total time was 4,895,781 - significantly higher than the other two.
FOR ALL ENTRIES is good for getting records from cluster tables like BSEG where a JOIN cannot be used, but it is not a generic way to improve performance.
The other trap to watch for with FOR ALL ENTRIES is that "Duplicate lines are automatically removed from the resulting set" - in other words it is the same as SELECT DISTINCT. If you don't allow for this with your field list you might miss some desired results.
Andrew
2007 Aug 07 2:17 PM
Please read:
<a href="/people/rob.burbank/blog/2007/03/19/joins-vs-for-all-entries--which-performs-better">JOINS vs. FOR ALL ENTRIES - Which Performs Better?</a>
Rob
2007 Aug 08 11:19 PM
You use DISTINCT which turns off buffering in any case, so the system is not optimizing to begin with, even if you eliminate the join.
As another poster noted, FOR ALL ENTRIES also eliminates duplicates, but it may do so with buffering still on, I am not sure of that.
If that is the case, FOR ALL ENTRIES might be useful.
You have to check for an empty itab though with FOR ALL ENTRIES. If the itab is empty, the WHERE clause is ignored.
IF i_adrc[] IS INITIAL.
REFRESH t_bp_addr.
ELSE.
SELECT partner INTO TABLE t_bp_addr
FROM but020
FOR ALL ENTRIES IN i_adrc
WHERE addrnumber = i_adrc-addrnumber
AND city1 = i_adrc-city1
AND post_code1 = i_adrc-post_code1
AND po_box = i_adrc-po_box
AND street = i_adrc-street
AND house_num1 = i_adrc-house_num1
AND house_num2 = i_adrc-house_num2
AND region = i_adrc-region
AND addr_group = 'BP'.
ENDIF.
Note that the DISTINCT has been removed.
Good luck
Brian
Message was edited by:
Brian Sammond
2007 Aug 09 7:55 AM
sometimes I also comment on other comments:
Brian, your comment is not really helpful, because you completely omittsthe order
of execution. Hw should the select on ADRC be done, there is no condition for this table, you can not just assume that it is done, then whole table is selected.
The selection must start with with the BUT020!
The problem here is NOT related to the coding or the join or the usage of distinct.
It is only related to the selection on BUT020, in a standard system there is no
index which supports the above where condition.
So get the question on the right track.
The solution is a new index with the fields of the where condition, satring with mandt, and then ordered by decresing selectivity.
Siegfried