‎2006 Feb 16 4:55 AM
Hi,
I have 3 tables Header (Hdr), Item (Itm) and Schedule Line (Sdl) table. The number of entries is max in Sdl table (Eg: Hdr = 1000; Item = 20000 and Sdl = 200000 records). I need to join these 3 tables in my SELECT statement to fetch the required records.
Case A:
-
Header AS HDR
JOIN Item AS ITM
ON HDRORDID = ITMORDID
AND HDRVRSIOID = ITMVRSIOID
JOIN ScheduleLine AS SDL
ON ITMITMID = SDLITMID
Case B:
-
ScheduleLine AS SDL
JOIN Item AS ITM
ON ITMITMID = SDLITMID
JOIN Header AS HDR
ON HDRORDID = ITMORDID
AND HDRVRSIOID = ITMVRSIOID
Questions:
-
1) The performance of the Select query is really bad with Case A and very good (really fast) with Case B. Will the number of records in the tables used determine the sequence of join in the Select statement ?
2) If yes, is this independent of database used ? That is, will the behaviour be the same in all databases (Eg: Oracle, DB2, SQL Server, etc) ?
Appreciate your help.
Thanks,
Karthik
‎2006 Feb 16 5:01 AM
1) The performance of the Select query is really bad with Case A and very good (really fast) with Case B. Will the number of records in the tables used determine the sequence of join in the Select statement ?
Yes. Since the matching records are less in case of join of Hdr and Item.
2) If yes, is this independent of database used ? That is, will the behaviour be the same in all databases (Eg: Oracle, DB2, SQL Server, etc) ?
It may also depend on a variety of factors like the keys used, buffering etc on the database used.
‎2006 Feb 16 5:01 AM
1) The performance of the Select query is really bad with Case A and very good (really fast) with Case B. Will the number of records in the tables used determine the sequence of join in the Select statement ?
Yes. Since the matching records are less in case of join of Hdr and Item.
2) If yes, is this independent of database used ? That is, will the behaviour be the same in all databases (Eg: Oracle, DB2, SQL Server, etc) ?
It may also depend on a variety of factors like the keys used, buffering etc on the database used.
‎2006 Feb 16 5:06 AM
‎2006 Feb 16 5:15 AM
this isn't addressing your question and I don't know the structure of your tables so I could be wrong, but I would have expected the join to have been coded like this:
Header AS HDR
JOIN Item AS ITM
ON HDRORDID = ITMORDID
AND HDRVRSIOID = ITMVRSIOID
JOIN ScheduleLine AS SDL
ON HDRORDID = SDLORDID
AND HDRVRSIOID = SDLVRSIOID
AND ITMITMID = SDLITMID
......
surely the scheule join needs to specify the header key fields too (unless the item id is absolutely unique)??
‎2006 Feb 16 5:19 AM
Karthik,
Aren't we missing the where clause?
The sequence of join is usually dictated by the table you want to clamp records on.
As to your second point, if the same indexes have been created, then the same behavior can be expected of different databases. Over a period of time they level out.
‎2006 Feb 16 5:31 AM
One more point to add is
the order of field comparisons in the WHERE clause does add to the query efficiency.
‎2006 Feb 16 10:18 PM
I believe that the performance discrepancy between the two versions is caused by you not specifying the joins correctly (i.e. not qualifying the header and item at schedule level as I mentioned previously).
CAse A.
-
Header AS HDR
JOIN Item AS ITM
ON HDRORDID = ITMORDID
AND HDRVRSIOID = ITMVRSIOID
JOIN ScheduleLine AS SDL
ON ITMITMID = SDLITMID
reads all 1000 hDRs, joins all 20000 items correctly to these BUT then joins schedule lines only by item id so there will be a very large number of matches. i.e. any item with itemid 10 will match to EVERY SDL with itemid 10. as there are 20000 items and 200000 schedule lines this will be massive!
Case B:
-
ScheduleLine AS SDL
JOIN Item AS ITM
ON ITMITMID = SDLITMID
JOIN Header AS HDR
ON HDRORDID = ITMORDID
AND HDRVRSIOID = ITMVRSIOID
This will not have the same amount of DB traffic as the SDL is now the driving force. It will still match more than you intend but will only be reading the items redundantly trather than the more voluminous schedules.
................................
Basically, your sql is wrong and should qualify the matches right down to schedule line level as per my earlier post. When you have changed the sql, try the two alternatives again and see if there is any performance differences. I suspect that there will not be, but it will be interesting to find out for sure.
‎2006 Feb 17 4:06 AM
Hi Neil,
Actually I had not specified all the where conditions. Below is the complete query.
SELECT SDLORDID SDLITMID FROM
Header AS HDR JOIN Item AS ITM
ON HDRORDID = ITMORDID
AND HDRVRSIOID = ITMVRSIOID
JOIN ScheduleLine AS SDL
ON ITMITMID = SDLITMID
APPENDING CORRESPONDING FIELDS OF TABLE et_ordid
FOR ALL ENTRIES IN it_itmmatlocto
WHERE
itm~vrsioid = iv_vrsioid
AND itm~ortype = iv_ortype
AND itm~matid = it_itmmatlocto-matid
AND itm~locidto = it_itmmatlocto-locidto
AND HDR~PRTIDFR IN IT_HDRPRTFR
AND HDR~PRTIDTO IN IT_HDRPRTTO
AND SDL~ORDNO IN IT_ORDNO0
Here all the internal tables (it_itmmatlocto, IT_HDRPRTFR, IT_HDRPRTTO and IT_ORDNO0) has proper values which will reduce the selected records. Even though th selection is reduced WHERE condition, there is significant difference between Case A and Case B. As said earlier, Case A times out and Case B gets executed immediately.
‎2006 Feb 17 4:08 AM
Hi Karthik,
Is your question answered.
If yes, reward points and close the thread.
‎2006 Feb 17 4:12 AM
yes, but I think the performance will improve if you specify the JOIN of the schedule lines in more detail.
ON ITMITMID = SDLITMID
should be
ON SDLORDID = ITMORDID
AND SDLVRSIOID = ITMVRSIOID
and ITMITMID = SDLITMID
...
joining only on itmid means there will be thousands of inncorrect matches.
‎2006 Feb 17 4:23 AM
If that is the case why is the performance Case B really fast. That is changing the Join Order sequence improves the performance.
If what you say is the reason then in both cases the performance should be really slow.
‎2006 Feb 17 4:40 AM
it makes sense that case b is faster than case a.
In case A the db system has to do multiple passes of the most voluminous table (SDL) to find the matches. In case b it is still wrong but will have to do less db passes because SDL is only passed once and the smaller tables are matched to it.
Change your joins and watch the improvement.
Imagine we were not talking about sql but abap loops instead. If we had:
case A.
loop at hdr.
loop at itm.
loop at sdl.
endloop.
endloop.
endloop.
case B.
loop at sdl.
loop at itm.
loop at hdr.
endloop.
endloop.
endloop.
Case A will be a lot slower than Case B and for similar reasons to the sql. CAse B only has to loop through the biggest table once.