Application Development and Automation Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.

Regarding SELECT query

former_member367551
Participant

Dear experts,

Is there a way for the following SELECT query to be improved?

The problem that I'm seeing here is that the same table (l_item_tab) is queried twice in the SELECT (due to the "table joins", a and b). How can I improve the data retrieval here? I also can't move the SELECT query out of the loop - the ledger is a select-option and multiple values are possible.

* Get Ledger
  SELECT * FROM t881 INTO TABLE lt_t881
    WHERE rldnr IN s_rldnr.

  LOOP AT lt_t881.
*   Get FI-SL user-defined item table based on ledger
    PERFORM get_sl_item_tab USING lt_t881-rldnr CHANGING l_item_tab. 
  " L_ITEM_TAB is populated here

*   Get SL line items
    SELECT * APPENDING CORRESPONDING FIELDS OF TABLE gt_glu1
    FROM (l_item_tab) AS a
    WHERE
        rldnr         IN s_rldnr             " Ledger
    AND rbukrs        IN s_bukrs             " Company code
    AND ryyrkeg_wwsub IN s_wwsub             " Subsystem
    AND racct         IN s_racct             " Account no
    AND ryymac        IN s_yymac             " Management area
    AND rtcur         IN s_rtcur             " Trx currency
    AND docnr         IN s_docnr             " Doc. number
    AND docty         IN s_docty             " Doc. type
    AND docct         EQ c_docct_l           " Doc. category (L = Local)
    AND ryear         IN s_ryear             " Fiscal year
    AND budat         IN s_budat             " Posting date
    AND yystodt       IN s_stodt             " Reversal date
    AND yystgrd       IN s_stgrd             " Reversal reason
    AND yyintref      IN s_intref            " Interface ID

    AND NOT exists

      ( SELECT * FROM (l_item_tab) AS b
         WHERE
             b~rldnr    = a~rldnr     AND
             b~docnr    = a~docnr     AND
             b~rbukrs   = a~rbukrs    AND
       ( ( ( b~docct    = c_docct_y
          OR b~docct    = c_docct_x ) AND
             b~refryear = a~ryear )   OR
         ( ( b~docct    = c_docct_u
          OR b~docct    = c_docct_t ) AND
             b~ryear    = a~ryear ) )
      ).

  ENDLOOP.

Edited by: Rob Burbank on Jun 23, 2010 12:33 PM

16 Replies

Former Member

What is the actual problem? The logic by itself is o.k. Is it slow, or what exactly is the problem?

It is generic, so it is hard to say how it works with a different table. It is definitely not a simple task.

former_member367551

Hi Siegfried,

When I do a runtime analysis and an SQL trace on the program, the SELECT query appears to be the one that takes the longest time for execution. Perhaps the performance can be further improved here?

Former Member

Hi,

A few suggestions:

Can you use 'FOR ALL ENTRIES' instead of executing the SELECT query within a loop?

Instead of using nested queries, it may be faster to execute the inner nested query first - this gives you a small set of values which you can then use as a filter in the main query. If this is not possible, try using range values in the inner nested query - ranges are much faster than normal logical expressions.
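A rough ABAP sketch of the second suggestion (pre-fetching the reversal candidates and filtering in ABAP afterwards). The names ls_rev, lt_reversals and lv_tabix are hypothetical, and matching on ryear alone simplifies the original docct-dependent refryear/ryear distinction, so the logic would need re-verifying against the real reversal rules:

```abap
* Sketch only: fetch candidate reversal/reversed documents once,
* then filter the main result set in ABAP instead of a subquery.
* ls_rev / lt_reversals / lv_tabix are hypothetical names.
DATA: BEGIN OF ls_rev,
        rldnr  TYPE rldnr,
        rbukrs TYPE bukrs,
        docnr  TYPE belnr_d,
        ryear  TYPE gjahr,
      END OF ls_rev,
      lt_reversals LIKE STANDARD TABLE OF ls_rev,
      lv_tabix     TYPE sy-tabix.

SELECT rldnr rbukrs docnr ryear
  FROM (l_item_tab)
  INTO TABLE lt_reversals
  WHERE rldnr IN s_rldnr
    AND ( docct = c_docct_y OR docct = c_docct_x
       OR docct = c_docct_u OR docct = c_docct_t ).

SORT lt_reversals BY rldnr rbukrs docnr ryear.

* Drop every main-query row that has a matching reversal entry.
LOOP AT gt_glu1.
  lv_tabix = sy-tabix.              " READ TABLE overwrites sy-tabix
  READ TABLE lt_reversals
    WITH KEY rldnr  = gt_glu1-rldnr
             rbukrs = gt_glu1-rbukrs
             docnr  = gt_glu1-docnr
             ryear  = gt_glu1-ryear
    BINARY SEARCH TRANSPORTING NO FIELDS.
  IF sy-subrc = 0.
    DELETE gt_glu1 INDEX lv_tabix.
  ENDIF.
ENDLOOP.
```

Whether this beats the subselect depends on how many reversal candidates exist; on a 455-million-record table the pre-fetch itself needs index support.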

Hope this helps.

Wenonah

former_member367551

Thanks for your ideas, Wenonah.. Will certainly try this out too..

But, as for the loop - it is valid to execute the SELECT statement within the loop, as the table (for the SELECT query) has to be determined anew on each pass.
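If get_sl_item_tab happens to map several ledgers onto the same item table (an assumption worth checking), the ledgers could be grouped per table first, so each physical table is selected from only once. A sketch, with ty_group, lt_groups and the other names being hypothetical:

```abap
* Sketch only: collect, per item table, a range of the ledgers that
* use it, then run the expensive SELECT once per distinct table with
* "rldnr IN ls_group-rldnr" instead of "rldnr IN s_rldnr".
TYPES: ty_rldnr_range TYPE RANGE OF t881-rldnr.
TYPES: BEGIN OF ty_group,
         tabname TYPE tabname,
         rldnr   TYPE ty_rldnr_range,
       END OF ty_group.
DATA: lt_groups TYPE STANDARD TABLE OF ty_group,
      ls_group  TYPE ty_group,
      ls_rldnr  LIKE LINE OF ls_group-rldnr,
      lv_tabix  TYPE sy-tabix.

LOOP AT lt_t881.
  PERFORM get_sl_item_tab USING lt_t881-rldnr CHANGING l_item_tab.
  ls_rldnr-sign   = 'I'.
  ls_rldnr-option = 'EQ'.
  ls_rldnr-low    = lt_t881-rldnr.
  READ TABLE lt_groups INTO ls_group WITH KEY tabname = l_item_tab.
  lv_tabix = sy-tabix.
  IF sy-subrc = 0.
    APPEND ls_rldnr TO ls_group-rldnr.
    MODIFY lt_groups FROM ls_group INDEX lv_tabix.
  ELSE.
    CLEAR ls_group.
    ls_group-tabname = l_item_tab.
    APPEND ls_rldnr TO ls_group-rldnr.
    APPEND ls_group TO lt_groups.
  ENDIF.
ENDLOOP.

LOOP AT lt_groups INTO ls_group.
  l_item_tab = ls_group-tabname.
  " ... existing SELECT here, once per distinct table ...
ENDLOOP.
```

If every ledger maps to its own table this changes nothing, but it costs little and avoids duplicate heavy SELECTs when ledgers share a table.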

Former Member

> Will certainly try this out too..

maybe you should think twice.... The point of a subselect is that its result set is not transferred to the application server; it is only needed during the selection.

You should answer the following questions:

+ who wrote the code? You, or someone else? I get the impression that you don't know what is intended.

+ SQL trace: what are the numbers for repeated executions (go to the summary by SQL statement) - duration, executions, records?

+ how many different tables are accessed? l_item_tab is dynamic.

+ what kind of tables are accessed?

Siegfried

former_member367551

> > Will certainly try this out too..
>
> maybe you should think twice.... The point of a subselect is that its result set is not transferred to the application server; it is only needed during the selection.
>
> You should answer the following questions:
>
> + who wrote the code? You, or someone else? I get the impression that you don't know what is intended.
>
> + SQL trace: what are the numbers for repeated executions (go to the summary by SQL statement) - duration, executions, records?
>
> + how many different tables are accessed? l_item_tab is dynamic.
>
> + what kind of tables are accessed?
>
> Siegfried

Hi Siegfried,

- The code already exists and was not written by me. I just know that the first / main SELECT statement retrieves FI special ledger line item data records, while the second / sub SELECT statement ensures that the line item data records are not already reversed and are not themselves reversals.

- In the summarized SQL trace of a sample test run: executions = 1, identical executions = 0, duration = 700247324 (almost 100% of the total processing duration), records = 0 (there should be more records returned in an actual production run)

- Only one table, ZZGLV4A (custom), is accessed based on the selection screen variant. ZZGLV4A is an FI special ledger line item table with a data volume of 455 mil. records. Note: the code is written for a report that runs in the background, and the selection screen variant is used for those executions too. The table accessed (l_item_tab is ZZGLV4A in this case) depends on the ledger entered here - only one ledger is specified for the current selection screen variant.

- Table accessed - FI special ledger line item data.

I tried tuning the query a little further by properly specifying the WHERE fields - only a minimal improvement was observed, an average runtime improvement of about 6% (tested via SE30 in the development box, where ZZGLV4A holds only 4 mil. records). This obviously works only for the table ZZGLV4A for now, I'm afraid.

Any other ideas on how such subqueries can be improved generally (maybe secondary indexes)? The subquery is certainly re-hitting the same table at least twice.

Thanks for the inputs once again!

Former Member

Hi Deborah,

Let me start with some general comments...

Often correlated subqueries (i.e. where you use fields from your outer table in your subquery) are not the best way of retrieving data, because it's hard (sometimes impossible) for the optimizer to rewrite such queries. In that case the database has to query the table in the subquery for each row in the result set of the outer table (so lots of executions if the conditions on the outer table are not selective). So one possible attempt is to rewrite the query so that you're either using an uncorrelated subquery or some other approach. Of course uncorrelated subqueries have their own problems; basically the issue is that the inner query must usually be executed first so that the outer query can step in.

Also for self-joins we have the obvious (but hard to find) approach of trying to avoid the multiple table reads and instead combining them into one pass for reading all the data (often using a pattern with aggregation along with CASE statements; in most cases I've not used such an approach, since Open SQL was too restrictive and I didn't want to switch to Native SQL).

That much for useless comments, now let's try to see what options we might have. It's late on my end though, so please bear with me if I'm not making any sense...

One way to get rid of a correlated subquery with a NOT EXISTS is to rephrase it as an outer join. You can then identify the entries that you want by the fact that in your left outer join all the fields from the right table must be null (because you are looking for entries where the subquery was unsuccessful). Unfortunately this construct is not possible in Open SQL (try it and you'll see a syntax error). You could switch to Native SQL, but then you'd lose the ability to process the rows en bloc (i.e. read the result set into an array). As an Open SQL alternative you could check whether switching from NOT EXISTS to an uncorrelated subquery with NOT IN provides any improvement (depending on your actual data this might be much worse though).

Maybe there's also an answer from a functional perspective. Depending on your data and which fields you select, you could possibly find other criteria for ensuring that you don't look at reversals or reversed documents (e.g. non-initial reversal dates in yystodt, and maybe other conditions).
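To illustrate the functional alternative: the reversal check could collapse into a plain WHERE condition, assuming (purely an assumption about this system's data, to be verified) that reversed documents always carry a non-initial reversal date:

```abap
* Sketch only: assumes every reversed document has a non-initial
* reversal date (yystodt) and that reversal documents themselves are
* excluded by the document-category condition. Not equivalent to the
* original NOT EXISTS unless those assumptions hold for the data.
SELECT * APPENDING CORRESPONDING FIELDS OF TABLE gt_glu1
  FROM (l_item_tab)
  WHERE rldnr   IN s_rldnr
    AND rbukrs  IN s_bukrs
    AND docct   EQ c_docct_l        " local documents only
    AND yystodt EQ '00000000'       " initial date = not reversed (assumed)
    AND budat   IN s_budat.
```

If the assumption holds, the second table access disappears entirely, which is why a functional criterion is usually worth checking before query tuning.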

Cheers, harald

former_member367551

> Often correlated subqueries (i.e. where you use fields from your outer table in your subquery) are not the best way of retrieving data, because it's hard (sometimes impossible) for the optimizer to rewrite such queries. In that case the database has to query the table in the subquery for each row in the result set of the outer table (so lots of executions if the conditions on the outer table are not selective).

Hi Harald,

Thanks so much for the inputs.. Appreciate the elaborations given too..

I think I'll still be maintaining the codes as correlated subqueries for now.. But I'll also give a check on the data fields if there are other criteria that can also be used for achieving the same logic..

Former Member

I am sorry, but I think that this is beyond the possibilities of this forum; it is not a simple problem and would require system access.

You know that SAP offers services, where such problems can be fixed.

> In the summarized SQL trace of a sample test run: executions = 1, identical executions = 0, duration = 700247324 (almost 100% of the total

> processing durations), records = 0 (there should be more records returned in an actual production run)

unfortunately only partly meaningful, as nothing was found

> Only one table, ZZGLV4A (custom) is accessed based on the selection screen variant.

why dynamic coding? Could the code be changed to specify that table directly, so that dynamic access would no longer be needed?

> ZZGLV4A is an FI special ledger line item table and its data volume: 455 mil. data records.

.... huge !!! ... even the most optimized coding will take longer on such large tables. Is archiving up to date?

Siegfried

former_member367551

Hello Siegfried,

You're right - I think I'll have to dig into this further on the system..

Dynamic coding - this is needed, because the table to be accessed depends on the ledger specified in the selection screen..

I think performance becomes an issue here, largely due to the table volume.. Data archiving is currently not implemented in the system - at least not for now yet.. I'll try to have this justified..

I'm going to try to improve the performance by adding and using the relevant secondary indexes.. This should help with the data retrieval, as the current queries (both the main and sub queries) are not hitting any index fields at all..

I have tried changing the queries to list the WHERE fields orderly (according to the order of fields listed in the table in SE11) - so an improvement of about 8% was observed for a table volume of 4 mil. records (done in the development box - the same table has 455 mil. records in the production box)..

Thanks anyways!

Former Member

You can use 'FOR ALL ENTRIES' to avoid the loop, e.g.:

SELECT *
  FROM <table>
  INTO TABLE <internal table>
  FOR ALL ENTRIES IN <previously populated internal table>
  WHERE <conditions as before>.

former_member182114
Active Contributor

Hi Deborah,

For each record found by the main SELECT, the subselect must be performed, so it's crucial that this access is supported by an index; given the huge size (roughly 450 million records), a full scan has no chance of acceptable performance.

1) Additional fields that need to exist in the index (the order among them isn't important, but they must be the leading fields of the index):

a) refryear, rldnr, docnr, rbukrs, docct

b) ryear, rldnr, docnr, rbukrs, docct <--- maybe this access can be helped by the index of the primary key

    • Please send to us the indexes of this table ZZGLV4A

2) Split into two NOT exists entries to help the database access:

AND NOT exists ( SELECT * FROM (l_item_tab) AS b
                 WHERE b~ryear    = a~ryear     AND
                       b~rldnr    = a~rldnr     AND
                       b~docnr    = a~docnr     AND
                       b~rbukrs   = a~rbukrs    AND
                       ( b~docct    = c_docct_u OR b~docct    = c_docct_t ) )
AND NOT exists ( SELECT * FROM (l_item_tab) AS b
                 WHERE b~refryear = a~ryear     AND
                       b~rldnr    = a~rldnr     AND
                       b~docnr    = a~docnr     AND
                       b~rbukrs   = a~rbukrs    AND
                       ( b~docct    = c_docct_y OR b~docct    = c_docct_x ) )

3) I don't know the process in which you are using this SELECT. Is looking for the referenced records inside the same table the only way? Sometimes changing the strategy is better than chasing "performance" issues.

Regards, Fernando Da Ró

former_member367551

Hi Fernando,

Thanks for your inputs - I agree with your points too..

1) Additional fields that need to exist in the index (the order among them isn't important, but they must be the leading fields of the index):

a) refryear, rldnr, docnr, rbukrs, docct

b) ryear, rldnr, docnr, rbukrs, docct <--- maybe this access can be helped by index of primary key

I'm planning to improve performance by creating new secondary indexes and then using them in the queries.. The current queries (both the main and sub queries) are not hitting any index fields / using secondary indexes..

2) Split into two NOT exists entries to help the database access:

AND NOT exists ( SELECT * FROM (l_item_tab) AS b
                 WHERE b~ryear    = a~ryear     AND
                       b~rldnr    = a~rldnr     AND
                       b~docnr    = a~docnr     AND
                       b~rbukrs   = a~rbukrs    AND
                       ( b~docct    = c_docct_u OR b~docct    = c_docct_t ) )
AND NOT exists ( SELECT * FROM (l_item_tab) AS b
                 WHERE b~refryear = a~ryear     AND
                       b~rldnr    = a~rldnr     AND
                       b~docnr    = a~docnr     AND
                       b~rbukrs   = a~rbukrs    AND
                       ( b~docct    = c_docct_y OR b~docct    = c_docct_x ) )

Ok, I'll probably try this out too..

3) I don't know the process in which you are using this SELECT. Is looking for the referenced records inside the same table the only way? Sometimes changing the strategy is better than chasing "performance" issues.

I've tried thinking about "changing strategies" to achieve the same logic also.. Will work on this further..

former_member367551
Participant

The performance improved a little more after adjusting the SELECT ... WHERE clause to match one of the database table's secondary indexes.

Thanks for all your inputs and contributions.

Former Member

You should distribute points if you set a question to answered!

It is not only about the points; the value of the forums is also measured by the points that are distributed.

It also helps to figure out which answer was most valuable.

Siegfried

Rob Burbank

Correct Siegfried, but there's not much we can do about it. One thing I do when I see a post closed with unsatisfactory comments is to check to see if the poster created a new thread right after closing this one and look at the number of open posts. Quite often you'll find that it looks like the OP is simply closing the post to avoid having more than ten open posts.

In my view these people simply take from the forum without returning anything. In that case, I try to ignore any new posts.

I mention this without anyone in particular in mind.

Rob