This blog post is a bit of a sequel to an article I published last year about where transaction SAT (Performance Analysis) can be useful. While last year's blog post was general in nature, this one is about a specific example where the performance analysis led to code changes and eventually to considerably shorter run times.
A few weeks ago I happened upon a long-running job in one of our productive environments - and I mean really looooong running, as it took about 27 hours to run to completion, from its start at 5 in the morning of Day 1 to its finish sometime between 9 and 11 on Day 2. One "advantage" of such long-running jobs is that you can easily take a peek at where they are whiling away their time by going to transaction SAT after identifying the server the process is running on with the help of SM51. As I'm always curious about what is causing such long runtimes, this is what I immediately did.
Instead of having SAT execute a specific program or transaction I made use of the option "In parallel session". It also helps with the analysis to check the box to "Determine Names of Internal Tables":
After clicking "Switch On/Off" the currently active processes are displayed with the interesting one most likely sticking out like a sore thumb by the already accumulated runtime:
Position the cursor in the relevant line and activate the measurement for 1 minute or 2 (more usually isn't needed to get a first impression without creating too large an output):
After deactivating the measurement again, you'll be returned to the SAT evaluation. Here is an example of what I saw when I first did this for the long running job and after sorting the Hitlist descending on Gross%:
The number of hits gave an indication of how many items were being processed and the gross percentage shows where most of the time was spent, namely within a couple of loops over internal tables (and yes, I had forgotten to follow my own advice to determine the names of internal tables). The "gross microseconds" looked rather interesting to say the least given how high these were for processing not even 17,000 items. What I really like about SAT is that it gives immediate access to the underlying code.
The program in question extracts a large number of material master items from MARA and related data from several other tables. It offers a choice of including 48 (full) or just 11 columns (reduced) in the output. The reduced option was added as an afterthought, and unfortunately it was applied only to the output, not to the data retrieval and processing logic. A quick initial fix was therefore to switch off, via the variant, the optional selection of classification data whenever only the reduced output is requested, since that data is irrelevant there. As this had accounted for about 40% of the overall runtime, this simple fix brought the runtime down to about 7 hours. Still rather long but obviously a lot better than 27!
While checking the information provided by SAT, I had identified several places where the existing code could be improved and it didn't take long to get the go-ahead to apply whatever I could within about 2 days. The remainder of the blog post will focus on some areas of the code where I subsequently applied some fairly simple fixes.
The very first thing I did was to split the processing logic into one "thread" that provides just what is needed for the reduced field list (even if for hundreds of thousands of items all told) and another that gets all the data for the full field list (usually needed for only a few items in one go). Doing this had the added benefit of breaking up the code into smaller routines, which will now give a much more detailed view in the call monitor data (SCMON) to see which of the data is even regularly retrieved, i.e. still needed.
I also changed some instances where the "FOR ALL ENTRIES" construct could be eliminated in favor of simply including the needed data with a JOIN in the main SELECT. Where this can safely be done can either be easily seen in the code or an ATC-check might provide helpful pointers.
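To illustrate the pattern, here is a minimal sketch of such a change. It is not taken from the program discussed here: the tables (MARA and MAKT), the select-option s_mtart and all variable names are assumptions chosen only to show the shape of the rewrite, and it presumes a release with inline declarations and the newer SQL syntax.

```abap
REPORT z_join_sketch.

" Hypothetical example: MARA/MAKT and s_mtart only illustrate the pattern.
TABLES mara.
SELECT-OPTIONS s_mtart FOR mara-mtart.

START-OF-SELECTION.

  " Before: a driver SELECT followed by a second SELECT with FOR ALL ENTRIES
  SELECT matnr, mtart
    FROM mara
    WHERE mtart IN @s_mtart
    INTO TABLE @DATA(lt_mara).

  " Guard needed: with an empty driver table the WHERE condition is ignored
  IF lt_mara IS NOT INITIAL.
    SELECT matnr, maktx
      FROM makt
      FOR ALL ENTRIES IN @lt_mara
      WHERE matnr = @lt_mara-matnr
        AND spras = @sy-langu
      INTO TABLE @DATA(lt_makt).
  ENDIF.

  " After: one SELECT, the database combines both tables
  SELECT m~matnr, m~mtart, t~maktx
    FROM mara AS m
    LEFT OUTER JOIN makt AS t
      ON  t~matnr = m~matnr
      AND t~spras = @sy-langu
    WHERE m~mtart IN @s_mtart
    INTO TABLE @DATA(lt_material).
```

The LEFT OUTER JOIN keeps materials that have no description in the logon language, whereas an INNER JOIN would drop them. Which of the two matches the old behavior depends on how the existing code dealt with missing entries, so that is worth checking before swapping one for the other.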
While applying these types of changes, I also kept an eye on SAT, which showed improvements even in the Dev-system with just a bit over 20,000 items in the master data. One obvious pain point remained, however, namely the LOOP over an internal table called T_REPLACED, which made up over 90% of the runtime according to the SAT hit list:
Let's take a look at the code:
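LOOP AT t_tab ASSIGNING <t>.
  LOOP AT t_replaced INTO replaced_wa
       WHERE matnr = <t>-matnr.
    IF <t>-z_vmat IS INITIAL.
      MOVE: replaced_wa-bismt TO <t>-z_vmat,
            replaced_wa-artkz TO <t>-artkz.
    ELSE.   " ELSE and the closing statements are assumed; they are missing from the excerpt
      MOVE 'X' TO <t>-mehrfach.
    ENDIF.
  ENDLOOP.
ENDLOOP.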
Looks rather innocent, doesn't it? However, the first issue is that we have a "loop within a loop" situation, which is not ideal to begin with but is especially bad in a program with hundreds of thousands of items in T_TAB. In addition, T_REPLACED also contains many entries and was just defined as a standard table, which makes the WHERE clause rather problematic performance-wise: without a sorted key the condition cannot be optimized, so the inner loop does not stop once the matching MATNR entries have been processed but keeps checking rows up to the very last entry of the internal table, often without finding anything else of interest.
I replaced this code with the somewhat more complex combination of READ TABLE with a "WHILE MATNR doesn't change" construct, making sure that T_REPLACED was properly sorted:
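READ TABLE t_replaced INTO replaced_wa
     WITH KEY matnr = p_matnr
     BINARY SEARCH.            " t_replaced must be sorted by matnr
IF sy-subrc EQ 0.
  DATA(tabix_save) = sy-tabix.
  WHILE replaced_wa-matnr EQ p_matnr.
    IF p_z_vmat IS INITIAL.
      p_z_vmat = replaced_wa-bismt.
      p_artkz  = replaced_wa-artkz.
    ELSE.                      " ELSE assumed; it is missing from the excerpt
      p_mehrfach = abap_true.
    ENDIF.
    ADD 1 TO tabix_save.
    READ TABLE t_replaced INTO replaced_wa
         INDEX tabix_save.
    IF sy-subrc NE 0.
      EXIT.                    " end of table reached
    ENDIF.
  ENDWHILE.
ENDIF.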
And yes, I realized while creating this write-up that I might have done this a lot more easily by just changing T_REPLACED to a sorted table. However, I didn't touch all the data definitions because I only had a limited time budget to apply some quick fixes, and the changes are all tested now and will go into production soon. If you think that I could have left the old loop logic and only changed T_REPLACED to a sorted table, please let me know in a comment and I'll make a note of that to apply with the next change of the program.
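For reference, a minimal sketch of what that sorted-table alternative might look like. It is untested against the real program, and the structure definitions (ty_replaced, ty_tab and their field types) are assumptions, since the actual data declarations are not shown in this post. The ELSE branch is likewise assumed.

```abap
REPORT z_sorted_table_sketch.

" All type and field definitions are assumptions for illustration only.
TYPES: BEGIN OF ty_replaced,
         matnr TYPE matnr,
         bismt TYPE bismt,
         artkz TYPE c LENGTH 1,
       END OF ty_replaced,
       BEGIN OF ty_tab,
         matnr    TYPE matnr,
         z_vmat   TYPE bismt,
         artkz    TYPE c LENGTH 1,
         mehrfach TYPE abap_bool,
       END OF ty_tab.

" Sorted instead of standard table: rows are always kept in MATNR order
DATA t_replaced TYPE SORTED TABLE OF ty_replaced WITH NON-UNIQUE KEY matnr.
DATA t_tab      TYPE STANDARD TABLE OF ty_tab WITH EMPTY KEY.

START-OF-SELECTION.
  " Filling of t_tab and t_replaced is omitted in this sketch.

  LOOP AT t_tab ASSIGNING FIELD-SYMBOL(<t>).
    " The WHERE condition compares the table key with '=', so the
    " matching rows are located via the key instead of a full scan.
    LOOP AT t_replaced INTO DATA(replaced_wa)
         WHERE matnr = <t>-matnr.
      IF <t>-z_vmat IS INITIAL.
        <t>-z_vmat = replaced_wa-bismt.
        <t>-artkz  = replaced_wa-artkz.
      ELSE.   " assumed branch, see above
        <t>-mehrfach = abap_true.
      ENDIF.
    ENDLOOP.
  ENDLOOP.
```

With this declaration the nested LOOP can stay exactly as it was. The price is that any APPEND or SORT statements on T_REPLACED elsewhere in the program would need to be reviewed, which is the kind of change that doesn't fit into a quick fix. A secondary sorted key on the existing standard table, used via USING KEY in the LOOP, might be a less invasive variant of the same idea.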
As of this write-up, the program has been successfully tested and produces the same results (a download file of about 79MB) as it did before my partial re-write. The runtime is also considerably shorter than it was to begin with, and shorter even than after the initial quick fix of no longer getting the classification data if it's not needed. Instead of spilling the beans immediately, I'd like to give you a chance to guess how much time the program now needs to create the reduced output, which - to recap - took 27 hours when I noticed the issue. I'll tell you in a couple of days who came closest (sorry, no prizes)!