Advanced Log Space Management (ALSM) can help avoid transaction log full situations where applications rarely commit and hold the oldest log file for a very long time. However, ALSM does not guarantee that transaction log full situations never occur. Therefore, you might want to know how ALSM behaves under your SAP workload. In particular, you might, for example, want to know the following:
- Did ALSM kick in and extract log records from log files?
- How did the use of the Db2 transaction log develop over time (in percent)?
- How did the space consumption in the log directory develop over time?
- If the transaction log was full, what could be the reason for it?
In this blog post, I’ll show you how you can collect and analyze this kind of data using a monitoring shell script and a corresponding monitoring SQL statement. This blog post is a continuation of my previous blog post
Ad-Hoc Monitoring of Db2 Advanced Log Space Management, where I introduced a monitoring SQL statement for ad-hoc monitoring. Here I additionally use a script that collects data over a period of time to get an idea of how ALSM works in my system.
Note: It’s important that you understand the concepts of ALSM before you go on reading. Therefore, if ALSM is new to you, I recommend that you read the following blog first:
Advanced Log Space Management (ALSM) as of Db2 for LUW Version 11.5
Introducing the Monitoring Shell Script
To answer the questions above, I will use the following:
- A SQL monitoring statement that selects the required ALSM metrics from Db2
- A monitoring shell script that calls the SQL monitoring statement over time and records all relevant data in a file.
We can then start the monitoring shell script first and our SAP workload afterwards. While the SAP workload is running, the monitoring shell script records all relevant data in a file every second.
I’ve already published a blog post called
Ad-Hoc Monitoring of Db2 Advanced Log Space Management. This blog post contains the SQL monitoring statement monALSM.sql, which we can easily reuse to retrieve the required ALSM metrics from Db2.
Here is my suggestion for such a monitoring shell script:
#!/bin/sh
log_dir="/db2/D01/log_dir/NODE0000/LOGSTREAM0000/" # adapt log dir
(
i=1
while [ $i -lt 7000 ] # monitor approx. 2 hours
do
dat=`date`
echo "Loop $i at $dat ----------------------------------------------------------------"
db2 -tf monALSM.sql
ls -l "$log_dir"
cntS=`ls -1 "$log_dir" | grep "S[0-9][0-9][0-9][0-9][0-9][0-9][0-9]" | wc -l`
cntX=`ls -1 "$log_dir" | grep "X[0-9][0-9][0-9][0-9][0-9][0-9][0-9]" | wc -l`
echo
echo "Count S-Log $cntS , count X-Log $cntX"
echo
df -k "$log_dir"
sleep 1
i=`expr $i + 1`
done
) | tee mon.log
If you want to use this monitoring shell script, you can copy it into a file, for example, called mon.sh. You need to adjust the variable log_dir at the beginning of mon.sh to the log directory path of your Db2 database.
Likewise, you can adjust how long mon.sh runs by setting the number of loop cycles. To do so, change the value 7000 to a number that fits your requirements; since each loop takes about one second, 7000 corresponds to approximately two hours. The number of loop cycles matters mainly if you start mon.sh in the background using nohup. If you run mon.sh in the foreground, you can always stop it with Ctrl-C. In any case, make sure that mon.sh doesn’t stop too early during your monitoring session because the loop cycles are used up.
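As a quick sanity check for the loop limit, you can derive the number of cycles from a target duration. A minimal sketch, assuming each iteration takes roughly one second (the sleep 1 plus a little overhead), so the real runtime is slightly longer:

```shell
#!/bin/sh
# Derive a loop limit for mon.sh from a target monitoring duration.
# Assumption: one iteration takes about one second (sleep 1 plus overhead).
target_minutes=120                  # desired monitoring duration in minutes
loops=`expr $target_minutes \* 60`  # one loop cycle per second
echo "Use approximately $loops loop cycles in mon.sh"
```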
The output of mon.sh is written to the file mon.log. Each start of mon.sh replaces the previous mon.log and starts with an empty file.
Using the Shell Script
- Copy the monALSM.sql and mon.sh onto your Db2 database server into a folder that you created just for monitoring purposes.
- Start the monitoring shell script as follows:
- In the foreground: sh mon.sh
- In the background: nohup sh mon.sh &
- Start the SAP workload.
While mon.sh is running or after your SAP workload and mon.sh have been completed, you can check mon.log. The contents of mon.log can help to answer the questions that I mentioned in the introduction:
Question: Did ALSM kick in and extract log records from log files?
Use grep and the following command:
grep "Count S-Log" mon.log | grep -v "count X-Log 0"
The result shows all occurrences where at least one ALSM extraction log file was found in the log directory. You can also open mon.log in an editor and search for lines where “count X-Log” shows a value greater than 0.
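Alternatively, you can filter numerically with awk instead of excluding the literal “count X-Log 0”. A minimal sketch that relies on the echo format of mon.sh, where the X-Log count is the last field of the line; the here-document stands in for mon.log and contains sample data only:

```shell
# Show only the loops in which at least one extraction log file existed.
# Relies on the line format "Count S-Log <n> , count X-Log <m>" from mon.sh:
# the X-Log count is the last field ($NF).
# In practice, run: awk '/Count S-Log/ && $NF > 0' mon.log
awk '/Count S-Log/ && $NF > 0' <<'EOF'
Count S-Log 13 , count X-Log 0
Count S-Log 12 , count X-Log 2
EOF
```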
Question: How did the use of the Db2 transaction log develop over time (in percent)?
Enter the following command:
grep -e Loop -e log_used mon.log
The result shows the change of the use of the Db2 transaction log in percent over time and the associated time stamps.
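To turn this output into a compact time series, you can pair each loop time stamp with the log_used value that follows it. A sketch under assumptions: the metric line format shown here (log_used_pct followed by a number) stands in for the actual output of monALSM.sql and must be adapted, and the here-document contains sample data only; in practice, feed mon.log into the awk command instead:

```shell
# Pair each "Loop ... at <date>" header with the next log_used value.
# The metric line format below is an assumption; adapt the pattern and
# field number to the real output of monALSM.sql.
awk '
  /^Loop/    { ts = $0 }                  # remember the loop header line
  /log_used/ { print ts " => " $NF }      # print header plus metric value
' <<'EOF'
Loop 1 at Fri Jan 10 10:00:01 UTC 2025
log_used_pct 42
Loop 2 at Fri Jan 10 10:00:02 UTC 2025
log_used_pct 57
EOF
```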
Question: How did the space consumption in the log directory develop over time?
Again, use a grep command:
grep <log dir file system name> mon.log
The result shows the space consumption in the log directory over time.
Question: If the transaction log was full, what could be a reason for it?
This is indeed the most difficult question to answer, and a simple grep command alone won’t help here. Therefore, let’s approach this question by checking some of the collected data in detail:
First start with the use of the Db2 transaction log. Do you see a use close to 100%?
ALSM kicks in at about 80% use of the Db2 transaction log and tries to maintain the use within this range. Use close to 100% might indicate that ALSM was not able to extract log records fast enough compared to the writes into the Db2 transaction log.
Let’s continue with the analysis of the size of the extraction log files. Large extraction log files compared to active log files indicate that ALSM had to extract a large volume of log records. They also indicate that a non-committing transaction performed many changes to the database.
Active log files have the naming convention “S<log number>.LOG” and extraction log files have the naming convention “X<log number>_TID<transaction id>_<chain>.LOG”. This allows us to use this command:
grep -e Loop -e _TID mon.log
The result shows the file size of all extraction log files over time. If the file size of the extraction log files is large, you have an indication that a non-committing transaction made many changes to the database. Large extraction volume is not the intended use case for ALSM.
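You can also let awk sum up the extraction log file sizes per loop from the ls -l output in mon.log. A sketch, assuming the usual ls -l column layout where field 5 is the file size in bytes; the here-document replaces mon.log with sample data, and the file names are made-up examples that follow the naming convention above:

```shell
# Sum the sizes of all extraction log files (X*_TID*.LOG) per monitoring loop.
# Assumption: standard "ls -l" column layout, field 5 = file size in bytes.
# In practice, feed mon.log in instead of the sample here-document.
awk '
  /^Loop/ { if (ts != "") print ts " extraction bytes: " sum
            ts = $0; sum = 0 }          # new loop: flush previous total
  /_TID/  { sum += $5 }                 # add size of each X-log file
  END     { if (ts != "") print ts " extraction bytes: " sum }
' <<'EOF'
Loop 1 at Fri Jan 10 10:00:01 UTC 2025
-rw------- 1 db2d01 dbadm 4096000 Jan 10 10:00 X0000012_TID0000000000004c09_0001.LOG
-rw------- 1 db2d01 dbadm 1024000 Jan 10 10:00 X0000013_TID0000000000004c09_0001.LOG
EOF
```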
A similar indicator that you can use for analysis is “data_extracted”. It shows the ratio between log records in active log files and the log records that have been extracted out of these files. Now use this command:
grep -e Loop -e data_extracted mon.log
Again, a large percentage value shows that something is not ideal because ALSM is not intended to get a relatively large extraction volume in comparison to active log files.
In cases where you see a high extraction volume, you should consider increasing the value of LOGPRIMARY or LOGSECOND. As you can see, there is no simple cookbook for answering the question why the transaction log became full. However, you now have the tools to come closer to the answer.
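If you decide to increase the log configuration, the corresponding database configuration commands look as follows. A hedged sketch, assuming the database name D01 from the log path used in mon.sh; note that a change to LOGPRIMARY only takes effect after the database has been deactivated and reactivated, while LOGSECOND can usually be changed online:

```shell
# Example only: raise the number of secondary log files for database D01.
# The database name D01 is an assumption taken from the log path in mon.sh;
# replace it with your own database name, and choose a value that fits
# your log file system capacity.
db2 update db cfg for D01 using LOGSECOND 100

# Check the resulting log configuration:
db2 get db cfg for D01 | grep -i -e LOGPRIMARY -e LOGSECOND
```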
Summary
I’ve shown you how you can use my monitoring shell script to monitor ALSM while an SAP workload is running. The collected data enables you to answer various questions related to ALSM.
Analyzing mon.log allows you to understand what happened. What’s more, you can come up with your own grep commands. Feel free to post your commands as comments to this blog post so that all readers can benefit from your ideas.
Finally, I’d like to thank everyone who contributed to this blog post, especially my SAP colleague
sabine.reich.