Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
Frank-Martin
Product and Topic Expert
Product and Topic Expert
As of IBM Db2 11.5 MP4 FP0SAP, “Advanced Log Space Management” (ALSM) is now available for productive use in SAP systems. Even though there are a few remaining limitations (use in a HADR cluster and with a mirror log path are not possible), I’d like to show you how it works, what the benefits for SAP on Db2 customers could be, and maybe initiate further discussions around this new feature here in the community.

What follows is a test that I did using the technical preview of ALSM, which was already available with IBM Db2 11.5 GA, but the test remains essentially the same with MP4 FP0SAP.

How Does Db2 Log File Management Work?


Let’s begin with a short summary of how Db2 log files are managed and what can cause a log file full situation.

Db2 log files contain both undo and redo information of database changes. All information related to uncommitted database transactions needs to remain in the online log directory because it may be needed for rollback or crash recovery, or because SELECT statements with currently committed semantics may need to determine the uncommitted version of a row.

Productive SAP Db2 databases run with activated log archival. Db2 log file archival is automatically triggered by Db2 as soon as a log files becomes full. At this point in time, a copy of the log file is written to the archive media. However, a log file in the online log directory can only be removed if it doesn’t contain uncommitted information anymore and if all older log files have been released (see “Log File Allocation and Removal” in the IBM Knowledge Center for more information).

With infinite logging (parameter LOGSECOND = -1 ), Db2 releases log files early from the online log directory. However, SAP does not recommend using the infinite logging feature because rollback and crash recovery may need to retrieve log files from the log archive, which may take very long. If required log files in the log archive are not available anymore, this can lead to an unrecoverable database. For more information, see also SAP Note 1493587.

“Log File Full” Situations


A log file full situation (SQL error “SQL0964C The transaction log for the database is full.”) can bring an SAP system to a standstill, which is something you definitely want to avoid. A log file full situation can be caused by:

  1. One or more uncommitted database transactions that consume all available log space.

  2. A database transaction that consumes some log space but does not commit for a very long time while other database applications commit regularly but generate log entries and fill up the rest of the available log space.


As an SAP Db2 customer, you need to configure your available log space ((LOGPRIMARY + LOGSECOND) * LOGFILSIZ) large enough to serve your normal SAP application workload. On the other hand, the total available log space should be as small as possible to save disk space and to avoid long rollback times of a single SAP application or long crash recovery times. The optimal log space configuration depends on the SAP workload and the SAP application, and requires some monitoring and testing.

You can configure the database parameters MAX_LOG and NUM_LOG_SPAN to force off runaway database applications early. MAX_LOG is designed to prevent situations of the above-mentioned type (1) cause, while NUM_LOG_SPAN is designed to prevent situations of type (2). Both parameters need to be fine-tuned by the database administrator. If an application hits one of those boundaries, the database connection is forced off and SQL errors related to a connection loss (e.g. SQL1224N) will appear in ABAP short dumps and in the developer trace files. Unfortunately, these unspecific connection loss errors in the dev traces don’t make it immediately clear that a limit like MAX_LOG or NUM_LOG_SPAN has been reached. You need to manually relate the errors to the corresponding db2diag.log entries regarding MAX_LOG or NUM_LOG_SPAN, for example, using the connection's application handle, to understand if an application abortion was caused by NUM_LOG_SPAN or MAX_LOGS being reached. As you can imagine from this description, the analysis of such short dumps can be a time-consuming task. Therefore, proactive monitoring the db2diag.log for MAX_LOG or NUM_LOG_SPAN errors is a good idea. Also, the “Costly Transactions” screen in the DBA Cockpit can help you identify applications early that are about to hit the MAX_LOG or NUM_LOG_SPAN threshold.

How ALSM Can Help You


Advanced Log Space Management (ALSM) is designed to help you avoid log full situations of the above-mentioned type (2). ALSM allows you to configure your database with a lower total amount of log volume. To show how ALSM works, I’ll use a simple CLP test scenario on an empty database with two CLP applications that create a log full scenario of type (2).

Test Database


I created my test database using the following CLP script and configured it with only 10 small-sized primary log files:
> cat setup_database.sql

-- run with "db2 -td@ -vf setup_database.sql"
drop database TEST@

-- create a test database with 10 primary log file
create database TEST PAGESIZE 16 K@
update db cfg for TEST using LOGFILSIZ 40000 LOGPRIMARY 10 LOGSECOND 0@

-- set up active log dir and log archive directory
! rm -rf /db2/db2sid/log_dir
@
! mkdir /db2/db2sid/log_dir
@
! rm -rf /db2/db2sid/log_archive
@
! mkdir /db2/db2sid/log_archive
@
update db cfg for TEST using NEWLOGPATH /db2/db2sid/log_dir @
update db cfg for TEST using LOGARCHMETH1 DISK:/db2/db2sid/log_archive @

-- perform dummy backup to activate log archiving
backup database TEST to /dev/null @

-- create some test objects in the database
connect to TEST@
create table app1 ( t timestamp )@
create table status ( I integer )@
create table app2 like syscat.columns@
create variable app2_count integer default 0@

terminate@

Test Application 1


The purpose of the first application is to hold the oldest transaction in the active log files and to deliberately cause a “transaction log full” situation. Application 1 performs a single insert into a database table and does not commit the change.
> cat app1.sql

-- run with "db2 -td@ -vf app1.sql"
connect to TEST@
update command options using C OFF @
INSERT INTO APP1 VALUES( CURRENT TIMESTAMP )@


Test Application 2


The purpose of application 2 is to generate committed log volume while the first application holds the oldest log transaction. Application 2 processes 1000 packages and commits after each package. After a package fails or all packages have been processed, it shows the number of successfully processed packages, the contents of the log directories, and retrieves information about the application holding the oldest transaction.
> cat app2.sql

-- run with "db2 -td@ -vf app2.sql"
connect to TEST@

-- simulate some activity to generate log volume
-- periodically performing a COMMIT
BEGIN
DECLARE MESSAGE VARCHAR(100);
DECLARE STATUS INTEGER;
WHILE app2_count < 1000 DO
INSERT INTO app2 ( SELECT * FROM SYSCAT.COLUMNS );
DELETE FROM app2;
INSERT INTO status VALUES ( app2_count );
COMMIT;
-- sleep for 1 seconds between packages
-- to allow ALSM log extraction to kick in
CALL
DBMS_ALERT.WAITONE('SLEEP_BY_SECONDS_NOT_REAL_ALERT',STATUS,MESSAGE,1);
SET app2_count = app2_count + 1;
END WHILE;
END @

-- display how many packages have been successfully processed
values( app2_count )@

-- show contents of active log directory
!ls -la /db2/db2sid/log_dir/NODE0000/LOGSTREAM0000/
@

-- show contents of log archive directory
!ls -la /db2/db2sid/log_archive/db2sid/TEST/NODE0000/LOGSTREAM0000/C0000000
@

-- show some information about process holding the oldest database transaction
SELECT
T.APPLID_HOLDING_OLDEST_XACT,
T.LOG_EXTRACTION_DISK_SPACE_USED_TOTAL,
T.LOG_EXTRACTION_LAST_EXTRACTED_LOG,
C.CLIENT_PID,
C.COORD_MEMBER,
VARCHAR( U.WORKLOAD_OCCURRENCE_STATE, 20 ),
VARCHAR( C.APPLICATION_NAME, 30),
C.APPLICATION_ID,
VARCHAR( U.CLIENT_APPLNAME, 30),
T.TOTAL_LOG_AVAILABLE,
(T.TOTAL_LOG_USED * 100.)/(T.TOTAL_LOG_AVAILABLE + T.TOTAL_LOG_USED) PCT_USED,
U.UOW_LOG_SPACE_USED,
U.ROWS_MODIFIED,
U.LOG_EXTRACTION_DISK_SPACE_USED
FROM TABLE( MON_GET_TRANSACTION_LOG( -1 ) ) AS T
LEFT OUTER JOIN TABLE(MON_GET_UNIT_OF_WORK( T.APPLID_HOLDING_OLDEST_XACT, T.MEMBER, 1 )) U
ON T.MEMBER = U.MEMBER
LEFT OUTER JOIN TABLE(MON_GET_CONNECTION( T.APPLID_HOLDING_OLDEST_XACT, T.MEMBER, 1 )) C
ON T.MEMBER = C.MEMBER
WHERE T.APPLID_HOLDING_OLDEST_XACT IS NOT NULL
ORDER BY T.MEMBER ASC @

terminate@

Status Application


Application 2 will take approximately 10 to 20 minutes to finish. To monitor the number of processed packages while application 2 is running, you can run the following status application:
> cat status.sql

-- run with “db2 -td@ -vf status.sql”
connect to TEST@
select max(i) as processed_packages from status with ur@
terminate@

While application 2 is running, you can optionally use the status application in a third window to monitor the progress. You can also monitor the usage of log space using the following command:

db2 "select TOTAL_LOG_AVAILABLE from table(mon_get_transaction_log(-1)) as t"

You’ll notice that the available log space starts with 16518121 bytes and continuously decreases.

Test Run Without Advanced Log Space Management


To run the test scenario, first create the test database. Then start application 1 in a CLP window and leave it uncommitted. At the end, start application 2 in a separate CLP window. When it’s finished, it shows the following output:
> db2 -td@ -vf app2.sql -z app2.out

connect to TEST
Database Connection Information
Database server = DB2/LINUXX8664 11.5.0.0
SQL authorization ID = DB2SID
Local database alias = TEST

BEGIN

END

DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command.
During SQL processing it returned:
SQL0964C The transaction log for the database is full. SQLSTATE=57011

values( app2_count )
1
-----------
361
1 record(s) selected.

!ls -la /db2/db2sid/log_dir/NODE0000/LOGSTREAM0000/
total 1600084
drwxr-x--- 1 db2sid db2adm 264 Sep 5 07:50 .
drwxr-x--- 1 db2sid db2adm 26 Sep 5 07:49 ..
-rw------- 1 db2sid db2adm 163848192 Sep 5 07:50 S0000001.LOG
-rw------- 1 db2sid db2adm 163848192 Sep 5 07:51 S0000002.LOG

-rw------- 1 db2sid db2adm 163848192 Sep 5 07:56 S0000009.LOG
-rw------- 1 db2sid db2adm 163848192 Sep 5 07:56 S0000010.LOG
-rw------- 1 db2sid db2adm 512 Sep 5 07:49 SQLLPATH.TAG
!ls -la /db2/db2sid/log_archive/db2sid/TEST/NODE0000/LOGSTREAM0000/C0000000
total 1440148
drwxr-x--- 1 db2sid db2adm 240 Sep 5 07:56 .
drwxr-x--- 1 db2sid db2adm 16 Sep 5 07:49 ..
-rw-r----- 1 db2sid db2adm 77824 Sep 5 07:49 S0000000.LOG
-rw-r----- 1 db2sid db2adm 163848192 Sep 5 07:50 S0000001.LOG

-rw-r----- 1 db2sid db2adm 163848192 Sep 5 07:55 S0000008.LOG
-rw-r----- 1 db2sid db2adm 163848192 Sep 5 07:56 S0000009.LOG

SELECT T.APPLID_HOLDING_OLDEST_XACT, T.LOG_EXTRACTION_DISK_SPACE_USED_TOTAL, T.LOG_EXTRACTION_LAST_EXTRACTED_LOG, C.CLIENT_PID, C.COORD_MEMBER, VARCHAR( U.WORKLOAD_OCCURRENCE_STATE, 20 ), VARCHAR( C.APPLICATION_NAME, 30), C.APPLICATION_ID, VARCHAR( U.CLIENT_APPLNAME, 30), T.TOTAL_LOG_AVAILABLE, (T.TOTAL_LOG_USED * 100.)/(T.TOTAL_LOG_AVAILABLE + T.TOTAL_LOG_USED) PCT_USED, U.UOW_LOG_SPACE_USED, U.ROWS_MODIFIED, U.LOG_EXTRACTION_DISK_SPACE_USED FROM TABLE( MON_GET_TRANSACTION_LOG( -1 ) ) AS T LEFT OUTER JOIN TABLE(MON_GET_UNIT_OF_WORK( T.APPLID_HOLDING_OLDEST_XACT, T.MEMBER, 1 )) U ON T.MEMBER = U.MEMBER LEFT OUTER JOIN TABLE(MON_GET_CONNECTION( T.APPLID_HOLDING_OLDEST_XACT, T.MEMBER, 1 )) C ON T.MEMBER = C.MEMBER WHERE T.APPLID_HOLDING_OLDEST_XACT IS NOT NULL ORDER BY T.MEMBER ASC

-- reformatted output:
APPLID_HOLDING_OLDEST_XACT 37
LOG_EXTRACTION_DISK_SPACE_USED_TOTAL 0
LOG_EXTRACTION_LAST_EXTRACTED_LOG 0
CLIENT_PID 71438
COORD_MEMBER 0
WORKLOAD_OCCURRENCE_STATE UOWWAIT
APPLICATION_NAME db2bp
APPLICATION_ID *LOCAL.db2sid.190905115007
CLIENT_APPLNAME CLP app1.sql
TOTAL_LOG_AVAILABLE 16475402
PCT_USED 98.989487119
UOW_LOG_SPACE_USED 259
ROWS_MODIFIED 1
LOG_EXTRACTION_DISK_SPACE_USED 0
1 record(s) selected.

terminate
DB20000I The TERMINATE command completed successfully.

As designed, application 2 hits a transaction log full situation (SQL0964C). It is able to process 361 work packages before all 10 log files of our test database are filled up. Log files 1-10 need to remain in the online log directory since application 1 did not commit the oldest transaction. Log files 1-9 were filled up by application 2 and have already been archived.

After application 2 encounters the SQL0964C error, only a small amount of available log space is left for other applications and all other applications that need to commit write activity on the database will also fail with SQL0964C until application 1 has committed or rolled back its database transaction. Keep in mind that in order to prevent the transaction log full situation, we did not configure NUM_LOG_SPAN on our test database to automatically force off application 1.

The monitoring statement at the end of our script shows application 1 as the application holding the oldest transaction and it also shows that application 1 itself generated only a small amount of log volume and changed only one row.

Test Run with Advanced Log Space Management


Now let’s see how our test scenario changes with Advanced Log Space Management (ALSM).

To enable ALSM, you need to set a registry variable and restart the database manager:
db2set DB2_ADVANCED_LOG_SPACE_MGMT=ON
db2stop force ; db2start

Then create a fresh test database and run application 1 and 2 as before. This time, the result of application 2 changes:
> db2 -td@ -vf app2.sql -z app2.out

connect to TEST
Database Connection Information
Database server = DB2/LINUXX8664 11.5.0.0
SQL authorization ID = DB2SID
Local database alias = TEST

BEGIN

END
DB20000I The SQL command completed successfully.

values( app2_count )
1
-----------
1000
1 record(s) selected.

!ls -la /db2/db2sid/log_dir/NODE0000/LOGSTREAM0000/
total 1600180
drwxr-x--- 1 db2sid db2adm 950 Sep 5 07:20 .
drwxr-x--- 1 db2sid db2adm 26 Sep 5 07:01 ..
drwxr-x--- 1 db2sid db2adm 0 Sep 5 07:17 LOGSTREAM0000
-rw------- 1 db2sid db2adm 163848192 Sep 5 07:14 S0000019.LOG
-rw------- 1 db2sid db2adm 163848192 Sep 5 07:15 S0000020.LOG

-rw------- 1 db2sid db2adm 163848192 Sep 5 07:20 S0000027.LOG
-rw------- 1 db2sid db2adm 163848192 Sep 5 07:20 S0000028.LOG
-rw------- 1 db2sid db2adm 512 Sep 5 07:01 SQLLPATH.TAG
-rw------- 1 db2sid db2adm 2152 Sep 5 07:07 X0000001.META
-rw------- 1 db2sid db2adm 343 Sep 5 07:07 X0000001_TID00000000000001E2.LOG
-rw------- 1 db2sid db2adm 2144 Sep 5 07:07 X0000002.META
-rw------- 1 db2sid db2adm 2144 Sep 5 07:07 X0000003.META

-rw------- 1 db2sid db2adm 2144 Sep 5 07:17 X0000021.META
-rw------- 1 db2sid db2adm 2144 Sep 5 07:17 X0000022.META
-rw------- 1 db2sid db2adm 2144 Sep 5 07:17 X0000023.TMP

!ls -la /db2/db2sid/log_archive/db2sid/TEST/NODE0000/LOGSTREAM0000/C0000000
total 4320292
drwxr-x--- 1 db2sid db2adm 672 Sep 5 07:20 .
drwxr-x--- 1 db2sid db2adm 16 Sep 5 07:01 ..
-rw-r----- 1 db2sid db2adm 77824 Sep 5 07:01 S0000000.LOG
-rw-r----- 1 db2sid db2adm 163848192 Sep 5 07:02 S0000001.LOG

-rw-r----- 1 db2sid db2adm 163848192 Sep 5 07:19 S0000026.LOG
-rw-r----- 1 db2sid db2adm 163848192 Sep 5 07:20 S0000027.LOG

SELECT T.APPLID_HOLDING_OLDEST_XACT, T.LOG_EXTRACTION_DISK_SPACE_USED_TOTAL, T.LOG_EXTRACTION_LAST_EXTRACTED_LOG, C.CLIENT_PID, C.COORD_MEMBER, VARCHAR( U.WORKLOAD_OCCURRENCE_STATE, 20 ), VARCHAR( C.APPLICATION_NAME, 30), C.APPLICATION_ID, VARCHAR( U.CLIENT_APPLNAME, 30), T.TOTAL_LOG_AVAILABLE, (T.TOTAL_LOG_USED * 100.)/(T.TOTAL_LOG_AVAILABLE + T.TOTAL_LOG_USED) PCT_USED, U.UOW_LOG_SPACE_USED, U.ROWS_MODIFIED, U.LOG_EXTRACTION_DISK_SPACE_USED FROM TABLE( MON_GET_TRANSACTION_LOG( -1 ) ) AS T LEFT OUTER JOIN TABLE(MON_GET_UNIT_OF_WORK( T.APPLID_HOLDING_OLDEST_XACT, T.MEMBER, 1 )) U ON T.MEMBER = U.MEMBER LEFT OUTER JOIN TABLE(MON_GET_CONNECTION( T.APPLID_HOLDING_OLDEST_XACT, T.MEMBER, 1 )) C ON T.MEMBER = C.MEMBER WHERE T.APPLID_HOLDING_OLDEST_XACT IS NOT NULL ORDER BY T.MEMBER ASC
-- reformatted output:
APPLID_HOLDING_OLDEST_XACT 1061
LOG_EXTRACTION_DISK_SPACE_USED_TOTAL 49663
LOG_EXTRACTION_LAST_EXTRACTED_LOG 22
CLIENT_PID 68212
COORD_MEMBER 0
WORKLOAD_OCCURRENCE_STATE UOWWAIT
APPLICATION_NAME db2bp
APPLICATION_ID *LOCAL.db2sid.190905110144
CLIENT_APPLNAME CLP app1.sql
TOTAL_LOG_AVAILABLE 0
PCT_USED 100.000000000
UOW_LOG_SPACE_USED 259
ROWS_MODIFIED 1
LOG_EXTRACTION_DISK_SPACE_USED 167
1 record(s) selected.

terminate
DB20000I The TERMINATE command completed successfully.

This time, with ALSM, application 2 did not encounter a transaction log full situation and was able to process all 1000 work packages.

ALSM avoided the transaction log full situation by extracting the uncommitted information from application 1 from the log files and putting it into separate files. The file X0000001_TID00000000000001E2.LOG contains this information. Then, all log files generated by application 2 were successfully archived and could be removed from the online log directory after extracting some meta data (files X0000001.META up to X0000022.META ). File X0000023.TMP is in process by the log extractor.

The log archive directory now contains many more log files compared to the non-ALSM test because application 2 was allowed to continue its work.

While ALSM is running, you may see a new Db2 thread using the command "db2pd -edu | grep loggx". You will also find new entries in db2diag.log indicating that ALSM is running.

Example:

2019-09-19-11.25.09.311865+120 E207391A705          LEVEL: Info
PID     : 6357918              TID : 35211          PROC : db2sysc 0
INSTANCE: db2c11               NODE : 000           DB   : C11
APPHDL  : 0-8                  APPID: *LOCAL.DB2.190919092509
AUTHID  : DB2C11               HOSTNAME: db6p05l003
EDUID   : 35211                EDUNAME: db2loggx (C11) 0
FUNCTION: DB2 UDB, data protection services, sqlpLogExtractionScanCB::loggxEnableExtractionScan, probe:1410
DATA #1 : <preformatted>
Log extraction under advanced log space management has been enabled for database.
Primary extraction path = ...
Mirror extraction path = Not set

2019-09-19-11.25.09.312266+120 E208097A733          LEVEL: Info
PID     : 6357918              TID : 35211          PROC : db2sysc 0
INSTANCE: db2c11               NODE : 000           DB   : C11
APPHDL  : 0-8                  APPID: *LOCAL.DB2.190919092509
AUTHID  : DB2C11               HOSTNAME: db6p05l003
EDUID   : 35211                EDUNAME: db2loggx (C11) 0
FUNCTION: DB2 UDB, data protection services, sqlpExtractionThrottler::loggxPrintThrottleInfo, probe:100
DATA #1 : <preformatted>
Log extraction throttling under advanced log space management has started for database.
m_throttleEnabled       = true
m_throttleLogging       = false
m_throttleSpace         = 80
m_throttleDiskAvailable = 2
m_throttleUOW           = 20

What Difference Does ALSM Make?


By extracting the uncommitted information from the online log files, ALSM manages to keep all information, which is necessary for a rollback, in the online log directory. This is a major difference to infinite logging where the uncommitted information is only available in the archived log files and requires log retrieval for a rollback.

After a rollback or commit of application 1, all extraction files associated with the application are automatically removed by Db2 from the online log directory. Thanks to ALSM, a rollback of application 1 is still fast because all information is still available in the online log directory.

The Db2 monitoring functions contain a number of new monitoring elements related to ALSM. In our test case, the new columns LOG_EXTRACTION_DISK_SPACE_USED_TOTAL, LOG_EXTRACTION_LAST_EXTRACTED_LOG, and LOG_EXTRACTION_DISK_SPACE_USED now show non-zero values.

My test was done with V11.5 GA. The zero number of TOTAL_LOG_AVAILABLE was an artifact in this version. You won't see this in version 11.5 MP4 FP0SAP or higher. I have to admit that with V11.5 GA I had to use a little trick to make my ALSM test work successfully. ALSM performs log extraction asynchronously after some log volume has been filled. Since my test database only has 10 small primary log files, application 2 fills them up very fast and might be faster than the log extractor. I therefore had to insert a SLEEP in my CLP script for application 2. However, this trick is only needed because of my small test database and artificial workload. On a database with a higher total log volume, log extraction should be fast enough.

Conclusion


As of Db2 11.5 MP4 FP0SAP, the new feature ALSM can help avoid transaction log full situations where applications hold the oldest log file and did not generate too much log volume themselves. ALSM worked fine in my test scenario and I hope it demonstrated how ALSM works.

ALSM can also help speed up rollback processing for applications that span a large log volume because the extracted rollback information is stored in dedicated files.

Also check out our video on ALSM.
12 Comments