crash recovery of productive db very slow

Former Member

We had to shut down a production database with db2_kill because it could not be stopped normally and had a problem with a full FAILARCHPATH. (After the TSM server had problems, archiving to TSM no longer succeeded, even after the TSM server was up again. We have had this problem before.)

The crash recovery is taking a very long time. Sometimes even db2 list utilities show detail seems to hang.

With db2pd -everything I can see the progress of the crash recovery:

Database Partition 0 -- Database PC1 -- Active -- Up 0 days 01:57:14 -- Date 05/07/2008 11:34:59

Recovery:
Recovery Status   0x00000C01
Current Log       S0003363.LOG
Current LSN       061F2B330DBA
Job Type          CRASH RECOVERY
Job ID            1
Job Start Time    (1210145904) Wed May 7 09:38:24 2008
Job Description   Crash Recovery
Invoker Type      User
Total Phases      2
Current Phase     1

Progress:
Address            PhaseNum Description StartTime               CompletedWork    TotalWork
0x000000020018E580 1        Forward     Wed May 7 09:38:24 2008 786766439 bytes  1998253346 bytes
0x000000020018E670 2        Backward    NotStarted              0 bytes          1998253346 bytes

So after almost two hours the db has completed only about 39% of the bytes of the forward phase (786766439 of 1998253346), and the backward phase is still to come!
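The figures in the db2pd output are enough for a rough throughput and time-remaining estimate for the forward phase. The following is a minimal Python sketch, assuming the recovery rate stays constant (which it may well not); the timestamps and byte counts are copied from the output above, and the function name `estimate_forward_phase` is purely illustrative.

```python
from datetime import datetime, timedelta

def estimate_forward_phase(start, now, completed_bytes, total_bytes):
    """Return (fraction_done, bytes_per_second, estimated_time_remaining).

    Assumes a constant recovery rate, which is only a rough approximation.
    """
    elapsed = (now - start).total_seconds()
    if elapsed <= 0 or completed_bytes <= 0:
        raise ValueError("need positive elapsed time and completed work")
    rate = completed_bytes / elapsed
    remaining = timedelta(seconds=(total_bytes - completed_bytes) / rate)
    return completed_bytes / total_bytes, rate, remaining

# Values copied from the db2pd output above.
start = datetime(2008, 5, 7, 9, 38, 24)   # Job Start Time
now = datetime(2008, 5, 7, 11, 34, 59)    # Date in the db2pd header line
fraction, rate, remaining = estimate_forward_phase(
    start, now, 786766439, 1998253346
)
print(f"{fraction:.1%} done, {rate / 1024:.0f} KiB/s, about {remaining} remaining")
```

With these numbers the forward phase works out to a little over 100 KiB/s, and at that pace it would need roughly another three hours, not counting the backward phase.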

In the db2diag.log there are no more entries after the start of the crash recovery at 09:38.

We have moved one logfile from the FAILARCHPATH directory (which was 100% full) to a different directory, to be sure that the slow crash recovery has nothing to do with the full FAILARCHPATH.

The log_dir directory has 20 logfiles (LOGPRIMARY + LOGSECOND) in it. No more could be allocated there, because log_dir is sized according to the log parameters.

Parameter UTIL_HEAP_SZ = 150000

Does anybody have an idea why the crash recovery is so slow?

Kind regards,

Uta

Accepted Solutions (0)

Answers (1)

Former Member

Hello Uta,

Crash recovery of 780 MB in two hours is really slow. Have you checked that all logs needed for recovery are available in the log directories? I have had only one recovery that was comparably slow, but in that case our system was missing its log files and they had to be restored one by one for the recovery. Maybe you could check the TSM logs.

Regards

Ralph Ganszky

Former Member

Hello Ralph,

The needed logfiles were all there and we didn't need to restore any logfiles from TSM (the "active" logfiles, which are needed for crash recovery, should always reside in log_dir).

At 2008-05-07-14.17.07.357544 crash recovery was completed successfully.

At 2008-05-07-13.56.41.297552 the db started archiving to TSM again:

ADM1844I Started archive for log file "S0003329.LOG".

According to the DBA colleagues, the crash recovery was only 50% finished and then suddenly everything was finished. Since "db2 list utilities" takes both the forward and the backward phase into account for the percentage, I assume that the backward phase was very fast.

The DBA colleagues also noticed that there were logfiles in log_dir which had already been archived to TSM. So they moved them out of log_dir, and additional logfiles could then be allocated (before that, no additional logfile could be allocated). I can't say whether this was the reason why the recovery finished afterwards.

The only remaining problem is that the database does not want to archive logfiles S0003329 to S0003350. What is also strange is that logfile 3329 was archived successfully to the FAILARCHPATH yesterday:


2008-05-06-12.27.10.316403+120 E4284459A420       LEVEL: Warning
PID     : 3907                 TID  : 1           PROC : db2logmgr (PC1) 0
INSTANCE: db2pc1               NODE : 000
FUNCTION: DB2 UDB, data protection, sqlpgArchiveLogFile, probe:3170
MESSAGE : ADM1846I  Completed archive for log file "S0003329.LOG" to 
          "/db2/PC1/log_archive/db2pc1/PC1/NODE0000/C0000009/" from 
          "/db2/PC1/log_dir/".

and now the db searches in the log_dir:


2008-05-07-13.57.02.525715+120 E25224816A315      LEVEL: Warning
PID     : 28182                TID  : 1           PROC : db2logmgr (PC1) 0
INSTANCE: db2pc1               NODE : 000
FUNCTION: DB2 UDB, data protection, sqlpgArchiveLogFile, probe:3108
MESSAGE : ADM1844I  Started archive for log file "S0003329.LOG".

2008-05-07-13.57.02.526949+120 I25225132A364      LEVEL: Error
PID     : 28182                TID  : 1           PROC : db2logmgr (PC1) 0
INSTANCE: db2pc1               NODE : 000
FUNCTION: DB2 UDB, data protection, sqlpgArchiveLogVendor, probe:1630
RETCODE : ZRC=0x860F000A=-2045837302=SQLO_FNEX "File not found."
          DIA8411C A file "" could not be found.

2008-05-07-13.57.02.527866+120 E25225497A367      LEVEL: Warning
PID     : 28182                TID  : 1           PROC : db2logmgr (PC1) 0
INSTANCE: db2pc1               NODE : 000
FUNCTION: DB2 UDB, data protection, sqlpgArchiveLogFile, probe:3150
MESSAGE : ADM1848W  Failed archive for log file "S0003329.LOG" to "TSM chain 9" 

2008-05-07-13.57.02.528352+120 I25225865A370      LEVEL: Error
PID     : 28182                TID  : 1           PROC : db2logmgr (PC1) 0
INSTANCE: db2pc1               NODE : 000
FUNCTION: DB2 UDB, data protection, sqlpgArchiveLogFile, probe:3160
MESSAGE : Failed to archive log file S0003329.LOG to TSM chain 9 from 
          /db2/PC1/log_dir/ with rc = -2045837302.

and this was not one of the logfiles which the colleague moved out of log_dir.
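The return code shows up in two forms in these entries: the hex ZRC 0x860F000A and the decimal rc = -2045837302. Both are the same 32-bit value, read once as unsigned hex and once as a signed integer. A small Python sketch of the conversion, useful when searching for one form while only having the other (the helper name `to_signed32` is made up for illustration):

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

zrc = 0x860F000A
print(to_signed32(zrc))               # decimal form, as shown after "rc ="
print(hex(-2045837302 & 0xFFFFFFFF))  # back to the hex form shown after "ZRC="
```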

Has anybody seen a situation where the db could not archive from FAILARCHPATH to TSM after a failure? We don't want to have to check every FAILARCHPATH after TSM failures.
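Checking by hand can at least be shortened by scanning db2diag.log for the archive messages. The sketch below is a rough Python helper, not a DB2 tool: it assumes the message wording is exactly as in the entries quoted above ("Started/Completed/Failed archive for log file ...") and that entries appear in chronological order. The function names are hypothetical.

```python
import re

# Matches the archive messages as they appear in the db2diag.log entries above.
ARCHIVE_MSG = re.compile(
    r'(Started|Completed|Failed) archive for log file "(S\d{7}\.LOG)"'
)

def last_archive_status(diag_text):
    """Map each log file name to the last archive status seen in diag_text."""
    status = {}
    for state, log_name in ARCHIVE_MSG.findall(diag_text):
        status[log_name] = state  # later entries overwrite earlier ones
    return status

def not_completed(diag_text):
    """Return, sorted, the log files whose most recent status is not 'Completed'."""
    statuses = last_archive_status(diag_text)
    return sorted(name for name, state in statuses.items() if state != "Completed")

# Message lines taken from the entries quoted in this thread.
sample = '''
MESSAGE : ADM1846I  Completed archive for log file "S0003329.LOG" to
MESSAGE : ADM1844I  Started archive for log file "S0003329.LOG".
MESSAGE : ADM1848W  Failed archive for log file "S0003329.LOG" to "TSM chain 9"
'''
print(not_completed(sample))
```

Note that a log whose last message is "Started" is also reported, since its archive may still be in progress rather than failed.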

Kind regards,

Uta

Former Member

Hello Uta,

yes, I have seen exactly this in the following situation:

- TSM is not available

- DB2 could not archive into TSM, it archived the logs to the FAILARCHPATH defined in DBCFG

- Restart database

- TSM is available again

If you restart the database (with deactivate and activate) while TSM is not available and FAILARCHPATH is used to archive logs, DB2 does not archive the log(s) from FAILARCHPATH into TSM.
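To see which logs are still sitting in FAILARCHPATH (or in log_dir) after such a restart, a directory listing filtered by log sequence number is usually enough. A minimal Python sketch, assuming the usual S<7 digits>.LOG naming seen in this thread; both function names are hypothetical:

```python
import os
import re

LOG_NAME = re.compile(r"^S(\d{7})\.LOG$")

def logs_in_range(directory, first, last):
    """Return the log file names in directory with sequence number in [first, last]."""
    found = []
    for name in os.listdir(directory):
        match = LOG_NAME.match(name)
        if match and first <= int(match.group(1)) <= last:
            found.append(name)
    return sorted(found)

def missing_in_range(directory, first, last):
    """Return the expected log file names in [first, last] not present in directory."""
    present = set(logs_in_range(directory, first, last))
    expected = (f"S{n:07d}.LOG" for n in range(first, last + 1))
    return [name for name in expected if name not in present]
```

For the case in this thread, calling `logs_in_range` on the FAILARCHPATH directory with the range 3329 to 3350 would show which of the unarchived logs are still there, and the same call on log_dir would show which ones DB2 can actually find where it is looking.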

Have you restarted the database when TSM was unavailable?

Kind regards,

Gerhard

Edited by: Gerhard Paulus on Jun 29, 2008 12:55 PM

Former Member

Hi,

DB2 should archive log files from FAILARCHPATH to TSM automatically once the TSM server comes back online. No database restart should be required. There is one APAR covering a problem that prevents the archiving of log files in FAILARCHPATH when the database is restarted: IY99682.

Regards, Jens