cancel
Showing results for 
Search instead for 
Did you mean: 

Is the CHECKPOINT statement synchronous in 12.0.1.3298?

Breck_Carter
Participant
2,989

When control is returned to the caller after a CHECKPOINT statement is issued by an EVENT or other connection, is the checkpoint process complete?

Or is there a chance some thread(s) or fiber(s) or frammiejammie(s) or winkelfrotz(es) are still busy doing checkpoint-ey things?

I am thinking the answers are "no" and "yes" because if you put a bullet in the head of a server (taskkill) immediately after doing a sackload of updates and executing an explicit CHECKPOINT, the subsequent recovery on startup takes some time.

Database recovery in progress
    Last checkpoint at Wed Jul 20 2011 13:34
    Checkpoint log...
    ... recovery 25% complete
    ... recovery 47% complete
    ... recovery 73% complete
    ... recovery 97% complete
    Transaction log: foxhound1.log...
    Checkpointing...
Starting checkpoint of "f" (foxhound1.db) at Wed Jul 20 2011 13:43
Finished checkpoint of "f" (foxhound1.db) at Wed Jul 20 2011 13:43
Recovery complete

Accepted Solutions (0)

Answers (2)

Answers (2)

johnsmirnios
Employee
Employee

When an explicit checkpoint statement completes, the checkpoint has completed and no other async stuff is queued up to be done afterwards. That doesn't mean that some other frammiejammie can't sneak in and modify the file immediately afterwards. Only on shutdown do we guarantee that the file is never modified after the checkpoint completes. In some older versions, it was pretty much guaranteed that the file would be modified immediately after the checkpoint completed if there were any transactions in progress; however, that particularly unfortunate behaviour went away in v10.

MarkCulp
Participant

Checkpoints which are done automatically by the server are done asynchronously as a background operation, but if you execute an explicit CHECKPOINT then it will be done synchronously - when the statement returns the checkpoint operation will be complete.

At the time that you see the "Finished Checkpoint of <dbname>" message in the console log the checkpoint operation is complete - all work associated with the checkpoint is complete.

Remember that during recovery the operations are:

  1. rollback database pages to the last checkpoint
  2. replay all operations in the transaction log that occurred after the checkpoint
  3. rollback all transactions that were not committed when the server went down.

Also note that even though you may have just done a checkpoint just before you killed the server, the server must still go through all pages that reside in the checkpoint log to see if there are any pages that must be rolled back as part of step 1 above. If the checkpoint log was large due to a lot of activity on the server prior to going down then the number of pages in the checkpoint log could be large, and thus step 1 may take some time to read all of those pages. I suspect you had a large checkpoint log.

johnsmirnios
Employee
Employee

With new enough database files, if the file is clean we don't traverse the entire checkpoint log. Instead, we verify up to 100 pages just as a sanity check to ensure none of them look like they need to be rolled back.

MarkCulp
Participant
0 Kudos

Thanks John... I had forgotten that you put in that optimization!

Breck_Carter
Participant
0 Kudos

In this case, all database changes have been committed before the checkpoint... only one connection is doing any work (application database upgrade), and the last thing it did was commit, checkpoint, taskkill.

Complicating matters is... the connection doing the work is a DatabaseStart event... you probably didn't anticipate people running hour-long transactions and then shutting down the engine from inside a DatabaseStart event 🙂

Yes, I am planning to remove the taskkill because I've moved to 12.0.1 and the taskkill was introduced to stop 11.0.1 crashing and hanging on STOP ENGINE... but if it starts crashing and hanging again the taskkill will go back, hence my question about frammiejammies.

johnsmirnios
Employee
Employee

There are background tasks that clean up the database and can execute after the checkpoint before you get to your taskkill.

Breck_Carter
Participant
0 Kudos

Further complicating matters is... this is embarrassing, to have forgotten to mention it... the database recovery process is working without a log. The absolutely gigantic transaction log from the previous run is explicitly deleted before the server is restarted.

Am I just lucky to be able the start the server at all? FWIW all of this has been working OK for months, from a data consistency point of view.

VolkerBarth
Contributor
0 Kudos

Breck exploring the limits, again:)

johnsmirnios
Employee
Employee
0 Kudos

Are you saying you do a taskkill and then forcibly remove the log yourself? Are you using "-f" to start up afterwards? I would recommend only removing the log after a clean shutdown. Or perhaps there's a form of backup that does nothing other than truncate the log?

Breck_Carter
Participant
0 Kudos

Yes, yes, no, no. Taskill, DEL the log, dbsrv12 without -f, and no backup performed.

Hence the "Am I just lucky?" comment. I am certainly going to change that logic BUT I am curious about two things: How can it possibly work? and the earlier question, why does the recovery take so long?

Perhaps the answers are related... and metaphysical in nature 🙂

johnsmirnios
Employee
Employee

Each dbspace plus the transaction log has what is called an "active" bit which means that it needs recovery. After your checkpoint, if the dbspace is modified without adding anything to the transaction log (eg. by the cleaner) then the system dbspace will be marked as "active" but the transaction log will not be. During recovery, we will need to rollback the system dbspace but probably won't complain about the missing log because it is not "active". That is probably why you are getting away with removing the log.

Why does it take so long? Well, how much time are we talking about here in absolute terms? Largely, all we do is a sequential scan of the checkpoint log and selectively throw some of those images into cache. Since you had just performed a checkpoint, I expect very few pages to be rolled back so recovery should take about as long as it takes to do a sequential scan of the checkpoint log.