on 2011 Jul 20 2:53 PM
When control is returned to the caller after a CHECKPOINT statement is issued by an EVENT or other connection, is the checkpoint process complete?
Or is there a chance some thread(s) or fiber(s) or frammiejammie(s) or winkelfrotz(es) are still busy doing checkpoint-ey things?
I am thinking the answers are "no" and "yes" because if you put a bullet in the head of a server (taskkill) immediately after doing a sackload of updates and executing an explicit CHECKPOINT, the subsequent recovery on startup takes some time.
Database recovery in progress Last checkpoint at Wed Jul 20 2011 13:34 Checkpoint log... ... recovery 25% complete ... recovery 47% complete ... recovery 73% complete ... recovery 97% complete Transaction log: foxhound1.log... Checkpointing... Starting checkpoint of "f" (foxhound1.db) at Wed Jul 20 2011 13:43 Finished checkpoint of "f" (foxhound1.db) at Wed Jul 20 2011 13:43 Recovery complete
When an explicit checkpoint statement completes, the checkpoint has completed and no other async stuff is queued up to be done afterwards. That doesn't mean that some other frammiejammie can't sneak in and modify the file immediately afterwards. Only on shutdown do we guarantee that the file is never modified after the checkpoint completes. In some older versions, it was pretty much guaranteed that the file would be modified immediately after the checkpoint completed if there were any transactions in progress; however, that particularly unfortunate behaviour went away in v10.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Checkpoints which are done automatically by the server are done asynchronously as a background operation, but if you execute an explicit CHECKPOINT then it will be done synchronously - when the statement returns the checkpoint operation will be complete.
At the time that you see the "Finished Checkpoint of <dbname>" message in the console log the checkpoint operation is complete - all work associated with the checkpoint is complete.
Remember that during recovery the operations are:
Also note that even though you may have just done a checkpoint just before you killed the server, the server must still go through all pages that reside in the checkpoint log to see if there are any pages that must be rolled back as part of step 1 above. If the checkpoint log was large due to a lot of activity on the server prior to going down then the number of pages in the checkpoint log could be large, and thus step 1 may take some time to read all of those pages. I suspect you had a large checkpoint log.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
With new enough database files, if the file is clean we don't traverse the entire checkpoint log. Instead, we verify up to 100 pages just as a sanity check to ensure none of them look like they need to be rolled back.
In this case, all database changes have been committed before the checkpoint... only one connection is doing any work (application database upgrade), and the last thing it did was commit, checkpoint, taskkill.
Complicating matters is... the connection doing the work is a DatabaseStart event... you probably didn't anticipate people running hour-long transactions and then shutting down the engine from inside a DatabaseStart event 🙂
Yes, I am planning to remove the taskkill because I've moved to 12.0.1 and the taskkill was introduced to stop 11.0.1 crashing and hanging on STOP ENGINE... but if it starts crashing and hanging again the taskkill will go back, hence my question about frammiejammies.
There are background tasks that clean up the database and can execute after the checkpoint before you get to your taskkill.
Further complicating matters is... this is embarrassing, to have forgotten to mention it... the database recovery process is working without a log. The absolutely gigantic transaction log from the previous run is explicitly deleted before the server is restarted.
Am I just lucky to be able the start the server at all? FWIW all of this has been working OK for months, from a data consistency point of view.
Yes, yes, no, no. Taskill, DEL the log, dbsrv12 without -f, and no backup performed.
Hence the "Am I just lucky?" comment. I am certainly going to change that logic BUT I am curious about two things: How can it possibly work? and the earlier question, why does the recovery take so long?
Perhaps the answers are related... and metaphysical in nature 🙂
Each dbspace plus the transaction log has what is called an "active" bit which means that it needs recovery. After your checkpoint, if the dbspace is modified without adding anything to the transaction log (eg. by the cleaner) then the system dbspace will be marked as "active" but the transaction log will not be. During recovery, we will need to rollback the system dbspace but probably won't complain about the missing log because it is not "active". That is probably why you are getting away with removing the log.
Why does it take so long? Well, how much time are we talking about here in absolute terms? Largely, all we do is a sequential scan of the checkpoint log and selectively throw some of those images into cache. Since you had just performed a checkpoint, I expect very few pages to be rolled back so recovery should take about as long as it takes to do a sequential scan of the checkpoint log.
User | Count |
---|---|
70 | |
10 | |
9 | |
6 | |
6 | |
6 | |
6 | |
5 | |
5 | |
5 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.