cancel
Showing results for 
Search instead for 
Did you mean: 

Database Recovery: What Would You Do if...(2)

4,450

I'm borrowing Calvin's previous title and tag as it fits so well.

August 18-2011:

Client calls in and says that the application is losing connection to the database and only a reboot of their server fixes it. I login remotely and find multiple entries in the Windows event log regarding SQL anywhere unable to read from file, Bad Block on disk, and LSI Controller error, most of which correspond to unplanned shutdown events.

I promptly make a backup image using the Sybase Central GUI, copy it to my computer and call both my point of contact and their IT guy.

A week or so later, same point of contact at the client emails me about an unrelated application question. I answer it and ask if anything has been done to the server -- "No everything has been fine since we talked to you."

September 08 2011:

Same client calls in this morning. The database service will not start.
After multiple conversations with the IT guy I determine that the following happened (according to him): He was not able to get onsite until the 20th of August. He powered off the server and booted into a command-line environment (either linux or dos) and ran chkdsk on the drive (which evidently is a single drive plugged into an LSI RAID controller). When the chkdsk came back clean, he copied the contents of the database folder to another computer, rebooted the computer and left. A week to 10 days went by with no problems until Tuesday when the server shut down again.

This time, while onsite, he unplugged the drive from the RAID controller and plugged it directly into the motherboard and ran chkdsk again, which came back clean again, started the server in this configuration and left. After 4 hours the server crashed again and the drive now reports failures during chkdsk. The drive is mountable and bootable, so most of the data was copied to a new drive, but one of the files that fails to copy with a read error is of course the database file.

So after evaluating all of this, knowing I have a good backup from the 18th, I try to start the daily backup image files created by an event each night. All of these files come back with "This file is not a valid database file."

I ended up starting the 08-20-2011 copy he had made manually as it was the latest available that would start.

So my questions:

  1. What would you have done given the scenario above?
  2. Why do I have backup files that are invalid?
  3. Can I make an invalid file valid, and if so how?
  4. How can I make sure that future backup files are valid?

This is the text of the backup event running on a database starting as a service using SQLA 12.0.1.3324 -- dest is a string set as the backup location + what day it is

    BACKUP DATABASE DIRECTORY dest TRANSACTION LOG TRUNCATE;

Accepted Solutions (0)

Answers (3)

Answers (3)

Former Member

You didn't mention if you ran DBVALID on your backup on the 18th of August. If you did not, that would have been your first error - after seeing messages in the Windows event log related to disk I/O issues, it would have been prudent to not only take the backup but also validate its contents.

Point 2 - unless your performance requirements are high, I would enable CRC checking (page checksums) on all database pages so that (even) transient disk failures that occur on a page read from disk will cause an assertion. You should realize that most OS's and file systems can go with unreported file system and page corruption; see articles here and here.

Point 3 - You don't say how often you test your recovery strategy. It's fine to have a backup; it's another thing altogether to make sure that you can use it.

Point 4 - if you have all of the (untruncated) transaction logs, you can take an older (hopefully valid) backup and then apply all of the logs in order to get to the same state.

Point 5 - chkdsk doesn't check everything. Actually it checks very little. It does nothing to ensure that individual pages in a file are not corrupt from media failure.

Make copies of everything before you start any resolution.

Can you make the file valid? Possibly, but likely only with the help of technical support. To get the data out yourself, you'll need to be able to get to the point that a server can start the database. Once you get there, you can try to unload as much of the data that you can; it may be, for example, that other corruption has occurred with indexes, so the data pages themselves are fine.

0 Kudos

1) I did not run the DBVALID on the 18th, but I have since and it is valid. 2) I will definitely look into this 3) You are absolutely correct. Unfortunately our strategy has been to make the backup files and has not included validation and testing. That will be fixed. 4) It seems from those daily backups we have no transaction logs as they are all 0kb which as others have indicated seems to mean the backup did not finish. 5) Absolutely.

Former Member

Regarding (1), I can't say too much. If the backups do not have an off-site (or at least separate medium) location, I would suggest putting one in place.

For (2), there are a set of bytes in the database file that can mention a database is valid. These, internally, are only modified for certain events (e.g. upgrade db). If these are not reading correctly due to physical corruption then the possibility of (3) may not even be a desirable option.

Lastly, for (4), you can run a validation on the backup after is has been completed. Keep in mind with my suggestion for (1) a valid backup will only remain valid if the medium it is stored upon does not have a failure.


As for recovery options, if you have all of the transaction logs available (from the last successful backup), you may try to translate them with dbtran and apply them to the database. You can also apply just the logs to the database, but I would recommend dbtran first (as the logs themselves may be corrupt given the failures you have already noted). And, be sure to make a copy of your backup before applying the dbtran or transaction logs!

Hope this helps!

johnsmirnios
Employee
Employee

I would add that the likely cause of #2 is that the backup did not complete successfully. Backup images of database files are given a signature that identifies them as invalid when the backup starts. At the end of the backup process, that signature is rewritten to indicate that the file is valid.

0 Kudos

Yes I would agree that it seems the backups are not completing successfully as the transaction log file for each daily file is 0KB and empty. I mistakenly thought this was normal because of the TRUNCATE option but I have since learned that the TRUNCATE is for the active transaction log. Any ideas on diagnosing the backups not finishing issue?

johnsmirnios
Employee
Employee
0 Kudos

Well, you describe low-level IO errors so it's possible that the backups are failing due to an IO error. You can try to run a backup manually to see if it completes. Does your event capture errors and record in the database whether the statement completely successfully?

0 Kudos

That is possible, but while the failures were occurring I was able to take the manual backup on the 18th. The event does not currently do anything like that. What are the best ways to both capture errors and record that a backup completed successfully?

VolkerBarth
Contributor
0 Kudos

You may have a look at the maintenance plan facility: It does automatically create events for backup and/or validation. These events do contain exception handlers to report errors in the generated dbo.maint_plan_report table. Successful event runs are reported similarily.

This might be a good starting point to enhance your own event handlers.

Note that validation maintenance plans do validate the original database, not a copy of the backup - although the latter is much more reasonable as you have found out...

Former Member

One option when you run into the "This file is not a valid database file." is to use dbtran to convert the transaction log into a SQL command file which you could apply to your latest working database. But if your backup of the transaction log is 0 kB I guess your out of luck:-(

Best regards Ove Halseth