on 2014 Sep 02 5:19 PM
The objective is to have a backup of a relatively large (30gb) database that can be restored rapidly, with a very narrow window of potential data loss, as this database is in use 24x7x365. This should all be documented in such a manner that we can assume the entire IT team was beamed up by aliens and the janitor is responsible for bringing the database back up.
There is much helpful information on this site. I've read enough now that I've started to second guess the plan. What I've outlined below seems to work just fine, but I'm sure someone can advise a better way. Database mirroring is coming, but not until the first part of next year.
We have 'ServerA' which is running the database, and 'ServerB', which will be the backup target. ServerB is maintained at the same patch level, etc as ServerA. Both servers are 64bit Windows.
On ServerB there is a batch file that uses DBBACKUP to create a full backup. For now it runs once a week. It's a pretty straight forward commnand:
dbbackup -c "DSN=MyDSN;" -x -y c:\\backup\\full
Another batch file runs every 30 minutes to grab the transaction log
dbbackup -c "DSN=MyDSN;" -t -y c:\\backup\\incr
Should we need to use the backup -- which implies that all three of my VMWare HA hosts are down and the redundant iSCSI sans have failed:
- Make new "working folder" for the recovery server and switch to it
- copy files from c:\\backup\\full to it
- apply the transaction log from the full backup
DBSRV16 mydatabase.db -a mydatabase.log- Exercise patience until the server shuts down
- Copy mydatabase.log from c:\\backup\\incr
- Repeat steps 3 & 4
- miller time
Request clarification before answering.
> a very narrow window of potential data loss, as this database is in use 24x7x365
> entire IT team was beamed up by aliens and the janitor is responsible for bringing the database back up.
Those two requirements are reasonable for a high availability setup, not backup and recovery... not even live backup. Experience has shown that backup is easy but recovery is hard because you can never predict exactly what the problem's going to be.
You always need a backup and recovery plan, and it should be tested, but don't expect it to provide high availability... for that you need... wait for it :)... high availability.
FWIW a dbbackup -l live backup command can be wrapped in a GOTO start loop that repeatedly runs the command if it fails. The usual cause for failure is loss of contact with the master database server... if it is intermittent then the GOTO loop will let dbbackup reconnect and continue on... if the loss of contact is permanent then the GOTO loop will continue until you kill it, which you will probably do because it's time to start the secondary server and apply the live log.
All that live backup stuff is possible, but for V16 it's the Amish way of doing high availability... no offense to the Amish, or to live backup, or SQL Remote, or any of The Old Ways... supported by all but embraced by few.
Remember that when the live backup is used for failover you have to (a) start the second server and (b) apply the log... and if it's been a while since a full backup has been taken, applying the live log may be time consuming.
Plus, it's harder and more time consuming to practice a live log failover than an HA failover. Janitor-ready documentation is a real challenge, far beyond my abilities (I tried, with V9, failed).
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
I may have drawn an inaccurate picture. The primary server is running in VMWare with HA, connected to redundant iSCSI SANs, with geographic separation of redundant devices. So the primary mission for this exercise is protection from 1) catastrophic hardware failure and 2) severe database corruption.
The 'janitor can do it' factor comes from ISO auditor who believe that everyone can do any job in the factory given good enough instructions. So mostly it's an exercise to make sure a box can be checked off.
Please post a new question... a backup and recovery plan for a geographically separate HA setup is an entirely different proposition. In particular, you do not want any of your regular backup activities to interfere with regular HA operation or even HA failover.
Is it possible to check the box without having an actual janitor demonstrate recovery from catastrophic failure? Without testing, recovery is guaranteed to fail... not 99.9%, but 100%.
It's ISO, so "checking the box" mostly means that we show that we have the instructions available, and have confirmed that if followed, they will work -- janitor not required. Even as we speak, I am doing testing and documentation of the recovery process.
In this instance, I'm using 'HA' in the VMWare context, which will fail a VM over from one host to another. This we have tested (both intentionally and not so much), and it works well -- the most users noticed was a 10 second lag on a couple screens on the ERP system. 'Geographically separated', for now, means the hardware is at the other end of the building, literally a 1/3 of a mile away, with gigabit fiber between.
In reality, the process I am documenting is for a "the building burned down" scenario. Since we are a plastics manufacturer, in that case I'll have more important issues than if I have an exact up to the minute inventory.
Aside from the discussion whether a live backup may be helpful or not, I would strongly recommend to have a copy of the previous backup available, either by copying the backup files to a second location or by using different backup folders in round-robin fashion (aka "backup generations"). The reason is that your current plan seems to use only one backup location for the full database backup and the incremental log, so in case the machine would somehow "crash" during the log backup you might have both the active log and the backup log corrupted, and the previous backup log file would already have been overwritten...
You might have a look at Breck's articles on that topic, such as Rotating Database Backups Revisited.
In case you happen to have more then one log backup for the last full backup, and those log backups have different names (say, by using DBBACKUP -n or one of the "rename log" variants), you can simply the following steps:
3. apply the transaction log from the full backup DBSRV16 mydatabase.db -a mydatabase.log 4. Exercise patience until the server shuts down 5. Copy mydatabase.log from c:\\backup\\incr 6. Repeat steps 3 & 4
by using the -ad or -ar database switch to apply the all required transaction logs in one step. Note: That would not work currently as (in my understanding) the log from the full backup and from the incremental backup has the same name, so they cannot exist in the same directory.
Switch -as might be helpful if you want to have the database server available after recovery without a database restart.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Having a copy of the previous backup is a good idea. It was always part of the plan, but I am kind of "backing in" to this, making sure I get the right files to perform the recovery. Then I'll figure out many redundant copies I want. Because we want to capture the current log file every 30 minutes, using the "rename log" option is impractical from a housekeeping stand point.
I think today I will try adapting Breck's rotating batch file to do both the full and the incremental backups.
OK, then you could adapt the steps slightly: Just rename the latest log backup while you copy it into the directory with the database to be restored, such as
COPY c:\\backup\\incr\\mydatabase.log \\WorkingFolder\\mylatestdatabase.log
and then you can directly run "DBSRV16 mydatabase.db -ar" and let the system decide whether it has to apply from both the original and the incremental log file.
> I would strongly recommend to have a copy of the previous backup available
FWIW that is a requirement... the live log is just that, a transaction log, and when you fail over you have to have a database to which you can apply the live log file... which would be the previous full backup.
See the following topic Restarting a database from a live backup...
"Copy the full backup transaction log file and the live backup transaction log to a directory where they can be applied to the backup copy of the database file."
User | Count |
---|---|
61 | |
8 | |
7 | |
6 | |
6 | |
4 | |
4 | |
4 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.