
Snapshot Backups on HP EVA SAN

Former Member

Hi everyone,

We are implementing a new HP EVA SAN for our SAP MaxDB Wintel environment. As part of the SAN setup we will be utilising the EVA's snapshot technology to perform a nightly backup.

Currently HP Data Protector does not support MaxDB for its "Zero Downtime Backup" (ZDB) concept, so we need to perform LUN snapshots using the EVA's native commands. ZDB would have been nice as it integrates into SAP and lets the DB/SAP know when a snapshot backup has occurred. However, as I mentioned, this feature is not available for MaxDB (only SAP on Oracle).

We are aware that SAP supports snapshots on external storage devices as stated in OSS notes 371247 and 616814.

To perform the snapshot we would do something similar (if not identical) to what note 616814 describes below:

To create the split mirror or snapshot, proceed as follows:

dbmcli -d <database_name> -u <dbm_user>,<password>

util_connect <dbm_user>,<password>

util_execute suspend logwriter

==> Create the snapshot on the EVA

util_execute resume logwriter

util_release

exit
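In practice we'd wrap these steps in a script so the logwriter is never left suspended longer than necessary. Below is a minimal sketch only - the database name PRD, the dbm credentials, and the SSSU script eva_snap.sssu (which would hold the actual EVA snapshot commands) are all made-up placeholders, and by default the script just prints the steps instead of running them:

```shell
#!/bin/sh
# Sketch: suspend the MaxDB logwriter, snapshot the LUNs on the EVA, resume.
# DB name, credentials and the SSSU script name are hypothetical.
# Note: some MaxDB versions need an explicit util session (util_connect)
# before util_execute - check this against your version.
DB=PRD
DBMUSER="dbm,secret"
DRY_RUN=${DRY_RUN:-1}        # default: only print what would be run

STEPS=""
run () {                      # record the step label, then run or print it
    STEPS="$STEPS $1"; shift
    if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@" || exit 1; fi
}

run suspend  dbmcli -d "$DB" -u "$DBMUSER" util_execute suspend logwriter
run snapshot sssu "file eva_snap.sssu"   # EVA snapshot commands live here
run resume   dbmcli -d "$DB" -u "$DBMUSER" util_execute resume logwriter
```

One design caveat: a real version must guarantee the resume step runs even if the snapshot fails (e.g. via trap), since exiting between suspend and resume would leave the database unable to write log.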

Obviously MaxDB and SAP are unaware that a "backup" has been performed. This poses a couple of issues that I would like to see if anyone has a solution to.

a. To enable automatic log backup, MaxDB must know that it has first completed a "full" backup. Is it possible to make MaxDB aware that a snapshot backup has been taken of the database, thus allowing us to enable automatic log backup?

b. SAP also likes to know it has been backed up. EarlyWatch Alert reports start to get a little upset when you don't perform a backup on the system for a while.

Also, DB12 will mention that the system isn't in a recoverable state, when in fact it is. Any workarounds available here?

Cheers

Shaun

lbreddemann
Active Contributor

Hi Shaun,

ad a)

You have to perform an initial complete data backup here - there is no way to avoid it. If you don't want to keep it, throw it away afterwards.
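For completeness, a one-time sequence along these lines might look like the following sketch. The medium names and paths are made up, and the exact medium_put parameters differ between MaxDB versions - check the dbmcli help before using any of this:

```
dbmcli -d <database_name> -u <dbm_user>,<password>
medium_put FullMedium /backup/full FILE DATA
medium_put AutoLog /backup/log FILE LOG
util_connect <dbm_user>,<password>
backup_start FullMedium DATA
autolog_on AutoLog
util_release
exit
```

Once the complete data backup has finished successfully, automatic log backup can stay on; the snapshots then provide the daily restore point while the log backups keep the recovery chain intact.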

ad b)

This is also a point where you (currently) have to decide: either use the SAP approach (only use the supported and predefined processes, as these are the only ones that are captured in CCMS), or rely on your own backup processes and your own monitoring of them.

Since the EWA report is just a collection of warnings and hints that need to be checked and interpreted in any case, you can always say the EWA report is OK unless the missing-backup warning is the only warning.

Anyhow, what really makes me wonder is what the rest of your backup strategy looks like.

How do you automate the checking of the snapshot-backup?

How do you automate the consistency check of the backed up database?

The snapshot-backup approach you're using obviously offers the big opportunity to create a second copy of the database (from the backup taken), open that copy, and perform a consistency check without putting the load on the production machine.

Is this part of your process?

KR Lars

Former Member

Hi Lars,

So for question a - just a backup to NUL (on Windows) to get around the restriction would do?

And for b - I agree that I would like to follow a supported backup methodology, and although we are looking at using snapshots, we won't rule out tape backup. At a minimum we would perform a weekly backup to tape (most likely more than one a week).

I am assuming that SAP "somehow" supports MaxDB and snapshots on an EVA4400, as I found this promotional video with both HP and SAP MaxDB people speaking about the snapshot functions etc.:

http://h30423.www3.hp.com/index.jsp?fr_story=f752332210f499cb6619bd142f5f7a9cdca72a96&fr_chl=d9138bf...

I know it's a promotional video, but a customer like myself watches that and assumes that SAP is OK with it, given they are blowing the EVA's trumpet with MaxDB.

As for checking the snapshot consistency, we would periodically do a system refresh of our test systems (quarterly) using a snapclone, and then a snapshot.

We would also perform our database consistency checks on the weekend to avoid the high load times during the core working week hours.

And after a system refresh we would perform a consistency check to verify that what was restored is sound (from a structure perspective).

And as I mentioned, we aren't eliminating tape completely, just reducing its use.

Our goal with snapshots was to replace the majority of our daily backups with a snapshot, but still perform at a minimum our weekly, monthly, and yearly backups to tape.

Cheers

Shaun

lbreddemann
Active Contributor

Hi Shaun,

> So for question a - just a backup to NUL (on Windows) to get around the restriction would do?

No I wouldn't - why would I? Throwing away a good backup that I can use for at least as long as I have the follow-up log backups is something I wouldn't do. Why not just keep this backup?

Since you can create the initial data backup while the database is fully operational, you don't lose any time with that.

> And for b - I agree that I would like to follow a supported backup methodology, and although we are looking at using snapshots, we won't rule out tape backup. At a minimum we would perform a weekly backup to tape (most likely more than one a week).

That's a pretty wise choice! In that case the initial data backup will just be the first of those weekly backups.

> I know it's a promotional video, but a customer like myself watches that and assumes that SAP is OK with it, given they are blowing the EVA's trumpet with MaxDB.

Well - nobody denies that SAP is "ok" with that.

There's just a difference between being "ok" with something (meaning here: you can use MaxDB features in a way that makes the EVA snapshots work for a backup) and the simple, straightforward way that has been implemented to point-and-click level in SAP CCMS.

Technically your idea of using snapshots is likely to work, but as I've already written, you will have to build the monitoring yourself.

> As for checking the snapshot consistency, we would periodically do a system refresh of our test systems (quarterly) using a snapclone, and then a snapshot.

Sorry to tell you - but that's far too late.

Assume you have a corruption in the database (sooner or later you will have one).

The primary way out of such a situation is to recover a backup in order to avoid data loss.

Now assume you just ran into such a corruption with one of the SAP transactions you don't use that often.

Which backup can you use and know for sure that the corruption is not in there? A storage snapshot does not actually read database blocks and does not perform any consistency checks - only a backup through the MaxDB kernel will do.

Anyhow, it is still possible that corruptions get into a backup.

Now what? Test and see if the problem is present in all backups you still have log backups for?

You aren't serious, are you?

A much better procedure (also not included in the point-and-click actions of CCMS) is:

- perform a backup

- perform a recovery

- perform consistency check

If you get an "OK" for all three steps, only then do you know that:

- your data has been successfully backed up

- can really be recovered

- once it has been recovered, the database will be corruption free
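A sketch of that cycle, using a separate verification instance, could look like the following. All names here - PRD, PRDCHK, FullMedium - are hypothetical, and the exact commands depend on the MaxDB version:

```
# on production: the regular complete data backup
dbmcli -d PRD -u <dbm_user>,<password>
util_connect <dbm_user>,<password>
backup_start FullMedium DATA
util_release
exit

# on the verification host: restore that backup, then check it
dbmcli -d PRDCHK -u <dbm_user>,<password>
util_connect <dbm_user>,<password>
db_admin
recover_start FullMedium DATA
db_online
exit
# finally run the consistency check (CHECK DATA / "Verify") against PRDCHK
```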

I've already written about this ... [Not my fault - whose then?|https://www.sdn.sap.com/irj/sdn/weblogs?blog=/pub/wlg/8847] [original link is broken]

> We would also perform our database consistency checks on the weekend to avoid the high load times during the core working week hours.

Lucky you - no working hours on the weekend... BTW: do your large reporting batch jobs also take the weekend off?

> And after a system refresh we would perform a consistency check to verify that what was restored is sound (from a structure perspecitive).

Good idea - but you would check far too seldom.

> Our goal with snapshots was to replace the majority of our daily backups with a snapshot, but still perform at a minimum our weekly, monthly, and yearly backups to tape.

I fully agree and I really like the opportunities you get by that.

But keep in mind that your data backups can only be used together with a non-interrupted chain of log backups. So although you can use the database after a recovery of a year old data backup your business might not be too happy to find the database in a state of a year ago...

KR Lars

Former Member

Hi Lars,

Thanks for the feedback. I appreciate your input, and noted: we will perform a standard DB backup each night to tape (as well as a snapshot), not only to give us a "double-up" of protection, but also to get the DB block verification / consistency checks that you so rightly point out are extremely important.

> Well - nobody denies that SAP is "ok" with that.

> There's just a difference between being "ok" with something (meaning here: you can use MaxDB features in a way that makes the EVA snapshots work for a backup) and the simple, straightforward way that has been implemented to point-and-click level in SAP CCMS.

> Technically your idea of using snapshots is likely to work, but as I've already written, you will have to build the monitoring yourself.

It would be nice to see HP and SAP (MaxDB) take the snapshot technology one or two steps further, to provide a guaranteed consistent backup that can be block-level verified. I think HP's ZDB (zero downtime backup, i.e. snapshots) technology for SAP on Oracle using Data Protector does this now?!

> Sorry to tell you - but that's far too late.

> Assume you have a corruption in the database (sooner or later you will have one).

> The primary way out of such a situation is to recover a backup in order to avoid data loss.

> Now assume you just ran into such a corruption with one of the SAP transactions you don't use that often.

> Which backup can you use and know for sure that the corruption is not in there?

Data corruption can mean so many things. If you're talking about structure corruption or block corruption, then you hope that your consistency checks and database backup block checks will bring it to the attention of the DBA. Hopefully recovery of the DB from tape and rolling forward would resolve this.

However, if you're talking about data corruption as in "crap data" has been loaded into the database, or a rogue ABAP has corrupted several million rows of data, then this becomes a little more tricky. If the issue is identified immediately, restoring from backup is a feasible option for us.

If the issue happened over 48hrs ago, then restoring from a backup is not an option. We are a 24x7x365 manufacturing operation, shipping goods all around the world. We produce and ship too much product in a 24hr window for it to be rekeyed (or so the business says) if the data is lost.

We would have to get tricky and do things such as restore a copy of the production database to another server, and extract the original "good" documents from the copy back into the original, or hopefully the rogue ABAP can correct whatever mistake they originally made to the data.

Look...there are hundreds of corruption scenarios we could talk about, but each issue will have to be evaluated, and the decision to restore or not would be decided based on the issue at hand.

> A much better procedure (also not included in the point-and-click actions of CCMS) is:

> - perform a backup

> - perform a recovery

> - perform consistency check

I would love to think that this is something we could do daily to a sandpit system, but with a 1.7TB production database, our backups take 6hrs, a restore would take about 10hrs, and the consistency check ... well, a while.

And what a luxury to be able to do this ... do you actually know of ANY sites that do this?

> I've already written about this ... Not my fault - whose then?

Had a read ... being from New Zealand I could easily relate to the sheep 😃

> Lucky you - no working hours on the weekend... BTW: do your large reporting batch jobs also take the weekend off?

That's not what I meant. Like I said, we are a 24x7x365 system. We get a maximum of 2hrs downtime for maintenance a month. Not that we need it these days, as the systems practically run themselves. What I meant was that 7am to 7pm are our busiest peak hours, but we have dispatch personnel, warehouse operations, shift supervisors etc., as well as a huge amount of batch running through the "night" (and day). We try to maintain good dialog response during the core hours, and then try to perform all the "other" stuff around those hours, including backups, opt stats, business batch, large BI extractions etc.

Are we busy all day and night ... yes ... very.

> Good idea - but you would check far too seldom.

Noted and agreed. Will do daily backups via MaxDB kernel, and a full verification each week.

> I fully agree and I really like the opportunities you get by that.

> But keep in mind that your data backups can only be used together with a non-interrupted chain of log backups. So although you can use the database after a recovery of a year old data backup your business might not be too happy to find the database in a state of a year ago...

We could only ever realistically restore from the previous night's backup and roll forward. We keep the monthly and yearly backups for audit purposes (legal) more than anything. Would we ever recover from them ... no. A redirected restore to a sand-pit system (assuming the stars and moons are aligned) if we happened to get audited, possibly ... but that's about it.

One last question. If we "restored" from an EVA snapshot, and had the DB logs up to the current point in time, can you tell MaxDB just to roll forward using these logs even though the restore wasn't initiated via MaxDB?

Cheers

Shaun

lbreddemann
Active Contributor

Hi Shaun,

Interesting thread so far...

> It would be nice to see HP and SAP (MaxDB) take the snapshot technology one or two steps further, to provide a guaranteed consistent backup that can be block-level verified. I think HP's ZDB (zero downtime backup, i.e. snapshots) technology for SAP on Oracle using Data Protector does this now?!

Hmm... I guess the keyword here is 'market'. If there is enough market potential visible, I tend to believe that both SAP and HP would happily try to deliver such tight integration.

I don't know how this ZDB stuff works with Oracle, but how could the HP software possibly know how an Oracle block should look?

No, there are just these options to actually check for block consistency in Oracle: use RMAN, use DBV, or use SQL to actually read your data (via EXP, EXPDP, ANALYZE, custom SQL).

Even worse, you might come across block corruptions that are not really covered by these checks.
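For comparison, the RMAN route on Oracle is essentially a one-liner (a sketch; it assumes an RMAN session already connected to the target database):

```
RMAN> BACKUP VALIDATE CHECK LOGICAL DATABASE;
```

CHECK LOGICAL adds logical intra-block checks on top of the physical header checks - but, as said above, even this cannot catch every kind of corruption.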

> Data corruption can mean so many things. If you're talking about structure corruption or block corruption, then you hope that your consistency checks and database backup block checks will bring it to the attention of the DBA. Hopefully recovery of the DB from tape and rolling forward would resolve this.

Yes, I was talking about data block corruption. Why? Because there is no reliable way to actually perform a semantic check of your data. None.

We (SAP) simply rely on the assumption that whatever the updater writes to the database is consistent from an application point of view.

Having handled far too many remote consulting messages concerning data rescue due to block corruptions, I can say: getting all readable data out of the corrupt database objects is really the easy part of it.

The problems start to get big once the application developers need to come up with reports to check and repair consistency at the application level.

> However, if you're talking about data corruption as in "crap data" has been loaded into the database, or a rogue ABAP has corrupted several million rows of data, then this becomes a little more tricky. If the issue is identified immediately, restoring from backup is a feasible option for us.

> If the issue happened over 48hrs ago, then restoring from a backup is not an option. We are a 24x7x365 manufacturing operation, shipping goods all around the world. We produce and ship too much product in a 24hr window for it to be rekeyed (or so the business says) if the data is lost.

Well, in that case you're doomed. Plain and simple. Don't put any effort into getting "tricky" - just never let any piece of code run that hasn't passed the whole test factory. That's really the only chance.

> We would have to get tricky and do things such as restore a copy of the production database to another server, and extract the original "good" documents from the copy back into the original, or hopefully the rogue ABAP can correct whatever mistake they originally made to the data.

That's not a recovery plan - that is praying for mercy.

I know quite a few customer systems that went for this "solution" and had inconsistencies in their systems for a long, long time afterwards.

> Look...there are hundreds of corruption scenarios we could talk about, but each issue will have to be evaluated, and the decision to restore or not would be decided based on the issue at hand.

I totally agree.

The only thing that must not happen is: open a call conference and talk about what a corruption is in the first place, why it happened, how it could happen at all ... I've spent hours of precious lifetime in such nonsense call confs, only to see that there is no plan for this on the customer side.

> I would love to think that this is something we could do daily to a sandpit system, but with a 1.7TB production database, our backups take 6hrs, a restore would take about 10hrs, and the consistency check ... well a while.

We have customers backing up multi-TB databases in far less time - it is possible.

> And what a luxury to be able to do this ... do you actually know of ANY sites that do this?

Quick Backups? Yes, quite a few. Complete Backup, Restore, Consistency Check cycle? None.

So why is that? I believe it's because there is no single button for it.

It's not integrated into the CCMS and/or the database management software.

It might also be (hopefully) that I simply never hear from these customers. See, as a DB support consultant I don't get in touch with "success stories". I see failures and bugs all day.

To me the correct behaviour would be to actually stop the database once the last verified backup is too old - just as everybody is used to when hitting a LOG FULL / ARCHIVER STUCK situation.

Until then - I guess I will have a lot more data rescue to do...

> Had a read ... being from New Zealand I could easily relate to the sheep 😃

> That's not what I meant. Like I said, we are a 24x7x365 system. We get a maximum of 2hrs downtime for maintenance a month. Not that we need it these days, as the systems practically run themselves. What I meant was that 7am to 7pm are our busiest peak hours, but we have dispatch personnel, warehouse operations, shift supervisors etc., as well as a huge amount of batch running through the "night" (and day). We try to maintain good dialog response during the core hours, and then try to perform all the "other" stuff around those hours, including backups, opt stats, business batch, large BI extractions etc.

> Are we busy all day and night ... yes ... very.

Ah ok - got it!

Especially in such situations I would not try to implement consistency checks on your prod. database.

Basically, running a CHECK DATA there does not mean anything. Right after a table has finished its check it can get corrupted while the check is still running on other tables. So you never really have a guaranteed consistent state in a running database.

On the other hand, what you really want to know is not: "Are there any corruptions in the database?" but "If there would be any corruptions in the database, could I get my data back?".

This latter question can only be answered by checking the backups.
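If a daily restore test is out of reach, the recover_check DBM command is at least a middle ground, as far as I know: it reads a data backup end to end as if restoring it, without writing anything. A sketch (medium name hypothetical; note this only proves the backup is readable, not that the data inside it is consistent):

```
dbmcli -d PRD -u <dbm_user>,<password>
util_connect <dbm_user>,<password>
recover_check FullMedium DATA
util_release
exit
```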

> Noted and agreed. Will do daily backups via MaxDB kernel, and a full verification each week.

One more customer on the bright side

> One last question. If we "restored" from an EVA snapshot, and had the DB logs upto the current point-in-time, can you tell MaxDB just to roll forward using these logs even though a restore wasn't initiated via MaxDB?

I don't see a reason why not - if you restore the data and log area and bring the DB to admin mode, then it uses the last successful savepoint for startup.

If you then use recover_start to supply more logs, that should work.

But as always this is something that needs to be checked on your system.
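Put into dbmcli terms, the roll-forward after a snapshot restore might look like the sketch below. LogMedium and the starting log backup number 042 are made up, and as said, this needs to be verified on your own system first:

```
dbmcli -d PRD -u <dbm_user>,<password>
util_connect <dbm_user>,<password>
db_admin
# data and log areas have already been restored from the EVA snapshot
recover_start LogMedium LOG 042
# supply further log backups with recover_replace until done
db_online
exit
```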

That has been a really nice discussion - I hope you don't take my comments as offensive, they really aren't meant that way.

KR Lars