on 2011 Jan 14 7:28 PM
We currently have a SQL Anywhere 11 High Availability setup. It is synchronous. We have been looking to get a relatively live copy of the DB to an offsite location. It looks like our HA secondary DB might be the solution. So here's the setup we are thinking off:
Location 1 (main site) - Primary DB - Arbiter Location 2 (disaster recovery site) - Secondary DB
Locations 1 and 2 will have a VPN between them that will be used for communication between the two sites. Our main concern is latency: what will a synchronous VPN connected HA setup do to our transaction times? Due to legacy software, we do know that doubling transaction times will result in squaring locking issues. Do we need to go to a-synchronous HA? What will happen to our transaction times if the primary fails?
I realize we can all benchmark this (and we will) but to get real data that would require us to actually put our production machines in that config. I'd like to brainstorm it out before we commit to that.
What are your thoughts?
Request clarification before answering.
You might want to consider using the scale-out feature in version 12. This would allow you to keep your current configuration with both mirroring partner servers in one location and with them communicating synchronously. You would add a COPY server at the disaster recovery site. Log pages would be sent to the copy server asynchronously, with occasional synchronous requests to prevent flooding the server. In the unlikely event of a disaster requiring the copy server's instance of the database to be made writable, some manual (or scripted) effort would be required to start the database.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
If you are using the synchronous mode then e.g. an Insert will need the additional round trip time to your backup site before the main site server will respond the commit. So if e.g. latency is 150 ms between the sites I would count with at least 300 ms exec time per statement.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
asynchronous is fine, if you can accept to loose lets say the last 5 seconds of data in a crash. So I wouldn't recommend it for a banking application. Anyway you have to keep in mind that a recovery from a failure with an asynchronous configuration is much more complicated. If a transaction was lost during the switch an automatic recovery is impossible and you will need manual intervention (probably involving downtime) to bring both DBs back into sync.
@Martin: I don't think so. With asynchronous mirroring, if the recommended "auto_failover" option is set, an automatic failover is possible, and there won't be a need for manual intervention. However, as you state, that will introduce the risk of losing the last n transactions. - Whether this is critical or not is a different question. (Though I would argue that in case of a disaster at the primary location, some lost transactions might be less critical than severe downtime...). - That being said, I haven't tested asynchronous HA myself.
@Volker during our tests we have identified this behaviour and the need to do a manual sync before the HA is available again after failure. Auto_failover just means that if the first server crashs, that the mirror is taking over without any manual intervention. But to recover the system and return to a HA state you will need to do the sync.
@Martin: Thanks for the clarification! So this works the way I'd expected. - My point is that an automatic failover is possible with async HA, i.e. you don't need to have fully experienced staff on the secondary site (as you would need to handle a complex manual failover). In case a disaster has happened, then of course you would want to restore the HA setup shortly after but I guess you would have enough time (say, a few hours or days) to get experienced support staff from the primary site to the secondary site to fulfill that task.
@Volker and Martin: Thanks for the very helpful discussion. This pretty much works as we were expecting. Since the secondary server would be off site and expensive (round-trip-time-wise) in querying we would default back to a non-auto failover situation. We want full control. Also, in our current setup we have not seen the auto failover work very well in situations where the primary stopped responding. In tests all works as expected, but in production failover has not worked well for us. (Just to be clear: this might be specific to our setup where some large tables do not have a Primary Key)
User | Count |
---|---|
54 | |
6 | |
6 | |
5 | |
5 | |
5 | |
3 | |
3 | |
3 | |
3 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.