SAP HANA Host Auto-Failover is a fully automated fault recovery solution in which one or more hosts are configured in standby mode and added to an existing SAP HANA system. While in standby mode, a standby host holds no data and accepts no application requests, and, unlike with System Replication, no data is preloaded on it. You can think of it as a cluster-like HA solution within a single data center.
When it comes to high availability in SAP HANA, we should always aim for an RPO of zero (no data loss), especially in business-critical production environments. Host Auto-Failover is one of the two high availability options in SAP HANA that can give you absolute zero data loss.
Figure 1: SAP HANA Host Auto-Failover in a minimal setup for HA
In Host Auto-Failover, SAP HANA regularly checks whether all cluster members are still active; when an active host fails, a standby host automatically takes its place. An internal cluster manager, the nameserver, manages the entire failover process, so no third-party cluster software is needed. The standby host needs access to all database volumes in this scenario, so there is a single data pool, which can only be achieved with shared networked storage.
Failover happens at the host level, so the failure of a single service or process won't trigger it. When the primary host fails, the standby host takes over its lock on the data pool and continues working from there, so no data is lost. Because the failover is managed internally as an automated process, we need to be careful to preserve data consistency: data may be corrupted if the failed (previously active) host is restarted manually for recovery and attempts to write to the data pool in parallel with the failover. It is therefore best to avoid manual intervention during an auto-failover. A controlled failback can be performed by stopping or restarting the standby host that is currently in use.
Figure 2: SAP HANA Host Auto-Failover in a scale-out scenario
To ensure data consistency, SAP introduced two capabilities:
- Heartbeat is a regular TCP communication that checks whether the primary host is still active as master before a standby attempts to take over the master role or perform a failover. It runs nameserver to nameserver between hosts, or nameserver to hdbdaemon, over the SAP HANA internal communication protocol.
- I/O fencing isolates a failed node and protects the shared data pool, ensuring that the (failed) primary host no longer has access to the data or log volumes. This can be achieved via the SAP HANA storage connector API.
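To make the interplay of the two safeguards concrete, here is a toy Python model. It is not SAP HANA code: the `Host` and `SharedStorage` classes and the `heartbeat_ok` and `try_failover` functions are hypothetical names, and the real heartbeat and storage connector logic lives inside SAP HANA. The sketch only shows the ordering: check the heartbeat first, fence the failed host, then hand the lock to the standby.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    alive: bool = True  # whether the host still answers heartbeats


@dataclass
class SharedStorage:
    """Toy stand-in for the shared data pool; one host holds the lock."""
    owner: str
    fenced: set = field(default_factory=set)

    def fence(self, host_name: str) -> None:
        # I/O fencing: a fenced host can no longer write to the volumes.
        self.fenced.add(host_name)

    def write(self, host_name: str, payload: str) -> bool:
        # A write succeeds only for the current, non-fenced lock owner.
        return host_name == self.owner and host_name not in self.fenced


def heartbeat_ok(host: Host) -> bool:
    # Stand-in for the TCP heartbeat; here it simply reads a flag.
    return host.alive


def try_failover(active: Host, standby: Host, storage: SharedStorage) -> bool:
    """Take over only if the active host misses its heartbeat."""
    if heartbeat_ok(active):
        return False  # active host is healthy, nothing to do
    storage.fence(active.name)    # 1. isolate the failed host
    storage.owner = standby.name  # 2. standby takes over the lock
    return True
```

In this model, a late write from the failed host after `try_failover` has run is rejected by `SharedStorage.write`, which is exactly the corruption scenario that fencing is meant to prevent.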
| Host | Indexserver (configured role) | Indexserver (actual role) | Nameserver (configured role) | Nameserver (actual role) |
|---|---|---|---|---|
| Initial host | Worker | Master | Master 1 | Master |
| 1st added | Worker | Slave | Master 2 | Slave |
| 2nd added | Worker | Slave | Slave | Slave |
| 3rd added | Standby | Standby | Master 3 | Slave |
Table 1: An example configuration for a Multiple-Host System in a scale-out scenario
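One way to read the nameserver columns of Table 1 is as a priority list of master candidates. The Python sketch below encodes that reading. It assumes that the number in "Master 1/2/3" is the order in which candidates are tried, and the function names `elect_master` and `actual_roles` are hypothetical, not part of any SAP tool.

```python
from typing import Optional

# Rows of Table 1: host -> configured nameserver role.
NAMESERVER_CONFIG = {
    "Initial host": "Master 1",
    "1st added": "Master 2",
    "2nd added": "Slave",
    "3rd added": "Master 3",
}


def elect_master(config: dict, alive: set) -> Optional[str]:
    """Return the live host with the lowest 'Master N' number, or None."""
    candidates = []
    for host, role in config.items():
        if role.startswith("Master") and host in alive:
            priority = int(role.split()[1])  # 'Master 2' -> 2
            candidates.append((priority, host))
    return min(candidates)[1] if candidates else None


def actual_roles(config: dict, alive: set) -> dict:
    """Map each live host to its actual nameserver role."""
    master = elect_master(config, alive)
    return {
        host: ("Master" if host == master else "Slave")
        for host in config
        if host in alive
    }
```

With all four hosts alive, `actual_roles` reproduces the last column of the table. If the initial host drops out of the `alive` set, the host configured as "Master 2" is chosen under this assumption.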
Host Auto-Failover is a great HA option in scale-out scenarios and offers an easy setup: keep one or more hosts as standby, as shown above. To add hosts to an existing SAP HANA system, you can use the SAP HANA database lifecycle manager (HDBLCM) or its web interface. You can also monitor the status of all active and standby hosts in the SAP HANA cockpit and in SAP HANA studio (Landscape --> Hosts tab).
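If you prefer a scripted check over the cockpit or studio, the idea is to compare configured and actual roles per host. The Python sketch below works on plain dictionaries so it runs without a database connection. The key names are modelled on columns of the `M_LANDSCAPE_HOST_CONFIGURATION` monitoring view as I understand it, so treat them as assumptions and verify them against your revision. The function name `hosts_after_failover` is hypothetical.

```python
def hosts_after_failover(rows):
    """Return hosts whose actual indexserver role deviates from the configured one."""
    moved = []
    for row in rows:
        configured = row["INDEXSERVER_CONFIG_ROLE"].upper()
        actual = row["INDEXSERVER_ACTUAL_ROLE"].upper()
        # A configured worker running as MASTER or SLAVE is normal operation,
        # as is a standby that is still in standby.
        normal = (
            configured == "WORKER" and actual in {"MASTER", "SLAVE"}
        ) or configured == actual
        if not normal:
            moved.append(row["HOST"])
    return moved


# Hypothetical sample: hana02 failed and the standby hana04 took over its role.
sample_rows = [
    {"HOST": "hana01", "INDEXSERVER_CONFIG_ROLE": "worker", "INDEXSERVER_ACTUAL_ROLE": "master"},
    {"HOST": "hana02", "INDEXSERVER_CONFIG_ROLE": "worker", "INDEXSERVER_ACTUAL_ROLE": "standby"},
    {"HOST": "hana03", "INDEXSERVER_CONFIG_ROLE": "worker", "INDEXSERVER_ACTUAL_ROLE": "slave"},
    {"HOST": "hana04", "INDEXSERVER_CONFIG_ROLE": "standby", "INDEXSERVER_ACTUAL_ROLE": "slave"},
]
```

Calling `hosts_after_failover(sample_rows)` flags the two hosts whose roles have swapped, which is a hint that a failover has happened and a controlled failback may be due.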
Key benefits
- RPO of zero data loss
- Automated process managed internally by SAP HANA nameserver, no additional 3rd party cluster management software required
- Low RTO, failover execution time is similar to a SAP HANA startup
- Failover detection of primary host in less than a minute
- Networked storage *may* lower your HANA HW costs
Trade-offs
- Data is not preloaded, so recovery time is slightly higher than with System Replication (but no longer than an SAP HANA startup)
- Failover detection of network-related issues can take around 5-7 minutes
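As a rough back-of-the-envelope check, the figures in the two lists can be combined into an RTO estimate. The sketch below assumes that RTO is simply detection time plus a takeover that lasts about as long as an SAP HANA startup. The 10-minute startup time is a made-up placeholder, so substitute the value you measure on your own system.

```python
def estimated_rto_minutes(detection_min: float, startup_min: float) -> float:
    """Rough RTO estimate: time to detect the failure plus time to take over."""
    if detection_min < 0 or startup_min < 0:
        raise ValueError("durations must be non-negative")
    return detection_min + startup_min


ASSUMED_STARTUP_MIN = 10  # hypothetical; measure your own HANA startup time

# Host failure: detection in under a minute (1 used as the upper bound).
host_failure_rto = estimated_rto_minutes(1, ASSUMED_STARTUP_MIN)

# Network-related issue: detection can take around 5-7 minutes.
network_rto_low = estimated_rto_minutes(5, ASSUMED_STARTUP_MIN)
network_rto_high = estimated_rto_minutes(7, ASSUMED_STARTUP_MIN)
```

Under these assumptions, detection is a small part of the RTO for a plain host failure but a much larger share when the problem is network-related.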
Do you have any questions about SAP HANA Host Auto-Failover? Leave a comment below; I would love to help you and learn from you as much as I can!