Technology Blog Posts by Members
alperakbal

 

Downtime is what keeps us IT people awake at night; it is the absolute worst experience for any IT staff member. And what's the second worst thing? Trying to convince the business that we need a scheduled outage for maintenance. It's like asking permission to intentionally break something so we can fix it better later.

As an SAP Customer Engineer at Google Cloud, I work closely with technologies that help ensure reliability for critical workloads. For organizations running mission-critical SAP workloads, maintaining uptime and ensuring business continuity are paramount.

Disruptions, both planned and unplanned, can lead to significant financial losses, damage customer trust, and create operational bottlenecks. SAP RISE streamlines the move to the cloud, but its effectiveness depends heavily on the cloud platform's reliability. SAP RISE provides a managed service with a default 99.7% System Availability SLA that primarily addresses unplanned downtime, while the underlying Google Cloud infrastructure includes features designed to further improve real availability, which counts both planned and unplanned downtime. For instance, Live Migration helps minimize downtime from planned maintenance. Google Cloud also offers Memory Poisoning Recovery for the virtual machines that run SAP HANA, which, coupled with proactive monitoring, reduces the likelihood and impact of unplanned hardware-related outages.

Therefore, for SAP RISE customers, choosing a cloud platform with inherent high availability is a key business decision.

 

Live Migration: Planned Maintenance Without Business Interruption

Traditionally, routine hardware and hypervisor maintenance caused significant planned downtime, both on-premises and on other hyperscalers' infrastructure. Google Cloud's Live Migration changes this by moving virtual machines, including those running critical SAP components such as SAP S/4HANA and SAP HANA databases (up to 12 TB), between servers without any perceptible interruption to SAP applications.

Live Migration on Google Cloud transfers a VM's state in phases from a source host to a target host within the same zone. First, in the "brownout" phase, memory is copied while the VM continues running. Then a brief "blackout" pauses the VM for the final state transfer, typically for less than a second, before it resumes operation on the target host. The process preserves VM properties, including network configurations and storage attachments, and requires no reboot, which minimizes application disruption during hardware maintenance.
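To make the brownout and blackout phases more concrete, here is a minimal toy model of a generic pre-copy migration loop. It is not Google's implementation, and the function name, parameters, and all numbers are hypothetical. It only illustrates why the final pause can be short: each copy round leaves behind only the pages the running VM dirtied during that round.

```python
def simulate_precopy(total_pages, dirty_rate, copy_rate,
                     blackout_threshold, max_rounds=30):
    """Toy model of pre-copy migration (illustrative only).

    total_pages        -- memory pages to transfer in the first round
    dirty_rate         -- pages the running VM dirties per second
    copy_rate          -- pages copied to the target host per second
    blackout_threshold -- when this few pages remain, pause the VM
    Returns (brownout_rounds, pages_left_for_blackout).
    """
    if copy_rate <= 0:
        raise ValueError("copy_rate must be positive")
    remaining = total_pages
    rounds = 0
    # Brownout: keep copying while the VM runs. Copying `remaining` pages
    # takes remaining / copy_rate seconds, during which the VM dirties
    # about remaining * dirty_rate / copy_rate pages that must be re-sent.
    while remaining > blackout_threshold and rounds < max_rounds:
        remaining = min(total_pages, remaining * dirty_rate // copy_rate)
        rounds += 1
    # Blackout: the VM is paused and the last `remaining` pages are sent.
    return rounds, remaining


# Hypothetical numbers: 1,000,000 pages, copy 10x faster than dirtying.
rounds, final_pages = simulate_precopy(1_000_000, 10_000, 100_000, 1_000)
print(f"brownout rounds: {rounds}, pages sent during blackout: {final_pages}")
```

In this model the loop only converges when pages are copied faster than they are dirtied; otherwise it stops at `max_rounds` with a large remainder, which is why the sketch includes that guard.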

Here's why Live Migration is a game-changer for SAP RISE customers, partners, and SAP employees:

  • Eliminate Planned Downtime for Infrastructure Maintenance: Google Cloud can perform essential hypervisor and hardware maintenance, including critical security updates and infrastructure upgrades, without requiring any shutdown or reboot of the SAP systems. This translates directly to significantly higher uptime and a substantial reduction in disruptions for business users. 

  • Ensure Continuous Uptime for Mission-Critical SAP Servers: Live Migration keeps SAP applications accessible even during unforeseen events in the underlying infrastructure. It also lets Google introduce the latest infrastructure technologies without interrupting running systems.

  • Enhanced Single Instance SLA: Google Cloud provides an industry-leading uptime SLA of 99.95% for a single memory-optimized VM instance in one zone, the instance type commonly used for SAP HANA deployments, and 99.99% for instances deployed across multiple zones. Although SAP RISE offers the same default SLA under the RISE umbrella for the whole stack, this shows the confidence Google has in its technology.
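To put the availability percentages in this post side by side, the short sketch below converts each one into the downtime it permits over a year. The helper name is hypothetical, and the annualized view is only for illustration: the actual measurement periods, exclusions, and remedies are defined in the respective SLA documents, not here.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(sla_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime a given availability percentage still permits."""
    return (100.0 - sla_percent) / 100.0 * period_hours


# Percentages quoted in this post.
slas = [
    ("SAP RISE default", 99.7),
    ("Single memory-optimized VM", 99.95),
    ("Instances across multiple zones", 99.99),
]
for label, sla in slas:
    hours = allowed_downtime_hours(sla)
    print(f"{label}: {sla}% allows about {hours:.2f} hours per year")
```

Annualized, 99.7% corresponds to roughly 26 hours of permitted downtime, 99.95% to a little over 4 hours, and 99.99% to under an hour.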

Unlike other cloud providers, which might offer live migration with considerable limitations or require VM restarts for certain maintenance activities, Google Cloud delivers comprehensive Live Migration capabilities across its SAP-certified virtual machine portfolio. This is a key differentiator that translates directly into tangibly higher uptime for SAP RISE customers.

Memory Poisoning Recovery: A Proactive Defense Against Unplanned Hardware Errors

While Live Migration addresses planned downtime, unplanned hardware failures, particularly those stemming from memory errors, remain a potential risk in any infrastructure. In 2009, Google published the first major study on memory reliability. We found that, on average, over 8% of the DIMMs installed in production systems were affected by errors each year. Given that each generation of DDR RAM packs more capacity into smaller packages, it's possible that memory hardware reliability has been impacted.
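A quick back-of-the-envelope calculation shows why a per-DIMM figure of this size matters for large-memory servers. The sketch below treats the 8% figure as an annual per-DIMM probability and assumes DIMMs fail independently, which is a simplifying assumption rather than a claim from the study. The DIMM counts are hypothetical.

```python
def p_any_dimm_error(per_dimm_rate, dimm_count):
    """Probability that at least one of `dimm_count` DIMMs sees an error
    in a year, assuming independent DIMMs (a simplification)."""
    if not 0.0 <= per_dimm_rate <= 1.0:
        raise ValueError("per_dimm_rate must be between 0 and 1")
    return 1.0 - (1.0 - per_dimm_rate) ** dimm_count


# Hypothetical server sizes; 0.08 is the per-DIMM figure quoted above.
for dimms in (1, 8, 24, 48):
    p = p_any_dimm_error(0.08, dimms)
    print(f"{dimms:>2} DIMMs: {p:.1%} chance of at least one affected DIMM per year")
```

Under these assumptions, a host with a couple of dozen DIMMs is more likely than not to see at least one affected DIMM in a year, which is the motivation for handling such errors at the platform level.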

Google Cloud tackles this challenge head-on with Memory Poisoning Recovery, a proactive feature designed to mitigate the impact of uncorrectable memory errors. It works by:

  • Detecting and isolating memory regions affected by these errors, preventing them from corrupting data.
  • Minimizing VM disruption, often by using Google Cloud's live migration feature to move VMs away from failing hardware.

The result is enhanced resiliency, especially for memory-intensive applications like SAP HANA, because less downtime is caused by hardware memory failures.


Here's how Memory Poisoning Recovery fortifies SAP RISE deployments against unexpected disruptions:

  • Intelligent Detection of Uncorrectable Memory Errors: Memory Poisoning Recovery can detect uncorrectable memory errors on the underlying hardware infrastructure before they lead to system-wide failures.

  • Minimized Impact of Memory-Related Failures: If an uncorrectable memory error is detected, Google Cloud can perform a rapid restart of the affected VM while initiating a live migration of the other VMs on the same server, preventing the failure from propagating across multiple SAP systems. This can also be coupled with SAP HANA Fast Restart to minimize the impact on the directly affected SAP HANA system. This feature is actively used by SAP Enterprise Cloud Services (ECS) for RISE deployments on Google Cloud.

  • Significant Reduction in Downtime from Hardware Incidents: Memory Poisoning Recovery drastically reduces downtime from hardware failures, ensuring higher availability for business-critical SAP applications.
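The recovery behavior described in the list above can be sketched as a small decision function: the VM whose memory was hit is restarted, and every other VM on the same host is live migrated away. This is purely an illustration of that description, not Google Cloud's actual control logic, and the type, function, and VM names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    vm: str
    kind: str  # "restart" or "live_migrate"


def plan_recovery(vms_on_host, affected_vm):
    """Plan actions after an uncorrectable memory error on one host.

    The affected VM is restarted; all other VMs on the host are moved
    away by live migration so they keep running without a reboot.
    """
    if affected_vm not in vms_on_host:
        raise ValueError(f"{affected_vm!r} is not running on this host")
    actions = [Action(affected_vm, "restart")]
    actions += [
        Action(vm, "live_migrate") for vm in vms_on_host if vm != affected_vm
    ]
    return actions


# Hypothetical host running one SAP HANA VM and two application VMs.
for action in plan_recovery(["hana-prd", "app-1", "app-2"], "hana-prd"):
    print(f"{action.vm}: {action.kind}")
```

The point of the sketch is the blast radius: only the directly affected VM pays the cost of a restart, while its neighbors on the same host are unaffected.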

Google Cloud offers this proactive memory error recovery mechanism free of charge for all VM deployments, and it can be used for both SAP HANA and application virtual machines. This provides an unparalleled level of resilience for SAP RISE customers.

A Foundation of Unwavering Resilience for SAP RISE on Google Cloud

The combination of Live Migration, which reduces planned downtime to near zero, and Memory Poisoning Recovery, which provides a proactive defense against unplanned hardware errors, establishes a uniquely resilient infrastructure for SAP RISE deployments on Google Cloud.

By recommending and choosing SAP RISE on Google Cloud, SAP customers and partners can confidently embrace a cloud environment that’s engineered for continuous operations and minimal disruptions, like all other Google services we all enjoy every day! 

This allows organizations to decisively focus on driving innovation, achieving their strategic business objectives, and maximizing the value of their SAP investments with unwavering confidence in the platform's reliability.