Enterprise Resource Planning Blogs by SAP
Get insights and updates about cloud ERP and RISE with SAP, SAP S/4HANA and SAP S/4HANA Cloud, and more enterprise management capabilities with SAP blog posts.
marc_koderer
Explorer

Implementation with distributed database at Mahindra & Mahindra Limited


Introduction


Running ERP at scale in the cloud with SAP S/4HANA


In the age of Cloud ERP, customers can focus on the business value they draw from their ERP systems. Customer IT, liberated from a large share of basis workload, becomes primarily an enabler of business innovation.

At the same time, many companies have built out their SAP S/4HANA and SAP ECC systems with adaptations of standard processes and a variety of customer applications as a means to differentiate themselves from their competitors.

SAP S/4HANA Cloud, private edition is also the solution for some of the largest and most demanding cloud ERP systems in the world. Because it supports the database scale-out technology of SAP HANA, it imposes no limits on achievable scale compared to classical on-premises installations of SAP S/4HANA.

Google Cloud as key partner for SAP S/4HANA cloud infrastructure


Google Cloud provides a platform for SAP to build, deploy, and manage SAP S/4HANA Cloud, private edition for customers ranging from the smallest to the most demanding. Google Cloud's scalable infrastructure and high-bandwidth, low-latency network provide the performance and stability needed to run SAP applications on both scale-up and scale-out SAP S/4HANA systems.

Background


Mahindra & Mahindra Limited


Mahindra & Mahindra Ltd. is an Indian multinational automotive manufacturing corporation headquartered in Mumbai. It was established in 1945. Part of the Mahindra Group, Mahindra & Mahindra Limited is one of the largest vehicle manufacturers by production in India.

In 2019, Mahindra & Mahindra went live with SAP S/4HANA after an 18-month project to convert the existing SAP Business Suite (ECC 6.0) system to SAP S/4HANA 1709. The SAP application supports more than 25,000 users and more than 80 company codes, running a broad range of SAP S/4HANA core modules such as FI/CO, MM, SD, PP, and Asset Management, and core processes such as record-to-report, order-to-cash, and procure-to-pay.

Move to SAP S/4HANA cloud, private edition


In 2021, Mahindra & Mahindra decided to adopt SAP S/4HANA Cloud, private edition, choosing Google Cloud as the infrastructure provider. SAP MaxAttention services provided project planning and safeguarding from the planning stage to go-live. Mahindra & Mahindra were also supported by the SAP S/4HANA customer care program (https://influence.sap.com/sap/ino/#/campaign/71), whose project coaches helped the project move ahead efficiently by clarifying new SAP S/4HANA product features and resolving issues quickly. Customer care also helped Mahindra & Mahindra to quickly adopt innovation and introduce more than 90 standard and 200 custom SAP Fiori apps.

On the technical side, Mahindra & Mahindra expected the cloud system to match the existing on-premises system in terms of performance and throughput.

As the project progressed, intensive load testing showed that the originally foreseen single-node database infrastructure did not provide sufficient CPU capacity to sustain estimated peak workload.

Therefore, in August of 2022, the option of implementing SAP HANA database scale-out was introduced into the project plan. Only three-and-a-half months later, the system went live successfully with the first ever SAP S/4HANA cloud system operating on a distributed HANA database. The productive collaboration of Mahindra & Mahindra, SAP teams and Google Cloud paved the path to this impressive achievement.

SAP S/4HANA cloud, private edition


SAP S/4HANA Cloud, private edition enables companies to safeguard their existing SAP ERP investment while benefiting from a new level of flexibility. They can tailor the software to meet their specific needs, retain the company-specific configurations and customizations of the existing SAP ERP system, and access the latest capabilities that give them a competitive advantage.

SAP S/4HANA operating on HANA scale-out


SAP HANA scale-out is a HANA deployment option that allows the HANA database to be stretched across multiple physical database servers (typically referred to as “nodes”). It is based on a shared-nothing architecture, wherein each node controls its own data volumes and, generally, a given data set is stored and managed by one particular node. To applications, the scale-out cluster appears as one single ACID-compliant database via the SAP HANA interfaces. Client-side statement routing in the database client ensures efficient connectivity to the optimal database node.
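
To make the idea of a shared-nothing layout with client-side statement routing more concrete, here is a minimal Python sketch. It is a toy model and not the SAP HANA client's actual implementation: the node names, the table-to-node map, and the route_statement function are hypothetical and only illustrate the principle of sending a statement directly to the node that owns its data.

```python
# Toy model of a shared-nothing cluster: each table is owned by exactly one node.
# Node names and table placement are assumptions made for illustration only.
TABLE_LOCATION = {
    "ACDOCA": "node2",  # finance line items (assumed placement)
    "BKPF": "node2",    # accounting document headers (assumed placement)
    "VBAK": "node1",    # sales order headers (assumed placement)
    "VBAP": "node1",    # sales order items (assumed placement)
}
COORDINATOR = "node1"


def route_statement(tables):
    """Pick the node a client would send a statement to.

    If every table in the statement lives on one node, connect there
    directly. Otherwise fall back to the coordinator, which then has
    to gather data from several nodes.
    """
    owners = {TABLE_LOCATION.get(t, COORDINATOR) for t in tables}
    if len(owners) == 1:
        return owners.pop()
    return COORDINATOR


print(route_statement(["ACDOCA", "BKPF"]))  # single owner: node2
print(route_statement(["VBAK", "ACDOCA"]))  # two owners: coordinator
```

In this simplified picture, the second statement is the expensive case, because data has to travel between nodes before the result can be returned.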


Figure 1: SAP HANA scale-out


Scale-out is used to scale the available physical memory, as well as the available CPU resources in the SAP HANA system. Though adding some complexity, scale-out enables growth beyond single node limits:

  • Total database sizes are reachable that are beyond the capacity of the largest available single servers.

  • Application workload can be distributed across multiple database nodes – also enlarging the CPU capacity available to the application.

  • Scale-out allows for more flexibility: it makes it possible to react to strong system growth by adding scale-out hosts (within boundaries set by the applications).


For the core applications of SAP S/4HANA, a concept of scale-out by component placement has been developed. The main goals of this concept are:

  • Minimize the impact of inter-node communication for the core transactional workload thus ensuring good response times.

  • Provide a table distribution that is stable in time, not requiring frequent moving of database objects between the nodes.

  • Allow for a reasonable distribution of data volume and workload across the database nodes.


The fundamental design choices are:

  • Tables are clustered by application criteria, so that tables used within the same application context belong to the same table group. SAP provides a set of table groups as a starting point for projects; this initial proposal is adapted to each customer, including z-coding and z-tables.

  • OLTP-queries joining tables from different groups are avoided in the application standard.

  • All tables of a given group are kept on the same database node. Multiple groups may share the same node.

  • Selected master data or similar tables may be shared between all database nodes by means of a table replication concept.
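
The design choices above can be sketched as a small Python model. This is an illustration under stated assumptions, not SAP's table placement tooling: the Placement class, the group names, and the assignment of tables to groups and nodes are all hypothetical. It shows why a join within one group (or across groups sharing a node) stays local, and how a replicated table avoids a cross-node join.

```python
from dataclasses import dataclass, field


@dataclass
class Placement:
    """Hypothetical model: tables belong to groups, groups live on nodes."""
    table_group: dict                              # table name -> group name
    group_node: dict                               # group name -> node name
    replicated: set = field(default_factory=set)   # tables copied to all nodes

    def nodes_for_join(self, tables):
        """Nodes that must take part in a join over the given tables.

        Replicated tables are available on every node, so they never
        force an additional node into the join.
        """
        return {
            self.group_node[self.table_group[t]]
            for t in tables
            if t not in self.replicated
        }

    def is_local(self, tables):
        """True if the join can be answered by a single node."""
        return len(self.nodes_for_join(tables)) <= 1


# Assumed example layout: finance on node2, sales and materials on node1,
# and one small master data table replicated to all nodes.
placement = Placement(
    table_group={"ACDOCA": "FI", "BKPF": "FI", "VBAK": "SD", "VBAP": "SD", "MARA": "MM"},
    group_node={"FI": "node2", "SD": "node1", "MM": "node1"},
    replicated={"T001"},
)

print(placement.is_local(["ACDOCA", "T001"]))   # replicated table keeps the join local
print(placement.is_local(["VBAK", "MARA"]))     # two groups sharing one node
print(placement.is_local(["VBAK", "ACDOCA"]))   # groups on different nodes
```

The third call is the kind of cross-group OLTP join that the application standard tries to avoid.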



Figure 2: S/4HANA Scale-out


The concept also allows flexibility to adjust to the actual situation in a given customer system:

  • Customers can adapt the SAP-provided set of table groups according to their specific needs, based on analysis of their core workload:

    • Merging multiple groups to one larger group

    • Adding additional tables (SAP- or customer-tables) to existing groups

    • Defining new groups



  • Customers can choose to replicate additional tables as required.
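
One way to picture the workload analysis behind such adaptations is the following Python sketch. It is a simplified assumption of how the analysis could look, not the method used in the project: the statement list, the execution counts, the customer table ZSD_CUSTOM, and the cross_group_weight function are hypothetical. It sums how often statements touch tables from two different groups, so that heavily coupled groups stand out as candidates for merging or for placement on the same node.

```python
from collections import Counter
from itertools import combinations


def cross_group_weight(statements, table_group):
    """Sum execution counts per pair of table groups touched together.

    statements: iterable of (tables, executions) tuples.
    table_group: mapping of table name -> group name.
    """
    weights = Counter()
    for tables, executions in statements:
        groups = sorted({table_group[t] for t in tables})
        for pair in combinations(groups, 2):
            weights[pair] += executions
    return weights


# Hypothetical execution statistics: (tables used in a statement, executions).
statements = [
    (("VBAK", "VBAP"), 900_000),
    (("VBAK", "ZSD_CUSTOM"), 400_000),  # ZSD_CUSTOM is an invented customer table
    (("ACDOCA", "BKPF"), 700_000),
    (("VBAP", "ACDOCA"), 1_500),
]
table_group = {
    "VBAK": "SD",
    "VBAP": "SD",
    "ZSD_CUSTOM": "Z_CUSTOM",
    "ACDOCA": "FI",
    "BKPF": "FI",
}

weights = cross_group_weight(statements, table_group)
for pair, count in weights.most_common():
    print(pair, count)
```

In this invented data set, the SD and Z_CUSTOM groups are strongly coupled, which would suggest adding the customer table to the SD group, while the rare FI/SD statements would not prevent placing the two groups on different nodes.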


Since its introduction in the year 2017, several large SAP S/4HANA installations have been deployed using this scale-out offering. Mahindra & Mahindra are the first customer to have adopted scale-out in SAP S/4HANA cloud, private edition.

Details of the project


Collaboration between Mahindra & Mahindra, SAP, and Google Cloud


Throughout the project phase, teams from Mahindra & Mahindra, Google Cloud, and SAP worked in a well-aligned setup to safeguard the project, convert findings into actions, and steer towards a successful go-live.

Mahindra & Mahindra's existing setup and processes for load and performance testing proved an invaluable asset. It enabled testing and fine-tuning scale-out aspects such as table distribution or table replication to minimize impact of cross-node queries on statement performance and query throughput. Similarly, it made it possible to find optimization potential in custom development objects with respect to the underlying data distribution.

With the load test system on production-identical hardware, the teams could also spot and address optimization potential in the technology stack, be it in database parameterization or physical infrastructure.

Mahindra & Mahindra's test team was also quick to adjust the load test scenario where needed to reflect more closely the actual workload in the planned production system. This flexibility ensured that the test cycles provided highly meaningful results for understanding and safeguarding productive system behavior.

Google Cloud's technical infrastructure and SAP experts provided analysis of infrastructure KPIs and identified opportunities for VM deployment and configuration optimization as part of a Google Cloud Safeguarding service. Google Cloud's PSO team was on standby during the go-live to help resolve any issues should they arise, but the go-live went through without any incidents for Google Cloud.


Figure 3: Collaboration



Implementing SAP S/4HANA on scale-out with cluster protection


To enlarge the compute bandwidth, there was a clear need to scale horizontally with HANA scale-out. Vertical scaling would have required moving to Google Cloud Bare Metal HANA systems. Furthermore, adding an additional HANA node doubles the I/O bandwidth of the disks, which was also an important aspect because it improves performance for backups and other operational tasks.

Due to the customer's high resilience requirements, the scale-out setup needed to be distributed across Google Cloud availability zones. To guard against failures, the system was protected by Pacemaker cluster software.


Figure 4: Please note: this setup is only available in RISE with SAP S/4HANA Cloud, private edition, tailored option with expert analysis and approval



Running scale-out clusters on Google Cloud


SAP HANA scale-out has been supported on Google Cloud since 2018, and Google Cloud provides standard best practices for setting up SAP HANA scale-out clusters that support high availability across availability zones. The scale-out architecture consists of one master host, a number of worker hosts, and, optionally, one or more standby hosts. The hosts are interconnected through a network that supports sending data between hosts at rates of up to 100 Gbps on selected machine types using high-bandwidth networking with the lowest possible latency.

As the workload demand increases, especially when using OLAP, a multi-host, scale-out architecture can distribute the load across all hosts.

The following features help ensure the high availability of an SAP HANA scale-out system:

  • Compute Engine live migration

  • Compute Engine automatic instance restart

  • SAP HANA host auto-failover with up to three SAP HANA standby hosts


For more information about high availability options on Google Cloud, see the SAP HANA high-availability planning guide.

Increasing disk I/O using Hyperdisk Extreme


Large SAP HANA systems can require very high peak I/O throughput, specifically to support HANA operations such as delta merges, savepoints, and backups. Google Cloud provides linearly scalable persistent disks with up to 1,200 MB/s throughput and 40,000 read and 40,000 write IOPS per VM. These disks are used in the deployment of SAP HANA at Mahindra & Mahindra.

During stress testing, the peak throughput of Mahindra & Mahindra's SAP system stayed within these limits, so disk I/O was not a bottleneck for the go-live. Google Cloud has recently released a new disk type, Hyperdisk Extreme, a high-performing and scalable disk solution that provides up to 5,000 MB/s throughput and up to 350,000 IOPS. Hyperdisk Extreme is certified for SAP HANA workloads and is available for RISE with SAP Private Cloud Edition customer deployments on request.
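
To put these throughput figures into perspective, the following Python sketch estimates how long it takes to stream a given data volume at a sustained rate. It is back-of-the-envelope arithmetic only: the 6,000 GB volume is a hypothetical number (the actual database size is not disclosed), the two-node case assumes data is spread evenly across nodes, and real backups rarely sustain the peak rate.

```python
def transfer_minutes(data_gb, throughput_mb_s):
    """Minutes needed to stream data_gb at a sustained throughput.

    Uses decimal units (1 GB = 1000 MB), matching the MB/s figures above.
    """
    return data_gb * 1000 / throughput_mb_s / 60


DATA_GB = 6000  # hypothetical data volume, not the customer's actual size

scenarios = [
    ("persistent disk, 1 node", 1200),
    ("persistent disk, 2 nodes (assumes even data distribution)", 2 * 1200),
    ("Hyperdisk Extreme, 1 node", 5000),
]

for label, mb_s in scenarios:
    print(f"{label}: {transfer_minutes(DATA_GB, mb_s):.0f} min")
```

Under these assumptions, a second node halves the streaming time compared to a single node with the same disk type, which is the effect described for backups and other operational tasks.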

Outcome


Successful go-live on scale-out after 3.5 months project time.

Stable system operations, outperforming requirements with respect to performance and throughput.

Overall, response times were 12% faster after the migration to the RISE with SAP environment on Google Cloud.


Figure 5: Response time comparison

















 
19 Comments
Prashant_Asawa
Explorer
Good, detailed article. We should also release the learnings from this journey.
marc_koderer
Explorer
Yes, we are planning to release more material on the matter soon.
Prashant_Asawa
Explorer
Thanks 👍
staerk
Participant
Happy to be part of the team 🙂 All I know about I/O benchmarking, I learned in the SAP LinuxLab. I summarized it here: https://www.admin-magazine.com/Archive/2016/32/Fundamentals-of-I-O-benchmarking

 

Hope it helps a lot of people 🙂
jgleichmann
Active Contributor
Hi Marc,

I really love to hear and read such success stories with architecture solutions, but for a good comparison story you have to tell all the details (you already covered some tech stuff of the target system):

  • which CPU type (number of cores) was used on prem? (hardware of 2017 with a going live in 2019 with S/4HANA 1709)

  • which VM on GCP was used (m2-ultramem-416?)

  • was there also a DB revision change (for sure)

  • SAP Kernel change? (for sure)

  • nothing about the update: S/4HANA 1709 compared to S/4HANA 2021? (1709 EoM was 31.12.2021 or extended support?)


You can only compare apples with apples. Currently it sounds like you doubled the hardware and gained 12% performance. Maybe there was also some other activity, like the mentioned tuning and SQL optimization, which gained the performance improvement and not (only) the hardware. I think solving everything with hardware sounds quite expensive. So comparisons to a baseline and optimized system iterations would describe the path to success and even more so the hours of hard work throughout the journey.

I would love to hear more about the hurdles and the tricky stuff and not only what was smooth. Because S/4HANA scale-out needs a lot of good know-how regarding partitioning and grouping => table placement. How much time was spent building the architecture and afterwards optimizing the system to reach the targets?

However, great to hear that GCP solutions performing great with the S/4HANA load.

 

Regards,

Jens
avikjuoss
Participant
Good read! However, please clarify the following statement and queries pertaining to the blog:

The fundamental design choices

  • OLTP-queries joining tables from different groups are avoided in the application standard. - What does it mean for the standard SAP application codebase? How is it feasible to avoid joining tables?


1/ Is the scale-out option offered along with RISE with SAP licenses? When is it adopted in the project plan: during sizing or after the volume testing phase?

2/ What happened to their on-premise S/4HANA system? Has it been sunset completely, or is it running in a side-car approach?

3/ A 12% faster response time is very little ROI from a business standpoint for such a huge, IT-driven investment, given that on-premise will be supported at least until 2040 according to SAP's roadmap. And since they wanted the private cloud system to mirror the on-prem system, we can safely assume they already had a stable and robust system, even under peak workload. What was the main driver for undertaking this IT transformation with 200 custom Fiori apps?

 

KR

Avik

 
jgleichmann
Active Contributor
Hi Avik,

 

referring to the grouping question/design choice:

You can still join tables that are managed/distributed on different nodes, but such a join generates inter-node traffic, which is an overhead in terms of performance. So your custom code can still be used, but the join's performance will suffer. Avoid joining them in one SQL statement. This means that reading selective data into internal tables will avoid the inter-node traffic, but it will also defeat code pushdown, because the processing happens in the ABAP layer. Which solution makes more sense depends on the business case and the amount of data. Maybe temporary tables in a stored procedure can be used to get around this issue.

 

Regards,

Jens
avikjuoss
Participant
Thanks for the response! I understand there is no 'one size fits all' approach in these scenarios, and we may need to deal with each application individually, depending on the data volume it handles, in discussion with the business owners.

Now, you mentioned the important point about the code push-down technique, which might prove counterproductive depending on the underlying node structure/grouping/clustering. We would have to navigate those hurdles with SQL trace monitoring. What about CDS views in the VDM context, OData, and Fiori?

There should be well-documented repositories describing such node structures and the underlying DB table groupings, to serve as application design guidelines. All this is for improved efficiency and lower latency, yet, long story short, it essentially seems to be a fancy name for 'increasing disk size': adding more parallel HDB resources alongside the already existing primary multi-core HDB. Is it at all less expensive? Doing extensive performance optimization with the available resources/capacity could rather have been a game changer.
gokhankrsgl
Discoverer

Hi Marc,

Thank you for sharing your experience.

Can you give some details about your environment (size of database, number of transactions at peak hours, interface/batch loads, etc.)?

Then we would be able to compare our systems more accurately against those values.

marc_koderer
Explorer
Hello Jens,

thanks for your comment.
"Currently it sounds like you doubled the hardware and gained 12% performance."

No, the two scale-out nodes are pretty much comparable on CPU cores / SAPS values with the one scale-up system on-prem. And yes, optimizations like SQL statement analysis and other tunings have been part of the overall process.

Unfortunately, we are not able to share more details on the production system (like used instance types or revisions).
marc_koderer
Explorer

Hi Gokhan,

thanks for your comment.

I understand your request but unfortunately we are not able to share more details on the production system of the customer.

marc_koderer
Explorer
Hi Avik,

thanks for your comment.
"1/ Is the scale-out option offered along with RISE with SAP licenses? When is it adopted in the project plan: during sizing or after the volume testing phase?"

Scale-out scenarios can be offered leveraging RISE private edition, tailored option (expert analysis required). Sizing and growth are an essential part of the conversation during the pre-sales phase and also during delivery.

 

Regards

Marc
richard_bremer
Advisor
Advisor
Hello Avik,

regarding your question about OLTP-queries and table joins:

When operating S/4HANA on HANA scale-out, it's not our goal to avoid ALL joins between tables on different nodes, but to minimise such joins and avoid such joins in expensive statements / in statements that are part of performance-critical workload.

The approach that we are following to achieve this is twofold:

  • From application side, we provide a starting point for table grouping. Here, we group tables belonging to one application area (e.g. finance; or SD). By far the majority of SQL queries in an S/4HANA system only includes joins within a group, but not across group boundaries - at least in the SAP-delivered software.

  • Analysis of individual customer workload to a) further fine-tune table grouping and b) determine groups that should be kept together on one HANA node because there is significant cross-talk between the groups. Where a customer is already live on S/4HANA (as was the case with Mahindra & Mahindra), it is rather simple to obtain the necessary execution statistics. But we have also had good experience performing this analysis based on customer load-test data, partially extrapolating from ECC-system data (if conversion/migration from ECC to S/4HANA on HANA scale-out is planned) and further information.
    This second item is critical and requires thorough work and deep expertise.


Next to the grouping, we can use synchronous table replication for selected tables (master data and configuration data tables), to avoid distributed join queries. We are cautious not to replicate all such tables, but only those whose replication is beneficial in the given customer workload.

Obviously, there will be scenarios that require cross-group and, in any distribution setup, also cross-node queries. The relative overhead coming from distributed execution is often acceptable in analytical queries / OLAP workload. And in OLTP queries, it depends on the context: if a query is executed as part of an end-user interaction and there's a few milliseconds added, this may be measurable but not noticeable. If we add a few milliseconds to a query that's executed ten million times in a batch job, this would likely be significant. Customer requirements and expectations therefore are important when it comes to our optimisation goals.

And regarding the scale-out option: in SAP S/4HANA, scale-out is supported (with certain additional technical restrictions such as specific selection of hardware / IaaS systems certified for OLTP scale-out). In RISE, it is offered as part of the tailored option.

As for Question 2: this was a system migration, so the private-cloud scale-out system replaced the original on-premise S/4HANA system.

And Question 3: the original on-premise infrastructure and the scale-out infrastructure cannot be compared 1:1 in the sense that one of the scale-out nodes would have the same CPU capacity as the original single-node server. See Marc's comment on Jens' question.

Best regards,
Richard (SAP S/4HANA product management)
kalyan
Participant
Thank you for this. Can you point me to standard guidance on migrating a large single MDC system (15+ TB) to scale-out, please? Either on-prem to on-prem or on-prem to hyperscaler? If there is no standard guidance, can you highlight how this was achieved and whether there are any tools that take care of table placement/grouping and partitioning?

Also, you mention that vertical scaling on GCP involves Bare Metal HANA systems. Is there something to read between the lines regarding vertical scaling? Can you clarify?
jgleichmann
Active Contributor
"No, the two scale-out nodes are pretty much comparable on CPU cores / SAPS values with the one scale-up system on-prem."

"As the project progressed, intensive load testing showed that the originally foreseen single-node database infrastructure did not provide sufficient CPU capacity to sustain estimated peak workload."

This means that hardware more than five years old has the same SAPS value as the new system, which has higher requirements regarding system growth and scaling? Sounds kind of odd to me, especially given the need for better performance. But maybe the tuning was so good that you need less hardware 😉

I understand that you can not share too detailed information, but you can make a statement of what you exactly compared in "Figure 5: Response time comparison". I mean it is a result but it is not really comparable.

As already mentioned:

  • different HANA revision

  • different SAP Kernel

  • different S/4 release

  • different custom code


In the end nobody knows what exactly achieved the 12% performance. The newer HANA revision with other optimizer decisions, the SAP kernel with optimized FDA access, the S/4 release with different code or the optimized custom code.

What I want to say, please be careful with the message: change architecture and optimize the code => you will achieve >10% performance (without really comparable KPIs and costs)

It should be more like be up-to-date with your system components, continuously optimize your coding, know your workload and how to scale it (may be also without scale-out). A lot of customers will compare the systems and their situations, but without the right KPIs and costs this is impossible.

 

Regards,

Jens
avikjuoss
Participant

Thank you, Richard, for the detailed response. One follow-up question: when we create bespoke tables in the application layer, let's say a first in P2P scope and a second in R2R scope, plus a third, customer-specific CDS view in the O2C area for in-store operations, there might be two underlying groups in the HDB layer.

How are all custom tables put into a certain group, and how are the given groups kept on the same database node? Does this hold true in the CDS context? What instruction needs to be passed from the application layer, or communicated to the NetWeaver team, to preserve this grouping behind the curtain?

KR

Avik

richard_bremer
Advisor
Advisor
Hello Avik,

table grouping is not exposed in the ABAP layer, and grouping alone would not even be useful here - because the thing that counts is the actual table/group distribution across the physical database nodes. Before you ask: also this physical distribution is not exposed in the ABAP layer.

Table grouping and table distribution needs to be done entirely within the database with the means provided by the database.

In consequence, it is something that in a customer project needs to be managed as specific guidance for development teams. This may sound worse than it is, because we are not talking about thousands of relevant tables: in a typical S/4HANA scale-out system, there are tens of tables, at most a few hundred, that are not on the coordinator (master) node of the database.

Developers of ABAP code, CDS views or other artefacts (AMDPs) should be aware that a small number of tables is on the second DB node and that a further set of tables is replicated to all nodes. Only in code involving these tables do they need to be cautious. If table grouping has been defined in a good way, it will often even be sufficient to know that, for example, it's the finance area that's on the second node (plus a few less critical things used for better resource balancing, e.g. ZARIX tables or the application log or similar). So developers would know that as long as they are developing within finance, they will be good; or as long as they are developing entirely outside of finance; but that they need to pay attention when combining finance plus other application components in one query (one CDS view, one AMDP).

When it comes to defining new customer tables, by default (if the system is set up as recommended), they would be placed on the coordinator/master node - but can be added to table groups and moved to other nodes as required. Also, if reasonable, customer tables can be replicated (synchronously) to all nodes.

Now two things are important: a) not all cross-node queries are evil, but some are; and b) test, test, test -> if production is on scale-out infrastructure, also one pre-production system should exist in scale-out with identical table distribution, identical or comparable data volume and reasonable setup for testing including workload.

Best regards,
Richard
avikjuoss
Participant
Awesome! Thank you for your erudite explanation.
pandeyis
Discoverer
Great blog and thanks for sharing the knowledge from the project.

I was looking for the size of the HANA DB supporting this scale-out configuration, to understand in what sense it is the "largest". Was it larger than a 72 TB scale-out? That seems to be the largest scale-out S/4HANA system I have learned of running in production in the cloud (not GCP) so far.