Technology Blog Posts by SAP
pavan_kumar_yanman

1. Pre-requisites

  • SAP Datasphere
  • SAP HANA Cloud Data Lake Files
  • SAP HANA Cloud Database
  • Apache Spark
  • Apache Kafka (SAP BTP Compatible)
  • SAP Analytics Cloud (Optional)

2. What is Lambda Architecture for Analytics?

Lambda architecture is designed to handle analytics for continuously changing, high-volume data (https://en.wikipedia.org/wiki/Lambda_architecture). Applied to ERP (Enterprise Resource Planning) data, it enables the processing and analysis of large volumes of transactional and operational data, catering to both historical and real-time analytical needs. It consists of a Batch Layer, which processes data in batch runs at preset time intervals. Simultaneously, the Speed Layer handles real-time data streams, providing instant insight into live sales figures, stock levels, and other critical metrics using technologies like Kafka. The Reporting Layer merges the outputs of the Batch and Speed layers, offering a unified view of ERP data for quick access, which is essential for dashboards, ad-hoc reporting, and querying by various business applications.
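Before diving into the setup, the core of the pattern can be sketched in a few lines. The following is a purely illustrative Python sketch, with in-memory dicts standing in for the three layers; in the actual architecture described below, these roles are played by Spark, Kafka, and SAP HANA Cloud:

```python
# Hypothetical in-memory stand-ins for the batch, speed, and reporting layers.
batch_view = {}   # recomputed on a schedule from the full raw dataset
speed_view = {}   # updated per event; covers only data since the last batch run

def run_batch(raw_events):
    """Batch layer: recompute totals from all raw events, then reset the speed view."""
    batch_view.clear()
    for e in raw_events:
        batch_view[e["key"]] = batch_view.get(e["key"], 0) + e["amount"]
    speed_view.clear()

def on_stream_event(event):
    """Speed layer: apply each incoming event immediately."""
    speed_view[event["key"]] = speed_view.get(event["key"], 0) + event["amount"]

def query(key):
    """Reporting layer: merge batch and speed results into one answer."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)
```

A query thus always sees the precomputed batch result plus whatever has streamed in since the last batch run, which is exactly what the Reporting Layer's union of both layers achieves here.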

Lambda Architecture

3. Setup Overview

Here, the SAP S/4 HANA system is used as the data source, and SAP Datasphere is used for replicating the data from the S/4 HANA system into the layers. The setup overview looks like the following.

Lambda Architecture using SAP Datasphere and SAP HANA Cloud

 

3.1 Data Source(s)

Here, CDS Views of the SAP S/4 HANA system are used as data sources. These should be enabled for data extraction with a valid delta (CDC, change data capture) mechanism.

 

3.2 Speed Layer

The speed layer consists of 2 components:

  1. Apache Kafka (the variant supported by SAP Datasphere has to be used here) for real-time replication of SAP S/4 HANA data from the chosen CDS Views.
  2. A BTP app that reads the Kafka topics, applies the needed transformations, and finally writes the data to the HANA Cloud database.

3.3 Batch Layer

The batch layer consists of 2 components:

  1. SAP HANA Data Lake Files for storing all the data replicated from the data source and for implementing a medallion architecture for the lakehouse (Delta Lake).
  2. Apache Spark for transforming the raw data into the form needed for reporting (with or without the medallion architecture and Delta Lake).

3.4 Reporting Layer

The reporting layer consists of 2 components:

  1. SAP HANA Cloud database for storing the data coming from the speed layer as well as virtualizing / replicating the batch layer data.
  2. SAP Analytics Cloud for facilitating reporting as well as analysis.

3.5 SAP Datasphere

SAP Datasphere is the main component of this architecture and the orchestrator of the data. Its Replication Flows and spaces concepts are employed to replicate the data from the CDS Views of the SAP S/4 HANA system to the chosen targets.

More information on Datasphere spaces can be found here: https://learning.sap.com/learning-journeys/explore-sap-datasphere/introducing-sap-datasphere-spaces or https://developers.sap.com/tutorials/data-warehouse-cloud-4-spaces.html .

Currently, within a given Datasphere space, a data source (a CDS View of the SAP S/4 HANA system) can only be replicated to one target. This will change once the feature on the roadmap is delivered: https://roadmaps.sap.com/board?range=CURRENT-LAST&PRODUCT=73555000100800002141#Q1%202025;INNO=BBD862... .

  1. Space1: Speed Layer –
    1. Create a space for speed layer
    2. Create connection to SAP S/4 HANA System
    3. Create connection to Kafka
  2. Space2: Batch Layer –
    1. Create space for Batch layer
    2. Create connection to SAP S/4 HANA System (Datasphere connections are space specific)
    3. Create connection to SAP HANA Data Lake Files

 

4. Implementation Steps

4.1 Speed Layer

4.1.1 Create Datasphere Space for Speed Layer

Datasphere Spaces for Batch and Speed Layers

 

4.1.2 SAP S/4 HANA Connection Setup

Connecting to an SAP S/4 HANA on-premise system needs a Cloud Connector setup. Detailed steps are available here.

Pass the connection test.


4.1.3 Kafka Setup and Datasphere Connection

Follow the steps in the blog to use SAP BTP-compliant Kafka and connect it to SAP Datasphere: https://community.sap.com/t5/technology-blogs-by-sap/sap-datasphere-replication-flows-blog-series-pa... .


 

4.1.4 Create the Replication Flow for the required CDS View (I_COMPANYCODE is used here)

  1. Go to Datasphere → Data Builder → New Replication Flow.
  2. Select the source as the S/4 HANA connection created in the previous step.
  3. Select the target as Kafka.
  4. Change the Delta Load Interval to 0 hours and 0 minutes to make it real-time.
  5. Deploy the Replication Flow.
  6. Run the Replication Flow and check the Metrics tab for more details upon completion of the initial run.
  7. Check the data in the target (here, the Kafka topic): the topic is created and data is replicated into it.

Note: How to check a topic's details (UI or CLI) differs based on the chosen Kafka.

4.1.5 Transform, Enrich and Write to Database

Data should be read from the Kafka topic, transformed, and enriched, and the results finally written to the corresponding "speed layer" table in the HANA Cloud database.

While this can be done in many ways, a simple BTP app is depicted here; the detailed steps involved, such as data cleansing, transformation, and enrichment (lookups etc.), depend on the scenario. The COMPANYCODE example chosen here is master data and not suited to such steps, so they are skipped.
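The core step of such an app can be sketched as follows. This is a minimal, illustrative Python sketch: the message shape, table name, and columns are hypothetical, and a real app would wrap it with a Kafka consumer (for example kafka-python's KafkaConsumer) and execute the generated statement through an hdbcli cursor on HANA Cloud:

```python
import json

# Hypothetical speed layer target table; real names come from your landscape.
SPEED_TABLE = "SPEED_I_COMPANYCODE"

def to_upsert(raw_message: bytes):
    """Parse one CDC record from the Kafka topic and build a HANA upsert
    statement plus its bind parameters. The flat-JSON message shape is an
    assumption for illustration."""
    rec = json.loads(raw_message)
    cols = sorted(rec.keys())
    placeholders = ", ".join("?" for _ in cols)
    sql = (f'UPSERT "{SPEED_TABLE}" ({", ".join(cols)}) '
           f"VALUES ({placeholders}) WITH PRIMARY KEY")
    return sql, [rec[c] for c in cols]

# In the real app (not executed here):
# for msg in KafkaConsumer("topic", bootstrap_servers=...):
#     sql, params = to_upsert(msg.value)
#     cursor.execute(sql, params)   # hdbcli cursor on the HANA Cloud database
```

The transformation and enrichment steps mentioned above would sit between parsing the record and building the statement.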

 

4.2 Batch Layer

4.2.1 Create Datasphere Space for Batch Layer


 

4.2.2 SAP S/4 HANA Connection Setup

Repeat the connection setup steps from the "Speed Layer" here. As mentioned earlier, this is needed until the feature supporting multiple targets for the same data source is released: https://roadmaps.sap.com/board?range=CURRENT-LAST&PRODUCT=73555000100800002141#Q1%202025;INNO=BBD862... .


4.2.3 HANA Data Lake Files Connection

Provision an instance of HANA Cloud, Data Lake Files in the HANA Cloud cockpit (the relational engine is not required).

Create a connection to HANA Cloud, Data Lake Files using the steps provided  https://help.sap.com/docs/SAP_DATASPHERE/be5967d099974c69b77f4549425ca4c0/356e41e880e54255891b702d2a... or https://community.sap.com/t5/technology-blogs-by-members/exporting-tables-from-datasphere-to-hana-da... .

Pass the connection test.


4.2.4 Create the Replication Flow for the required CDS View (I_COMPANYCODE is used here)

  • Go to Datasphere → Data Builder → New Replication Flow.
  • Select the source as the S/4 HANA connection created in the previous step.
  • Select the target as HANA Data Lake Files.
  • Change the Delta Load Interval to 24 hours and 0 minutes.
  • Deploy the Replication Flow and run it.
  • Monitor the run in Datasphere's "Data Integration Monitor".
  • Check the Metrics tab for more details.
  • Check the data in Data Lake Files using Database Explorer.

4.2.5 Transform, Enrich and Write Results

Data lake analytics using Apache Spark is a huge topic with many possible architectures, but the medallion lakehouse architecture (https://learn.microsoft.com/en-us/azure/databricks/lakehouse/medallion) suits S/4 HANA data well, as it is transactional data with deltas. It can be implemented using the delta.io libraries directly (https://delta.io/blog/delta-lake-medallion-architecture/), using Databricks (https://www.databricks.com/glossary/medallion-architecture), or with other frameworks and tools. This part is skipped here.
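To illustrate what such a batch transformation does, here is a toy Python sketch of the medallion flow, with plain dicts standing in for the bronze, silver, and gold tables and a hypothetical record shape; on Spark with Delta Lake, the silver step would typically be expressed as a MERGE of the CDC rows into the current-state table:

```python
def to_silver(bronze_rows, silver):
    """Silver step: merge raw CDC rows (bronze) into the current state,
    keyed by primary key. The "__operation" flag and CompanyCode key are
    assumptions for illustration."""
    for row in bronze_rows:
        if row.get("__operation") == "D":        # delete image: drop the key
            silver.pop(row["CompanyCode"], None)
        else:                                    # insert/update image: upsert
            silver[row["CompanyCode"]] = {k: v for k, v in row.items()
                                          if not k.startswith("__")}
    return silver

def to_gold(silver):
    """Gold step: a reporting-ready aggregate, here company codes per country."""
    gold = {}
    for rec in silver.values():
        gold[rec["Country"]] = gold.get(rec["Country"], 0) + 1
    return gold
```

The gold tables produced by this step are what the reporting layer later exposes to HANA Cloud.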

 

4.3 Reporting Layer

The reporting layer in this scenario consists of HANA Cloud Data Lake Files (containing batch layer results as gold layer tables) and the HANA Cloud database (containing speed layer results).

 

4.3.1 HANA Cloud Database:

  1. Speed layer results stored in a physical table.
  2. Batch layer results exposed through a virtual table pointing to the gold layer tables of HANA Data Lake Files.
    • An aggregated set of gold layer tables can also be stored in physical tables, depending on available memory.

4.3.2 Data Models and Data Marts:

HANA Cloud native modeling can be used to develop Calculation Views that union the data from the speed and batch layers and finally enrich it using Calculation Views with star-join nodes.

Alternatively, Datasphere’s Views and Analytical Models in Data Builder can also be used.
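For illustration, the reporting layer objects could be created with SQL along these lines. A small Python helper generates the statements; the remote source, table, and view names are hypothetical, and the exact virtual-table syntax should be verified against the SAP HANA Cloud SQL reference:

```python
def reporting_sql(remote_source: str):
    """Build illustrative DDL for the reporting layer: a virtual table over a
    gold table in Data Lake Files, and a view unioning it with the speed
    layer table. All object names are made up for this sketch."""
    return [
        # Batch results: virtual table over the remote gold table.
        f'CREATE VIRTUAL TABLE "BATCH_COMPANYCODE" '
        f'AT "{remote_source}"."<NULL>"."<NULL>"."GOLD_COMPANYCODE"',
        # Unified reporting view over speed and batch results.
        'CREATE VIEW "V_COMPANYCODE" AS '
        'SELECT * FROM "SPEED_COMPANYCODE" '
        'UNION ALL SELECT * FROM "BATCH_COMPANYCODE"',
    ]
```

In practice these statements would be run once (via hdbcli or the Database Explorer SQL console) against the HANA Cloud database, after configuring Data Lake Files as a remote source.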

 

4.3.3 Analysis using SAP Analytics Cloud

The Analytical Models of Datasphere or the Calculation Views of HANA Cloud developed above can be used to create dashboards and analyses to support reporting.

 

5. Conclusion

Datasphere's Connections and Replication Flows, together with the HANA Cloud database and HANA Cloud Data Lake Files, can be used to realize a lambda architecture for high-volume S/4 HANA systems with real-time reporting needs.

 

 
