Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!

A replication flow is one of the main artifacts for ingesting data into SAP Datasphere from source systems (SAP or non-SAP).

There are two load types: Initial Only, and Initial and Delta. Whether the Delta option is enabled in a replication flow depends on the source connection type.

In my previous blog, I explained how to create a replication flow with the Initial Only load type (refer: https://community.sap.com/t5/technology-blogs-by-members/sap-data-sphere-replication-flow/ba-p/13920...).

In this blog, I will explain how to create a replication flow with Initial and Delta using a HANA connection, with the use case of ingesting master data (an InfoObject) from an SAP BW system into SAP Datasphere.

Use Case:

  • Source: Master data (ZCOUNTRY) from the SAP BW system

  • Target: Table in SAP Datasphere

  • Extraction: Replication flow

  • Connection type: HANA

Below are the steps involved in creating a replication flow in SAP Datasphere:

  1. Selecting the replication flow
  2. Choosing the connection
  3. Configuration with settings and deployment
  4. Monitoring the job and reconciling with the source

--------------------------------------------------------------------------------------------------------------------------------------------------

Step 1. Selecting the Replication flow

Log in to SAP Datasphere -> Main menu -> Data Builder -> Replication Flow

    Lokesh_Kumar_Pothapola_0-1730734743724.png

Lokesh_Kumar_Pothapola_1-1730734743725.png

Step 2: Choose the connection

 Select source connection

Lokesh_Kumar_Pothapola_2-1730734743725.png

Note: As a prerequisite, a connection of type SAP HANA must be configured from SAP BW to SAP Datasphere.

  Select Source Container

Lokesh_Kumar_Pothapola_3-1730734743725.png

  Select the source objects and import them (the master data attribute table /BIC/PZCOUNTRY from the SAP BW system)

Lokesh_Kumar_Pothapola_4-1730734743726.png

Select and import

Lokesh_Kumar_Pothapola_5-1730734743726.png

Step 3:  Configuration with settings and deployment

Once the selected tables are imported, a target table in SAP Datasphere must be selected to store the data in a local table.

Select Target Connection -> Select the local repository configured for SAP Datasphere

Lokesh_Kumar_Pothapola_6-1730734743726.png

After selecting the target connection, the target tables will be available with a one-to-one mapping from the source.

Lokesh_Kumar_Pothapola_7-1730734743727.png

 

Projections

While loading data, filtering or mapping conditions can be adjusted.

Filter: filter rows based on the required values

Lokesh_Kumar_Pothapola_8-1730734743727.png

Mapping: adjust target column names, data types, etc.

Lokesh_Kumar_Pothapola_9-1730734743727.png
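Conceptually, the two projections above act as a row predicate (filter) plus a column rename/re-type (mapping) applied before rows reach the target. This is a minimal sketch of that idea, not Datasphere's implementation; the column names and the `project` helper are hypothetical.

```python
# Sketch of what a projection does during load: a filter keeps only rows
# matching the condition, and a mapping renames columns on the way through.
# Column names (COUNTRY, REGION) and this helper are illustrative only.

def project(rows, keep=None, rename=None):
    rename = rename or {}
    for row in rows:
        if keep is not None and not keep(row):
            continue                      # filtered out before loading
        yield {rename.get(k, k): v for k, v in row.items()}

rows = [
    {"COUNTRY": "DE", "REGION": "EU"},
    {"COUNTRY": "US", "REGION": "NA"},
]
loaded = list(project(rows,
                      keep=lambda r: r["REGION"] == "EU",   # filter projection
                      rename={"COUNTRY": "COUNTRY_CODE"}))  # mapping projection
print(loaded)  # [{'COUNTRY_CODE': 'DE', 'REGION': 'EU'}]
```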

 

Load Type is the extraction method where the delta job can be configured. Select the option 'Initial and Delta' (since the source is SAP BW master data and the connection type is HANA, the Delta option is available).

Note: When importing master data over an ABAP connection, only the Initial option is available; please refer to my previous blog: https://community.sap.com/t5/technology-blogs-by-members/sap-data-sphere-replication-flow/ba-p/13920...

Lokesh_Kumar_Pothapola_10-1730734743728.png

After selecting Delta, the columns below are added automatically to the target structure to track changes.

Change_Type - captures whether each record was inserted, modified, or deleted

Change_Date - date of the delta record

Lokesh_Kumar_Pothapola_11-1730734743728.png
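To make the role of these two columns concrete, here is a minimal sketch of how change records ordered by Change_Date can be collapsed back into a current snapshot of the source. The single-letter codes 'I'/'U'/'D', the key column COUNTRY, and the `apply_delta` helper are assumptions for illustration; check the actual values your Change_Type column contains.

```python
# Sketch: collapsing delta records into a current snapshot.
# 'I'/'U'/'D' change codes and the COUNTRY key column are assumptions
# for illustration, not confirmed Datasphere values.

def apply_delta(snapshot, delta_records):
    """Apply delta records (ordered by Change_Date) to a key->row snapshot."""
    for rec in delta_records:
        key = rec["COUNTRY"]              # hypothetical key of ZCOUNTRY
        if rec["Change_Type"] == "D":
            snapshot.pop(key, None)       # deleted in source -> drop from snapshot
        else:                             # insert or update/modify
            snapshot[key] = {k: v for k, v in rec.items()
                             if k not in ("Change_Type", "Change_Date")}
    return snapshot

snapshot = {"DE": {"COUNTRY": "DE", "TXT": "Germany"}}
delta = [
    {"COUNTRY": "IN", "TXT": "India", "Change_Type": "I", "Change_Date": "2024-11-04"},
    {"COUNTRY": "DE", "TXT": "Deutschland", "Change_Type": "U", "Change_Date": "2024-11-04"},
]
result = apply_delta(snapshot, delta)
print(sorted(result))  # ['DE', 'IN']
```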

Delta Load Interval: schedules the job at the frequency required to load data from source to target

Lokesh_Kumar_Pothapola_12-1730734743729.png

Save the replication flow in the required folder and deploy the artifact. After the deployment notification, run the job and monitor it via Tools.

Lokesh_Kumar_Pothapola_13-1730734743729.png

Deployment notification

Lokesh_Kumar_Pothapola_14-1730734743729.png

 

Step 4: Monitoring the Replication flow

Since this is a replication flow with Delta, the job executes in two steps:

  • First, all available records are transferred; in the case below, 6 records are loaded.
  • In subsequent jobs, only the changed records (modified, newly added, and deleted) are transferred.

Below is the monitoring screen, where extraction details such as runtime, number of records, load status, and partition details are available.
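When reconciling the target against the source after initial and delta runs, deleted rows may still be present in the target as delete markers, so a naive row count will not match the source. A minimal sketch of counting only active rows, assuming a 'D' code in Change_Type for deletions and a hypothetical KEY column:

```python
# Sketch: reconciling target row counts with the source after initial + delta.
# The 'D' delete code and the KEY column are assumptions for illustration.

def active_count(target_rows):
    """Count rows still present in the source (exclude delete markers)."""
    latest = {}
    for row in target_rows:  # assumes rows are ordered by Change_Date
        latest[row["KEY"]] = row["Change_Type"]
    return sum(1 for t in latest.values() if t != "D")

# 6 initial records, then a delta with 4 inserts and 1 deletion:
target = [{"KEY": i, "Change_Type": "I"} for i in range(6)]
target += [{"KEY": i, "Change_Type": "I"} for i in range(6, 10)]
target += [{"KEY": 0, "Change_Type": "D"}]
print(active_count(target))  # 9
```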

Initial Job

 At the time of extraction

Lokesh_Kumar_Pothapola_15-1730734743730.png

After the extraction

Lokesh_Kumar_Pothapola_16-1730734743731.png

Data at source table level (SAP BW master data)

Lokesh_Kumar_Pothapola_17-1730734743732.png

Delta job

As the delta load interval is set to hourly, the next job starts automatically after one hour.

Lokesh_Kumar_Pothapola_18-1730734743732.png

Delta Log:

Lokesh_Kumar_Pothapola_21-1730736173642.png

 

The 4 newly added records highlighted below are transferred by the delta job.

Lokesh_Kumar_Pothapola_20-1730734743734.png

In conclusion, Delta is one of the main features of replication flows, used to capture changed data. However, this option is enabled based on the source connection type and source table.

In my next blog, I will explain loading transaction data (an ADSO) from SAP BW to SAP Datasphere.

 

Thanks

Lokesh Kumar Pothapola

 

 

 

10 Comments
albertosimeoni
Participant

The strange thing to me is this:
Imagine that I need to run the delta during night time only, let's say every 24 hours starting at 2:00 AM.

I need to wake up at 2:00 AM, open the PC, and manually run the "Init + Delta" replication flow.

What I do not like about replication flows is this:

having all the objects inside 1 (2, ... 10) replication flows (as more than 10 cannot run in parallel)

leads you to do double backflips when you try to MAINTAIN the DWH.
Until like yesterday, if I needed to extract another field from "Sales Order Header", and the replica was inside the same replication flow that loads Financial Document Items, I needed to restart the flow and restart the delta of "Financial Document Items".

BenedictV
Active Contributor

Hi @albertosimeoni  are you not able to schedule the Replication Flows for Init and Delta?

I completely agree on the restart issue. I do not know why every flow has to be restarted if one needs a change. On the other hand, why are you not creating 1:1 flows for each replication artefact? Why put up to 10 objects in one flow?

 

Thank You,

Benedict

@albertosimeoni You are correct, a replication flow with Initial and Delta does not allow creating a schedule. I believe we should be able to add the job to a task chain and schedule it. I need to explore further.

Lokesh_Kumar_Pothapola_0-1731418109594.png

@BenedictV: You are correct; we have implemented a 1:1 replication flow for each object, because restarting would be a major challenge if we added many objects to a single flow.

 

Thanks

Lokesh

 

albertosimeoni
Participant

@BenedictV 

Good point,

From what I know from the documentation, only 2 replication flows can run in parallel.

Init + Delta, once started, is indeed in a "running" state even while it is waiting => my first conclusion was to put every object inside 1 or 2 replication flows. (If you allocate more CUs you can get a maximum of 10 parallel jobs.)

But testing it, it seems that is not what I thought: I tried to create 3 RFs in Init + Delta and there is no limitation on the active state; probably the limitation is on concurrency when they replicate the delta.

albertosimeoni_0-1731418435787.png

albertosimeoni_1-1731418547481.png

@kpsauer are these 3 concurrent Init + Delta RFs legit, or may we incur some sort of over-consumption?

Are there other drawbacks we may be missing with only 1 replication flow per object?

Threads per object may not be optimized, as you can have 1 or 2 replication flows running in parallel that are designed to do massive data extraction; a single replication flow is designed to replicate more objects, and you have some thread management options to allocate threads to individual objects inside a replication flow.

https://community.sap.com/t5/technology-blogs-by-sap/replication-flow-blog-series-part-7-performance...


DanielIngenhaag
Product and Topic Expert
Product and Topic Expert

Hi all,

couple of comments and remarks from my end regarding some of the topics that are discussed in the threads above.

  1. In the meantime, certain changes can be made to a running replication flow without restarting the entire flow, e.g. adding or removing a data set (link). Additionally, we will soon support changing the delta load interval without restarting the entire replication flow, to offer more flexibility for users.
    @BenedictV 

  2. The delta of a replication flow currently cannot be scheduled, i.e. a replication flow using load type Initial + Delta currently cannot be incorporated in a task chain. This is in our backlog, and we are discussing when we can support it. At the moment you can only influence the delta via the "Delta Load Interval" parameter.
    @Lokesh_Kumar_Pothapola 

  3. For very large data sets it can be beneficial to have a dedicated replication flow if you want to achieve desirable performance, but in general we would not expect only 1 data set per replication flow in every case.

  4. You can run many replication flows in parallel, not just two. The more replication flows run, the more impact on performance there will be, depending on the data volume and the number of available jobs that need to serve the different replication flows. But there is no physical limit on how many replication flows can run in parallel.
    @albertosimeoni 
     

Let me know if you have additional questions, and I am happy to help out 🙂

Kind regards,

Daniel

AnkurGoyal03
Discoverer

Hi @DanielIngenhaag ,

We need to run the delta on an ad-hoc basis sometimes, and I don't find any option for that. Is there any workaround for it, or is it on the roadmap as a future enhancement?

This is critical during the month-end process, when the business wants data refreshed frequently, irrespective of the scheduled interval.

Can you please suggest?

Thanks,

Ankur Goyal

DanielIngenhaag
Product and Topic Expert
Product and Topic Expert

Hi @AnkurGoyal03 ,

we do not have such a push-based approach available, and depending on when the last delta was replicated, it can impact the source system if the logging tables grow fast. But one option as a "workaround" could be the following feature, where you will be able to change the delta load interval without restarting the entire replication flow (link). If the next scheduled delta is too far in the future, you can set a very low value to trigger the delta as soon as possible.
However, you might need to find a good strategy for your default interval, e.g. daily, depending on your overall requirements, the change rate in the source system, etc.

Daniel

AnkurGoyal03
Discoverer

Thanks @DanielIngenhaag for letting us know about this. I understand, and I hope that changing the interval without restarting the entire flow will help. It's planned for Q4 2024; are there any tentative dates or weeks you might be able to share, so that we can plan accordingly? We have a go-live in the next couple of weeks.

DJ112
Discoverer

Hello @DanielIngenhaag

I've 2 questions about DELTA replications:

1. While replicating data, I noticed that the SUCCESS file is generated only for INITIAL replications, whereas that's not the case with DELTA replications. We need the SUCCESS file after each successful DELTA replication:

  • as confirmation that the replication completed successfully.
  • to use it as a trigger for further processing of the replicated file.

2. My 2nd query is about generating a message for a DELTA replication in case of failure. As we can't include Delta in a task chain, is there any provision to send a notification email when a delta replication has failed?

Lastly, what is the expected timeframe for including DELTA replications in task chains? Hopefully that will take care of lots of ad-hoc and dependency-based delta replications. Timing an interval-based delta replication right is almost impossible for these scenarios.

Thanks,

Deepak Jain
