Technology Blog Posts by Members

Replication flow is one of the main artifacts for ingesting data into SAP Datasphere from source systems (SAP or non-SAP).

There are two extraction types: Initial Only, and Initial and Delta. Whether the Delta option is enabled in a Replication flow depends on the 'Source Connection' type.

In the previous blog, I explained how to create a Replication flow with Initial Only (refer: https://community.sap.com/t5/technology-blogs-by-members/sap-data-sphere-replication-flow/ba-p/13920...).

In this blog, I will explain how to create a Replication flow with Initial and Delta using a HANA connection, with the use case of ingesting master data (InfoObject) from an SAP BW system into SAP Datasphere.

Use Case:

           Source: Master Data (ZCOUNTRY) from SAP BW system

           Target: Table in SAP Datasphere

           Extraction: Replication Flow

           Connection Type: HANA

  Below are the steps involved in creating a Replication flow in SAP Datasphere.

  1. Selecting the Replication flow
  2. Choose the connection
  3. Configuration with settings and deployment
  4. Monitoring the job and reconciling with the source

--------------------------------------------------------------------------------------------------------------------------------------------------

Step 1: Selecting the Replication flow

Log in to SAP Datasphere -> Main menu -> Data Builder -> Replication Flow

Step 2: Choose the connection

 Select source connection


Note: As a prerequisite, a connection of type SAP HANA must be configured from SAP BW to SAP Datasphere.

  Select Source Container


  Select the source objects and import them (here, the master data attribute table /BIC/PZCOUNTRY from the SAP BW system)

Select and import


Step 3: Configuration with settings and deployment

Once the selected tables are imported, a target in Datasphere must be selected to store the data in a local table.

Select Target Connection -> Select the local repository configured for Datasphere

After selecting the target connection, the target tables are available with a one-to-one mapping from the source.

 

Projections

Projections let you define filter and mapping conditions that are applied while the data is loaded.

Filter: Filtering based on required values


Mapping: target column names, data types, etc.

 

Load Type is the extraction method where the Delta job can be configured. Select the option 'Initial and Delta' (the Delta option is available because the source is SAP BW master data and the connection type is HANA).

Note: When importing master data with an ABAP connection, only the Initial option is available; please refer to my previous blog: https://community.sap.com/t5/technology-blogs-by-members/sap-data-sphere-replication-flow/ba-p/13920...

After selecting Delta, the following columns are added automatically to the target structure to track changes:

Change_Type - captures the Insert, Modified and Deletion status

Change_Date - Date of the delta record
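To make the two tracking columns concrete, below is a minimal Python sketch of how a consumer could replay delta records onto a key-based snapshot of the master data. The one-letter codes "I", "U" and "D" and the `DeltaRecord`/`apply_delta` names are assumptions for illustration only: the post does not show the exact values Datasphere writes into Change_Type, and Datasphere maintains the target table itself, so this only models the semantics.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DeltaRecord:
    key: str                # master data key, e.g. a country code (illustrative)
    attributes: dict        # attribute values carried by the record
    change_type: str        # ASSUMED codes: "I" insert, "U" modified, "D" deleted
    change_date: datetime   # corresponds to the Change_Date column


def apply_delta(snapshot: dict, records: list) -> dict:
    """Replay delta records in Change_Date order onto a key -> attributes snapshot."""
    result = dict(snapshot)  # leave the caller's snapshot untouched
    for rec in sorted(records, key=lambda r: r.change_date):
        if rec.change_type in ("I", "U"):
            result[rec.key] = rec.attributes  # insert or overwrite the current state
        elif rec.change_type == "D":
            result.pop(rec.key, None)         # drop deleted keys if present
        else:
            raise ValueError(f"unknown change type: {rec.change_type!r}")
    return result


# Hypothetical example: start from one record, then apply three changes.
snapshot = {"DE": {"TXT": "Germany"}}
changes = [
    DeltaRecord("FR", {"TXT": "France"}, "I", datetime(2024, 11, 4, 10, 0)),
    DeltaRecord("DE", {"TXT": "Deutschland"}, "U", datetime(2024, 11, 4, 10, 5)),
    DeltaRecord("FR", {}, "D", datetime(2024, 11, 4, 10, 10)),
]
print(apply_delta(snapshot, changes))  # {'DE': {'TXT': 'Deutschland'}}
```

Ordering by Change_Date matters here: applying the deletion of "FR" before its insertion would wrongly leave "FR" in the result.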


Delta Load Interval: defines how frequently the job loads delta data from the source to the target.
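As a rough illustration of what the interval means for planning, the Python sketch below projects upcoming delta start times. It assumes, for simplicity, that each run starts exactly one interval after the previous one; `next_delta_runs` is a hypothetical helper, and the actual Datasphere scheduler (including retries, which come up in the comments) may behave differently.

```python
from datetime import datetime, timedelta


def next_delta_runs(reference: datetime, interval: timedelta, count: int) -> list:
    """Project the next `count` delta start times, one interval apart.

    Simplifying assumption: runs start exactly `interval` after each other,
    counted from `reference` (for example, the end of the initial load).
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    return [reference + interval * (i + 1) for i in range(count)]


# Hourly interval, as used in this post; the start time is illustrative.
for run in next_delta_runs(datetime(2024, 11, 4, 16, 0), timedelta(hours=1), 3):
    print(run.isoformat())
# 2024-11-04T17:00:00, 2024-11-04T18:00:00, 2024-11-04T19:00:00
```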

Save the Replication flow in the required folder and deploy the artifact. After the deployment notification, run the job and monitor it via Tools.

Deployment notification


 

Step 4: Monitoring the Replication flow

Since this is a Replication flow with Delta, the job executes in 2 steps:

  • Initial load: all available records are transferred; in the case below, 6 records are loaded.
  • Delta load: in subsequent runs, only the changed records (modified, newly added and deleted) are transferred.

 Below is the monitoring screen, where extraction details such as runtime, number of records, loading status and partition details are available.

Initial Job

 At the time of extraction


After the extraction


Data at source table level (SAP BW master data)

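For the reconciliation with the source, comparing record counts is a first check; comparing keys additionally shows which records are missing. Below is a minimal Python sketch, assuming the keys have already been exported from the BW table /BIC/PZCOUNTRY and from the Datasphere target table (how they are fetched is out of scope), and assuming deletions stay in the target as rows flagged with a hypothetical Change_Type code "D".

```python
def reconcile(source_keys: set, target_rows: list, deletion_code: str = "D") -> dict:
    """Compare source keys with the active (non-deleted) keys in the target.

    target_rows holds (key, change_type) pairs; deletion_code is an ASSUMED
    Change_Type value marking deleted records.
    """
    active_target = {key for key, change_type in target_rows if change_type != deletion_code}
    return {
        "source_count": len(source_keys),
        "target_count": len(active_target),
        "missing_in_target": sorted(source_keys - active_target),
        "unexpected_in_target": sorted(active_target - source_keys),
        "in_sync": source_keys == active_target,
    }


# Hypothetical keys: six source records, one of them not yet replicated.
source = {"DE", "FR", "IN", "IT", "JP", "US"}
target = [("DE", "I"), ("FR", "I"), ("IN", "U"), ("IT", "I"), ("US", "I"), ("GB", "D")]
print(reconcile(source, target))
# target_count 5, missing_in_target ['JP'], in_sync False
```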

Delta job

As the delta load interval is set to hourly, the next job starts automatically after one hour.

Delta Log:


 

The 4 newly added records (highlighted below) are transferred with the delta job.

In conclusion, Delta is one of the main features used in Replication flows to capture changed data. However, whether this option is available depends on the source connection type and the source table.

In my next blog, I will explain loading transaction data (ADSO) from SAP BW into SAP Datasphere.

 

Thanks

Lokesh Kumar Pothapola

 

 

 

18 Comments
albertosimeoni
Participant

The strange thing to me is this:
Imagine that I need to run the delta during the night only, let's say every 24 hours starting at 2:00 AM.

I need to wake up at 2:00 AM, open the PC, and manually run the "Init + Delta" replication flow.

What I do not like about replication flows is this:

having all the objects inside 1 (or 2, ... up to 10) replication flows, since more than 10 cannot run in parallel,

leads you to do double backflips when you try to MAINTAIN the DWH.
Until very recently, if I needed to extract another field from "Sales Order Header", and that replica was inside the same replication flow that loads Financial Document Items, I had to restart the flow and restart the delta of "Financial Document Items".

BenedictV
Active Contributor

Hi @albertosimeoni, are you not able to schedule the Replication Flows for Init and Delta?

I completely agree on the restart issue. I do not know why every flow has to be restarted if one needs a change. On the other hand, why are you not creating 1:1 flows for each replication artefact? Why put up to 10 objects in one flow?

 

Thank You,

Benedict

@albertosimeoni You are correct, a Replication flow with Initial and Delta does not allow you to create a schedule. I believe we should be able to add the job to a task chain and schedule it. Need to explore further.

@BenedictV: You are correct, we have implemented a 1:1 replication flow for each object, because restarting would be a major challenge if we added many objects to a single flow.

 

Thanks

Lokesh

 

albertosimeoni
Participant

@BenedictV 

Good point,

From what I know from the documentation, only 2 replication flows can run in parallel.

Once started, an Init+Delta flow stays in a "running" state even while it is waiting, so my first conclusion was to put every object inside 1 or 2 replication flows (if you allocate more CU you can get a maximum of 10 parallel jobs).

But testing it, it seems that is not the case: I tried creating 3 RFs in Init+Delta and there is no limitation on the active state; the limitation is probably on concurrency when they replicate the delta.


@kpsauer are these 3 concurrent Init+Delta RFs legit, or may we incur some sort of over-consumption?

Are there other drawbacks we may be missing with only 1 replication flow per object?

Threads per object may not be optimized: only 1 or 2 replication flows may run in parallel, and they are designed for massive data extraction, so a single replication flow is meant to replicate multiple objects, and you have thread management options to allocate threads to individual objects inside a replication flow.

https://community.sap.com/t5/technology-blogs-by-sap/replication-flow-blog-series-part-7-performance...

albertosimeoni
Participant

@BenedictV Good point. From what I know from the documentation, only 2 replication flows can run in parallel. Once started, an Init+Delta flow stays in an "Active (Retrying Objects)" state,

so my first conclusion was to put every object inside 1 or 2 replication flows

(if you allocate more CU you can get a maximum of 10 parallel jobs).

But testing it now, it seems that is not the case: I tried creating 3 RFs in Init+Delta and there is no limitation on the active state;

the limitation is probably on concurrency when they are "effectively replicating the delta".

@kpsauer are these 3 Init+Delta RFs running legitimately, or may we incur some sort of over-consumption?


Another drawback I may be missing: with only 1 replication flow per object, threads per object may not be optimized. Only 1 or 2 replication flows may run in parallel, and they are designed for massive data extraction, so a single replication flow is meant to replicate multiple objects, and you have thread management options to allocate threads to individual objects inside a replication flow.

https://community.sap.com/t5/technology-blogs-by-sap/replication-flow-blog-series-part-7-performance...

DanielIngenhaag
Product and Topic Expert

Hi all,

A couple of comments and remarks from my end regarding some of the topics discussed in the threads above.

  1. Meanwhile, certain changes can be made on a running replication flow without restarting the entire replication flow, e.g. adding or removing a data set (link). Additionally, we will soon support changing the delta load interval without restarting the entire replication flow, to offer more flexibility for users.
    @BenedictV 

  2. The delta of a Replication Flow currently cannot be scheduled, i.e. a Replication Flow using load type Initial and Delta currently cannot be incorporated in a task chain. This is in our backlog, and we are discussing when we can support it. At the moment, you can only influence the delta via the "Delta Load Interval" parameter.
    @Lokesh_Kumar_Pothapola 

  3. For very large data sets, a dedicated replication flow can be beneficial if you want to achieve the desired performance, but in general we would not expect only 1 data set per replication flow in every case.

  4. You can run many replication flows in parallel, not just two. The more replication flows run, the greater the impact on performance, depending on the data volume and the number of available jobs that need to serve the different replication flows. But there is no physical limit on how many replication flows can run in parallel.
    @albertosimeoni 
     

Let me know if you have additional questions; happy to help out 🙂

Kind regards,

Daniel

AnkurGoyal03
Explorer

Hi @DanielIngenhaag ,

We need to run the delta on an ad hoc basis at times. I don't find any option for that; is there any workaround, or is it on the roadmap as a future enhancement?

This is critical during the month-end process, when the business wants data refreshed frequently, irrespective of the scheduled interval.

Can you please suggest?

Thanks,

Ankur Goyal

DanielIngenhaag
Product and Topic Expert

Hi @AnkurGoyal03 ,

we do not have such a push-based approach available, and depending on when the last delta was replicated, it can impact the source system if the logging tables are growing fast. One option as a "workaround" could be the upcoming feature that lets you change the delta load interval without restarting the entire replication flow (link). If the next scheduled delta is too far in the future, you can set a very low value to trigger the delta as soon as possible.
However, you might need to find a good strategy for your default interval, e.g. daily, depending on your overall requirements, the change rate in the source system, etc.

Daniel

AnkurGoyal03
Explorer

Thanks @DanielIngenhaag for letting us know about this. I understand, and I hope that being able to change the interval without restarting the entire flow and impacting the delta will help. It's planned for Q4 2024; are there any tentative dates or weeks you might be able to share so that we can plan accordingly? We have a go-live in the next couple of weeks.

DJ112
Discoverer

Hello @DanielIngenhaag

I have 2 questions about DELTA replications:

1. While replicating data, I noticed that a SUCCESS file is generated only for INITIAL replications, not for DELTA replications. We need a SUCCESS file after each successful DELTA replication:

  • As confirmation that the replication completed successfully.
  • As a trigger for further processing of the replicated file.

2. My second question is about generating a message for DELTA replication in case of failure. As we can't include delta in a task chain, is there any provision to send a notification email when a delta replication fails?

Lastly, what is the expected timeframe for including DELTA replications in task chains? Hopefully that will take care of many ad hoc and dependency-based delta replications. Timing an interval-based delta replication correctly is almost impossible for these scenarios.

Thanks,

Deepak Jain

ohseunghee
Discoverer

@Lokesh_Kumar_Pothapola 

@DanielIngenhaag

@albertosimeoni,

Hello, I have a question regarding data integration time.
I created three replication flows, each with one ADSO assigned.
All three replication flows are set to a 24-hour delta load interval.
The load type is Initial and Delta, with Truncate checked.
I am asking because far too much data integration time is being consumed.

The delta load interval is 24 hours, but data integration time was consumed for about 24 hours per day.
When the delta load interval was 4 hours, about 15 hours per day were consumed.
When the delta load interval was 12 hours, about 13 hours per day were consumed.

Because too much data integration time was being consumed, we changed the interval to 24 hours, but after checking two days' worth of logs,
we confirmed that data integration time was still being consumed for about 24 hours a day, so we paused the flows.

The logs show that delta executions and retries keep occurring.
What is the difference between a delta run and a retry, and why is the next delta run not attempted 24 hours after the previous one completes?
I would like to know whether attempts are made periodically.

Regarding when data integration time is consumed:
Does anything other than replication flows consume it?
And when I run a replication flow,
do the initial state, delta execution, retries and stopped time all consume data integration time?

To summarize briefly:
When a replication flow runs, which states consume data integration time?
Why is the delta load interval not respected, and why are there so many retries?
How can I check the exact log status?


 

chandru638240
Explorer

Hi @Lokesh_Kumar_Pothapola 
I have a small question: is there a separate view for capturing the change date records of the delta table, since the delta tables created by a replication flow are not consumable?
I need the change date column to get only the last few days' delta records.
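Independently of which view ends up exposing the column, the filter itself is simple once Change_Date is available: keep the rows whose Change_Date falls within the last N days. A minimal Python sketch of that logic follows; in Datasphere it would typically be a filter in a view rather than Python code, and the row layout and the `recent_changes` helper are assumptions for illustration.

```python
from datetime import datetime, timedelta


def recent_changes(rows: list, days: int, now: datetime) -> list:
    """Return rows whose Change_Date lies within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return [row for row in rows if cutoff <= row["Change_Date"] <= now]


rows = [
    {"KEY": "DE", "Change_Date": datetime(2025, 1, 20, 9, 0)},
    {"KEY": "FR", "Change_Date": datetime(2025, 1, 29, 14, 30)},
    {"KEY": "IT", "Change_Date": datetime(2025, 1, 30, 8, 15)},
]
# Changes from the last 3 days relative to a fixed, illustrative "now".
print([r["KEY"] for r in recent_changes(rows, 3, datetime(2025, 1, 30, 18, 0))])  # ['FR', 'IT']
```

Passing `now` explicitly instead of calling `datetime.now()` inside the helper keeps the result reproducible, which makes it easier to check against a given delta run.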


SAP Datasphere @DanielIngenhaag 

RAviraj3
Explorer

Hi Experts,

Suppose my source is a view created in Datasphere by combining different CDC-enabled CDS views from an S/4 on-premise system, and now I want to push the data from my modeled view to Azure Gen2. How can I achieve delta loads in this scenario? Going for an initial load every time would cost more, as it is a Premium Outbound scenario.

 

Regards,

Ravi.

AnuragPakhidde
Explorer

Hi @RAviraj3,
If your CDS view on the S/4 system has the following annotations, the system will use automatic delta capture via CDC once you create a replication flow in SAP Datasphere with the "Initial and Delta" setting for that CDS view.

@Analytics.dataExtraction.enabled: true

@Analytics.dataExtraction.delta.changeDataCapture.automatic: true

Please note that you need all SAP DI-related TCI Notes implemented on your S/4 system to support this. Also, if your CDS views contain very complex logic, the automatic CDC delta will not work. Hope this helps!

AnuragPakhidde
Explorer

Hi @Lokesh_Kumar_Pothapola,

Is automatic CDC delta also supported for SAP BW systems if we want to extract certain HCPRs using SAP Datasphere's Replication Flow?
The idea is to create a CDS view on the external HANA view of the HCPR and then extract it using replication flows in SAP Datasphere.


@AnuragPakhidde Apologies for the delay in responding; I had issues with my account login.

The CDC mechanism works with SAP BW ADSOs; the prerequisites are that the DMIS add-on is installed in the BW system and the ODP configuration is enabled.

ADSOs can be consumed in Datasphere through a Replication flow with the 'Initial' and 'Initial and Delta' options; SAP recently launched a 'Delta' option, which works almost like 'Initial and Delta'.

HCPRs can be used in a Replication flow, but the Delta feature will not be available, because delta is based on the request ID and the change log table, neither of which is available in an HCPR.

Note: An ADSO without a change log table also cannot perform a delta transfer to Datasphere.
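Why a change log and a request ID are needed can be illustrated with a simplified model: the consumer remembers the last request it transferred and, on the next delta, takes only change log entries from newer requests. In the Python sketch below, integer request IDs and the `ChangeLogEntry`/`next_delta` names are simplifications and assumptions, not the actual BW request format or ODP internals. Without a change log, as with an HCPR, there is nothing for such a pointer to compare against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeLogEntry:
    request_id: int   # SIMPLIFIED: increasing integer instead of a real BW request ID
    record_key: str
    change: str       # hypothetical change indicator, e.g. "new" or "after-image"


def next_delta(change_log: list, last_transferred: int) -> tuple:
    """Return entries newer than the stored pointer and the advanced pointer."""
    pending = [e for e in change_log if e.request_id > last_transferred]
    new_pointer = max((e.request_id for e in pending), default=last_transferred)
    return pending, new_pointer


log = [
    ChangeLogEntry(1, "1000", "new"),
    ChangeLogEntry(2, "1001", "new"),
    ChangeLogEntry(3, "1000", "after-image"),
]
pending, pointer = next_delta(log, last_transferred=1)
print([e.request_id for e in pending], pointer)  # [2, 3] 3
```

Calling `next_delta` again with the returned pointer yields no entries until a newer request arrives, which is what keeps each delta from re-sending data already transferred.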

Creating a CDS view on the HCPR with delta annotations would be an option that I still need to explore; it would be great if it works. Please let me know the outcome if you implement this solution.

Thanks

Lokesh

 

 

albertosimeoni
Participant

Pay attention: the HANA connection creates triggers directly on the source table.

If you add a column, the trigger goes into error and will lock the table for DML statements!!!

A better solution is to create a CDS view and expose it via ODP.

This way, the delta is managed by the BW application server rather than the database, and modifying the source does not invalidate CDC triggers.

So a CDS view that reads directly from the base tables of the InfoObject or ADSO, read through ODP via RFC instead of via ODBC directly on the tables, is more robust.