This blog is part of a blog series from SAP Datasphere product management with the focus on the Replication Flow capabilities in SAP Datasphere:
Data Integration is an essential topic in a Business Data Fabric like SAP Datasphere. Replication Flow is the cornerstone to fuel SAP Datasphere with data, especially from SAP ABAP sources. There is also a big need to move enriched data from SAP Datasphere into external environments to enable certain use cases. Moving data out into external targets is called premium outbound integration and will be highlighted in the next blog.
The intention of Replication Flow is to simplify the realization of data replication use cases in SAP Datasphere. Replication Flow is the name of the artefact that a user creates & maintains inside the SAP Datasphere Data Builder application.
The main functionalities of Replication Flows cover:
See more details in the graphic below.
Overview
Looking at the supported source & target connectivity, different connection types can currently be used when creating a Replication Flow; the full list can also be checked in our product documentation under the following link.
Replication Flow source and target connectivity
The supported source connectivity includes:
The supported target connectivity includes:
Special configurations are available for specific target connections, such as different file formats for target object stores (e.g. CSV, Parquet etc.). More information about these configuration settings can be found in our product documentation; the sketch below illustrates how such file-based output could be consumed.
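To make this more tangible, here is a minimal sketch, assuming the Replication Flow has written Parquet part files into one folder per replication object in the target object store. The bucket and folder names are hypothetical placeholders, not values from this blog:

```python
# A minimal sketch, assuming one folder of Parquet part files per replication
# object in the target object store. Bucket and folder names are hypothetical.
import pandas as pd  # needs pyarrow installed (and s3fs for s3:// paths)

# hypothetical output folder of one replication object with file type "Parquet"
path = "s3://my-bucket/my-target-container/MY_CDS_VIEW/"

df = pd.read_parquet(path)   # reads all part files in the folder
print(df.dtypes)             # columns follow the source-to-target mapping
print(len(df), "replicated rows")
```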
Note: Before starting to create your Replication Flow, you may want to take a look at the following SAP Note: https://me.sap.com - Important considerations for SAP Datasphere Replication Flows.
It contains a list of major & important considerations as well as limitations of Replication Flows in SAP Datasphere. Please check whether your scenario is affected before you start building Replication Flows.
In case you have any feedback for future enhancements of Replication Flows, please use the SAP Influence portal.
This chapter provides an overview of how to create a Replication Flow, including an explanation of all relevant settings a user needs to define in the different steps.
In this example we will show you how to connect an SAP S/4HANA system and replicate an ABAP CDS view into SAP Datasphere.
First of all, we open the Data Builder application in SAP Datasphere.
Homescreen
In the Data Builder, open the Flows tab, where you can create a Replication Flow.
Select S/4HANA as a source
Select New Replication Flow
First, select a source connection by clicking the Select Source Connection button.
Select Source Connection
Then, in the pop-up dialog, select the connection to SAP S/4HANA by choosing the connection S4_HANA.
In the second step, you need to select a source container by clicking on Select Source Container:
Select Source Container
The definition of a container depends on the individual source system you have selected. The following examples show what a container can be for the common source systems a Replication Flow supports:
In the pop-up dialog, select the folder CDS to leverage the replication of CDS Views from the selected SAP S/4HANA system.
Select CDS Views
In the next step, we will add the source data sets (= CDS Views) that will be replicated as part of this example. To do so, click the Add Source Objects button.
Add Source Objects
Browsing through the navigation bar, we select the following four custom ABAP CDS views from the folder TMP.
Select custom CDS views
Afterwards, click Next and then Add Selection to add the four CDS Views to your Replication Flow.
Add selection
Once the selection is complete, you will see that the CDS Views are now available in your Replication Flow.
Overview screen
In case you want to remove replication objects from your Replication Flow, select the desired object and click the remove button next to the source object name.
Remove object
Define your target connection as part of the data replication scenario. In this case, we replicate the data from SAP S/4HANA into SAP Datasphere local tables as the target.
To select the target connection, please click on the following button in your Replication Flow.
Select Target System
In this example, we will replicate CDS Views from SAP S/4HANA to SAP Datasphere as target system. Therefore, please choose SAP Datasphere in the dialog.
Select SAP Datasphere
Note: The displayed dialog shows only the connections that are supported as target systems in Replication Flows. The SAP Datasphere connection is created automatically in your SAP Datasphere system; you do not need to create it in the Connections application in SAP Datasphere, where you create connections to remote systems such as the SAP S/4HANA source connection in this example.
You will notice that the Target Container is automatically filled with the name of the space you are currently logged in to. This is the expected behaviour: when you select SAP Datasphere as the target system, the Replication Flow always loads the data into the local space where the Replication Flow is created. Writing into another space in SAP Datasphere is not yet supported.
Target Container
After selecting the target connection and target container, the target data set name for each replication object is automatically filled with the same name as the source data set. The Replication Flow can either use an already existing data set in the target (e.g. a pre-created target table) or create the target data set in case it does not yet exist.
When selecting a replication object, you can click on the Additional Options button next to the target data set name. Here you have the following options:
Additional Options
Different configurations are possible for your Replication Flow in the modelling user interface; they are described in more detail in the following paragraphs.
For certain target systems (e.g., target object stores and Google BigQuery), you can define different configurations by clicking the settings icon next to the selected target connection.
Target specific settings
Example: Target Settings HDL_FILES
The following section explains which configuration options are available for Replication Flows in SAP Datasphere, including general settings valid for an entire Replication Flow as well as specific configurations on replication object level.
For each selected source data set (replication object) in your Replication Flow, there are two ways to configure the replication object.
Settings
Load Type: Select the load type for each task, choosing between Initial Only and Initial and Delta. Initial Only loads the data via a full load without any change data capture (CDC) or delta capabilities. Initial and Delta performs the initial load of a data set and then replicates all changes (inserts, updates, deletes) for this data set; the required technical artefacts to initiate the delta process are created automatically in the source. A sketch of both load types follows after the screenshot below.
Truncate: A check box that allows users to clean up the target data set, e.g., in case a user wants to re-initialize the data replication with a new initial load.
Load Type
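To illustrate the two load types, here is a minimal sketch using a plain Python dict as a stand-in for the target table. The change-record format (operation codes "I"/"U"/"D") is purely illustrative and not the actual Replication Flow format:

```python
# Illustrative only: a dict stands in for the target table, keyed by "id".

def initial_load(source_rows, target, truncate=False):
    """'Initial Only': one full copy of the source data set."""
    if truncate:                      # 'Truncate' re-initializes the target
        target.clear()
    for row in source_rows:
        target[row["id"]] = row

def apply_delta(changes, target):
    """'Initial and Delta': keep replaying captured changes after the load."""
    for op, row in changes:
        if op in ("I", "U"):          # inserts and updates upsert by key
            target[row["id"]] = row
        elif op == "D":               # deletes remove the row by key
            target.pop(row["id"], None)

target = {}
initial_load([{"id": 1, "qty": 5}], target)
apply_delta([("U", {"id": 1, "qty": 7}), ("I", {"id": 2, "qty": 3})], target)
print(target)  # row 1 updated to qty 7, row 2 newly inserted
```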
Projections
If no projections have been defined, the display will be empty. To add a projection, please follow the steps in the paragraph where we explain the configuration options in the side panel.
Projections
By default, all supported columns of the source data set are replicated to the target data set using an auto mapping with the exact same column names in the source & target data set. You can use the mapping dialog to customize the standard mapping, e.g., if the column names differ from each other. Additionally, you can remove columns that are not needed, and you can create additional columns and either map them to existing columns or fill them with constant values or pre-defined functions (e.g., CURRENT_TIME, CURRENT_DATE); see the sketch below. More information about mapping capabilities can be found here: Replication Flow Mapping.
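Conceptually, such a mapping combines renames, removed columns and newly filled columns. Here is a minimal sketch of the idea; the column names and the mapping structure are hypothetical and do not reflect an SAP API:

```python
# Illustrative only: hypothetical column names, not an SAP API.
from datetime import date, datetime, timezone

rename = {
    "MATNR": "MATERIAL_ID",    # source column -> target column
    "WERKS": "PLANT",
    # "ERNAM" is omitted on purpose -> the column is not replicated
}
new_columns = {
    "LOAD_DATE": lambda: date.today(),               # akin to CURRENT_DATE
    "LOAD_TS": lambda: datetime.now(timezone.utc),   # akin to CURRENT_TIME
    "SOURCE_SYSTEM": lambda: "S4_HANA",              # constant value
}

def project(source_row):
    """Apply the mapping to one source row."""
    target_row = {tgt: source_row[src] for src, tgt in rename.items()}
    target_row.update({name: fn() for name, fn in new_columns.items()})
    return target_row

print(project({"MATNR": "100-100", "WERKS": "1010", "ERNAM": "SMITH"}))
```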
When browsing and selecting a pre-defined target data set, e.g., a table in SAP Datasphere, you cannot create additional columns, as the target structure is defined by the existing table. In such a case you can either let the Replication Flow create a new target table or adjust the pre-created table with the new structure.
Note: At the moment, a user can only provide one projection per replication object, not multiple ones. There might be cases where columns from the source data set are not visible in the dialog and are automatically removed, for example because a column uses a data type that is not yet supported by Replication Flows. You can check the following SAP Note for details: Important Considerations for Replication Flows in SAP Datasphere.
When selecting a replication object, the following configuration panel appears on the right in which you can perform various configurations for each individual replication object in your Replication Flow.
Settings
The available settings include:
Note: Here the user can provide granular configurations for each individual replication object in case the settings on Replication Flow level are not sufficient.
After you have made all required configurations, save the Replication Flow using the Save button in the top menu bar.
Save
The following pop-up will appear where you can specify the name of your Replication Flow.
Name of Replication Flow
Note: Replication Flows use the same business and technical name. This cannot be changed.
As a next step, deploy the Replication Flow using the Deploy button in the top menu bar:
During the deployment, several checks are performed in the background to verify that the Replication Flow fulfils all prerequisites and is ready to be executed.
The deployment process also ensures that the necessary run-time artefacts are generated before you can finally start a Replication Flow.
Once the deployment has finished successfully, click the Run button to start your Replication Flow.
Run Replication Flow
Monitoring Replication Flows is embedded inside the SAP Datasphere Data Integration Monitor application.
You can either open the Data Integration Monitor application from the left-hand menu panel, or open it from within the Data Builder after you have run the Replication Flow, using the Open in Data Integration Monitor icon in the top menu bar of your Replication Flow:
Directly open Data Integration Monitor application.
Monitoring Application
The monitoring of Replication Flows consists of two parts: the general Flow Monitor, which provides a high-level overview of the monitoring status of all Replication Flows (and also Transformation Flows), and a detailed monitoring screen for each Replication Flow with detailed information about each replication object.
Once the Data Integration Monitor is opened via the left-hand menu panel, a user is redirected to the main page of the Data Integration Monitor. From there, a user can navigate to the Flow Monitor and use the filter option to view all Replication Flows in the local space.
Flow Monitoring
In case you want to switch to the detailed monitoring view of a specific Replication Flow, click the details button in the row of the selected Replication Flow.
Detailed Monitoring
Inside the detailed monitoring screen, you can access various information such as the source and target connection, load statistics, and the status of each replication object inside the selected Replication Flow.
Monitoring replication objects
You can also check the Metrics tab for additional statistics, such as the initial load duration and the number of transferred records during the initial and delta load phases; a small sketch of turning these figures into a throughput number follows below.
Monitoring Metrics
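As a quick worked example, the metrics from the tab can be combined into a throughput figure. The numbers below are made up; read the real values from the monitoring UI:

```python
# Made-up example values; read the real figures from the Metrics tab.
initial_records = 1_250_000    # records transferred during the initial load
initial_duration_s = 540       # initial load duration in seconds

throughput = initial_records / initial_duration_s
print(f"initial load throughput: {throughput:,.0f} records/s")
```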
Additional information can also be found here: Flow Monitoring in SAP Datasphere
This was the first detailed hands-on blog for Replication Flows, intended to give a general understanding of the functionality and to share as much knowledge as possible so that your Replication Flow project is a success. Over time, we will add more of these blogs to share the latest knowledge and hot topics.
Thanks to daniel.ingenhaag, hannes.keil, martin.boeckling and the rest of the SAP Datasphere product & development team who helped in the creation of this blog series.
The next blog explains the premium outbound integration, a new enhancement for Replication Flows delivered with the latest release in mid-November. Stay tuned.