DanielIngenhaag
Product and Topic Expert

This blog is part of a blog series from SAP Datasphere product management with the focus on the Replication Flow capabilities in SAP Datasphere: 

Replication Flow Blog Series Part 1 – Overview | SAP Blogs 

Replication Flow Blog Series Part 2 – Premium Outbound Integration | SAP Blogs 

Replication Flows Blog Series Part 3 – Integration with Kafka 

Replication Flows Blog Series Part 4 – Sizing   

Replication Flows Blog Series Part 5 – Integration between SAP Datasphere and Databricks   

Replication Flow Blog Series Part 6 – Confluent as Replication Target

Data integration is an essential topic for an enabler of a Business Data Fabric like SAP Datasphere. Replication Flow is the cornerstone for fueling SAP Datasphere with data, especially from SAP ABAP-based sources. There is also a strong need to move enriched data from SAP Datasphere into external environments to enable certain use cases.

In this part of our Replication Flow blog series, we dive deeper into the performance of Replication Flows, including the factors that influence it. Furthermore, I will explain the configurations that are available to a user to influence the performance throughput of a Replication Flow, including some specific aspects when using SAP ABAP-based source systems.

Important note: Before reading this blog, you need to understand the different terminologies that we described in part 4 of the blog series. If you have not yet read it, please do so before continuing with this blog!
Additionally, please also take a look at the known limitations & important considerations for your scenario, e.g. here for Replication Flows in general and here for SAP ABAP-based source systems.

Topics that will be covered in this blog: 

  1. Overview of Replication Flow performance
  2. Influencing factors & dependencies of Replication Flow performance
  3. Hints & Tips
  4. Conclusion 

Overview of Replication Flow performance

In the table below, we provide a high-level overview via “quick sizer” based on the performance assumptions we provided in Part 4 of our blog series to illustrate the max. achievable throughput for three different configurations using the number of max. parallel jobs (= Replication Flow Jobs):

DanielIngenhaag_1-1730890338973.png

* Based on an average throughput of 520 million cells / hour per initial load replication thread. In the first example, using 2 RF jobs (= 10 replication threads): 520 mio cells / h * 10 replication threads = 5,200 mio cells / h

** Based on an average throughput of 260 mio cells / hour per delta replication thread. In the first example, using 2 RF jobs (= 10 replication threads): 260 mio cells / h * 10 replication threads = 2,600 mio cells / h

All numbers above are based on the scenario of replicating data via CDS Views from an SAP S/4HANA based source system into SAP Datasphere as the target system and only illustrate sample values. Depending on your individual system landscape and system-specific parameters, real numbers might deviate when you replicate the data in your own environment!
For the initial load phase, we assume an average throughput of 520 million cells / hour per initial load replication thread. In the first example above, this sums up to 520 million cells / hour per replication thread * 10 replication threads = 5,200 mio cells / hour during the initial load phase when using 2 parallel jobs for a Replication Flow (= default configuration).

During the delta load phase, we assume an average throughput of 260 mio cells / hour per delta replication thread. In the first example above, this sums up to 260 mio cells / h * 10 replication threads = 2,600 mio cells / h during the delta load phase when using 2 parallel jobs for a Replication Flow (= default configuration).
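To make the arithmetic above explicit, here is a minimal Python sketch of the quick-sizer calculation. The per-thread figures (520 / 260 million cells per hour) and the assumption of 5 data transfer threads per Replication Flow Job are the sample values used in this blog, so treat the output as an illustration rather than a guaranteed throughput.

```python
# Quick-sizer sketch based on the sample assumptions from this blog:
#   - 520 million cells/hour per replication thread during the initial load
#   - 260 million cells/hour per replication thread during the delta load
#   - 5 data transfer threads per Replication Flow Job
# Real throughput depends on your landscape and will deviate from these figures.
INITIAL_LOAD_CELLS_PER_THREAD_H = 520_000_000
DELTA_LOAD_CELLS_PER_THREAD_H = 260_000_000
THREADS_PER_JOB = 5

def max_throughput(parallel_jobs):
    """Return (initial_load, delta_load) maximum cells/hour for a number of RF jobs."""
    threads = parallel_jobs * THREADS_PER_JOB
    return (threads * INITIAL_LOAD_CELLS_PER_THREAD_H,
            threads * DELTA_LOAD_CELLS_PER_THREAD_H)

for jobs in (2, 4, 8):  # 2 parallel jobs is the default tenant configuration
    init, delta = max_throughput(jobs)
    print(f"{jobs} jobs -> {jobs * THREADS_PER_JOB} threads: "
          f"initial {init / 1e6:,.0f} mio cells/h, delta {delta / 1e6:,.0f} mio cells/h")
```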

The following table illustrates some additional average performance throughput figures depending on the source and target system combination; the numbers shown are the throughput in million cells per hour per replication thread. This benchmark per replication thread was measured using the default assignment of 10 threads in total for the Replication Flow:

 

Source System \ Target System | SAP HANA Cloud & SAP Datasphere | Google BigQuery | Object Store (e.g. here GCS)
SAP HANA Cloud (tables) | 840 million cells /h | 680 million cells /h | 720 million cells /h
SAP S/4HANA (CDS Views) | 670 million cells /h | 670 million cells /h | 680 million cells /h
SAP S/4HANA (tables via SLT) | 730 million cells /h | No figures available | No figures available

Important Note: Especially when we talk about integration from SAP ABAP-based source systems, please make sure that your system is up to date by implementing all known SAP Notes (incl. TCI notes) for SAP S/4HANA systems (on-premise and private cloud edition), so that you can make use of the latest features including performance improvements. Please check our central SAP Note as well as the Note Analyzer to verify whether your system is up to date. We recommend checking for new SAP Notes regularly, e.g. every 2-3 weeks.

It is important to know that the performance of a Replication Flow defined by a user in SAP Datasphere heavily depends on the number of assigned Replication Threads and the overall availability of Replication Flow Jobs in your SAP Datasphere tenant, but also on other factors that we will describe in more detail in the next section.

Influencing factors & dependencies of Replication Flow performance

As already briefly mentioned in the paragraph above, there are several aspects that can influence the performance throughput of data replication in general. We tried to group the different influencing factors into different categories. Please note that the list is not complete, but it highlights some of the important factors that are typically influencing the performance throughput of a data replication scenario.

Category 1:  Infrastructure specific factors such as:

  • Deployment (e.g. public cloud, private cloud as well as on-premise deployment of systems etc.)

  • Location of all involved systems (source system, target system including components such as SAP Cloud Connector)
    For example: Are they all hosted in the same data center or hosted in different regions across the globe?

  • Network bandwidth etc.

Category 2: System specific factors for source and target system(s) such as:

  • Available hardware resources, overall system utilization & configurations of the connected source system incl. source system specific considerations. For example: the available number of dialog processes for SAP ABAP-based source systems, concurrent processes running in the source system for daily business operations, additional workload from other data extractions running against the source system in parallel etc.

  • Available hardware resources, system utilization & configurations of the connected target system incl. target system specific considerations, e.g. available memory & CPU resources on target HANA database to insert data and concurrent processes running in the target system for daily business operations etc.

Category 3: User specific configurations for influencing the data replication such as:

  • Number of assigned data integration hours and max. parallel jobs in your SAP Datasphere tenant configuration

    Note: The number of assigned max. parallel jobs are shared between all replication flows in the SAP Datasphere tenant.
  • Number of Replication Threads assigned to a Replication Flow

  • Total number of Replication Flows incl. number of assigned Replication Objects

  • Number of parallel running Replication Flows in your SAP Datasphere tenant, especially during the initial load phase

  • Configured delta load interval for Replication Flows
  • User configurations in the connected SAP ABAP source system, e.g. the number of jobs in the SLT configuration, number of data transfer jobs in SAP S/4HANA, defined ODP package size & parallelization for ODP etc.

In the following sections we will focus on category 3, i.e. the configurations that are available to a user to influence the performance. Furthermore, we provide some detailed information for SAP ABAP-based source systems.

Before we now dive deeper into the user-specific configurations for influencing the performance of a Replication Flow, we will quickly describe important parameters and terminologies including the default settings for all involved components.

Important Note: Please note that the table below includes the latest default values, which is especially important for SAP ABAP-based source systems (incl. SAP S/4HANA, SAP Business Suite / SAP ECC or SAP Business Warehouse systems). There might be different default values depending on the individual release state of your SAP ABAP-based system. Thus, please make sure your SAP ABAP-based system is up-to-date and contains all latest SAP Notes (link)!

 

 

  • Replication Flows & Replication Objects (setting located in SAP Datasphere)
    There is no default value for how many Replication Objects are created inside a Replication Flow. A user needs to add a minimum of 1 Replication Object to a Replication Flow, with an overall maximum of 500 Replication Objects per single Replication Flow.

  • Replication Flow Jobs (setting located in SAP Datasphere)
    Per default, each SAP Datasphere tenant has a maximum of 2 Replication Flow Jobs assigned, which can be increased via the tenant configuration by assigning more data integration blocks (link).

  • Replication Threads (Initial Load), also known as “Source Thread Limit for Initial Load” (setting located in SAP Datasphere)
    Per default, 10 Replication Threads are assigned per Replication Flow for the initial load phase, which can be changed by a user. Please note that the replication threads need to be configured in the source as well as the target connection settings and that they are technically limited by the number of max. Replication Flow Jobs in the tenant.

  • Data Transfer Threads (Delta Load), also known as “Object Thread Count for Delta Loads” (setting located in SAP Datasphere)
    Per default, 1 Replication Thread is assigned per Replication Object for the delta load phase. For SAP S/4HANA on-premise using CDS View-based replication and for table-based replication from SAP source systems leveraging SLT, users can increase the number of partitions used during the delta load phase via the parameter “Object Thread Count for Delta Loads” in the Replication Flow modelling UI (link). The overall maximum value a user can assign is 10, and it cannot be higher than the assigned number of threads for the initial load.

  • Partitions for SAP ABAP source systems (setting located in SAP ABAP)
    For SLT table-based & CDS View-based data replication, the number of partitions is calculated automatically for you depending on the size of the data set you want to replicate. If necessary, users can manually change the number of partitions via a parameter in the SAP ABAP system (link). However, the system still calculates the partitions automatically, but then limits the result to the value defined by the user with the parameter mentioned in the SAP Note.
    For ODP-based data replication, a default value of 3 partitions is used, which a user can overwrite in the SAP ABAP source system to achieve a higher parallelization by configuring a parameter in the SAP ABAP backend (link). The partitioning for ODP was introduced via the following Note (link).

  • Partitions for database source systems like HANA or Azure SQL (setting located in the Azure SQL / HANA source)
    For table-based replication from supported database source systems (like SAP HANA database or Azure SQL source systems), automated partitioning is performed depending on the size of the data set you are replicating. There is currently no way for a user to change this handling in these source systems.

  • Package Size for SAP ABAP source systems using ODP (setting located in SAP ABAP)
    Per default, a package size of 100 MB is used when transferring data from ODP source data sets, which can be changed by the user via a parameter (link), assuming you have implemented the following SAP Note (link).
    Note: Without this SAP Note being implemented, do not increase the value manually, because it can lead to out-of-memory errors for your Replication Flow Jobs.

  • Maximum sessions allowed for SAP ABAP-based source systems (setting located in SAP ABAP)
    If your system is up to date, the session check is disabled (recommended approach) for Replication Flows (link). In the past, when the session limit had not been disabled, the default value was a maximum of 10 sessions.
    However, the parameter can be used if you want an additional “safety belt” to restrict the number of sessions consumed by Replication Flows executed against your connected SAP ABAP system.
    Note: This setting is also relevant when using Data Flows in SAP Datasphere to integrate data from SAP ABAP-based systems, and sessions are also used whenever you browse SAP ABAP connections in SAP Datasphere, e.g. when you browse to select your data sets for a Replication Flow.

 

Now it’s time to tackle the question: “How can a user influence and optimize the throughput of a Replication Flow within SAP Datasphere?”

This question can be answered by looking at five different aspects that we will further outline below:

  • Configuration of Partitions
  • Assignment of Replication Threads for Initial Load
  • Assignment of Object Thread Count for Delta Load
  • Number of Replication Flows incl. number of assigned Replication objects
  • Considerations when tuning performance of SAP ABAP-based sources

Configuration of Partitions

You may wonder how the partitions that you see in the Data Builder are determined for your Replication Flow.

Replication Flows show the number of partitions in the Data Integration Monitor in SAP Datasphere. Partitions are being used by Replication Flows to divide the amount of data to be replicated in smaller chunks, which can then be replicated in parallel depending on the number of assigned replication threads for initial load. The number of partitions for the initial load is automatically determined by the respective Replication Flow source connection. During the delta load, the number of (delta) partitions is per default set to 1 for all Replication Flow source connections and can be changed for specific sources (will be explained later).

As mentioned earlier, in case you are replicating data from database-based source systems, the partition handling is automated based on the size of the data set and can currently not be influenced by a user.

For SAP ABAP-based sources, the partitions for CDS View-based as well as table-based extraction via SLT are also calculated automatically, based on the number of access plans that are available in the SAP ABAP-based source system. The number of partitions can be changed & limited by a user to apply a custom partitioning via the RMS_MAX_PARTITIONS parameter. In case of using ODP data sources from SAP ABAP-based systems, you currently have a default partitioning of 3 partitions per data set, which can be changed by a user via a parameter in the connected SAP ABAP-based source system (see further details in the following paragraphs).
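As a rough mental model of the partitioning behaviour described above, the following Python sketch approximates how many partitions a Replication Object ends up with. The real logic lives in the SAP ABAP backend and is more involved; only the parameter names (RMS_MAX_PARTITIONS, ODP_RMS_PARTITIONS_LOAD) are taken from this blog.

```python
def effective_partitions(source_type, auto_calculated, rms_max_partitions=None,
                         odp_rms_partitions_load=3):
    """Simplified approximation of the partition count for one Replication Object.

    CDS/SLT: partitions are calculated automatically; a user-defined
    RMS_MAX_PARTITIONS acts as an upper limit on the automatic value.
    ODP: a default of 3 partitions, overridable via ODP_RMS_PARTITIONS_LOAD."""
    if source_type in ("CDS", "SLT"):
        if rms_max_partitions is not None:
            return min(auto_calculated, rms_max_partitions)
        return auto_calculated
    if source_type == "ODP":
        return odp_rms_partitions_load
    raise ValueError(f"unknown source type: {source_type}")

print(effective_partitions("CDS", auto_calculated=12))                        # 12
print(effective_partitions("CDS", auto_calculated=12, rms_max_partitions=4))   # 4
print(effective_partitions("ODP", auto_calculated=0))                          # 3 (default)
```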

Assignment of Replication Threads for initial load via thread limit

The topic of replication threads is also linked to the partitions that we explained in the previous chapter. Ideally, Replication Flows could transfer every partition (initial or delta) concurrently; however, that can overwhelm the connected source and target systems if there are thousands of partitions. For example, if you have a Replication Flow with 100 Replication Objects, each having 3 initial load partitions, you would have 300 partitions for this Replication Flow that can transfer data.

Therefore, the idea is to provide a configuration property for the end-user to control the concurrency, or in other words the degree of parallelization, of the data replication via the “source Replication thread limit” and “target Replication thread limit”. Hence, the underlying service responsible for replicating the data will limit the threads to the source & target systems to the value defined by the user. These source & target threads are used not only for data transfers (initial and delta), but also for housekeeping activities (such as setup, initial partitioning, delta partitioning, teardown, etc.).
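A minimal sketch of this thread-limit idea, using the illustrative 100-objects example from the paragraph above: no matter how many partitions exist in total, only as many transfers run concurrently as the configured source and target thread limits allow.

```python
# Illustrative example from the text: 100 Replication Objects, 3 initial-load partitions each.
replication_objects = 100
partitions_per_object = 3
total_partitions = replication_objects * partitions_per_object  # 300 transferable units

source_thread_limit = 10  # "Source Thread Limit for Initial Load" (default)
target_thread_limit = 10  # should normally match the source limit

# The replication service works through all partitions, but never runs more
# transfers in parallel than the configured thread limits permit.
concurrent_transfers = min(total_partitions, source_thread_limit, target_thread_limit)
print(f"{total_partitions} partitions in total, {concurrent_transfers} transferred concurrently")
```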

When creating your Replication Flow, you can set the number of replication threads (aka the thread limit for initial load) to control the degree of parallelized data replication during the initial load phase for all Replication Objects that you have added to your Replication Flow. For those of you who have already worked with Replication Flows in SAP Data Intelligence, this parameter is the same as “Source Maximum Connections” and “Target Maximum Connections”.

Note: If you want to change the number of Replication Threads for existing Replication Flows, e.g. in order to fine tune the performance of the initial load time, you can do this also for running Replication Flows without restarting the load process.

Important: The available Replication Flow jobs that are assigned via Tenant Configuration are shared by all Replication Flows running across your entire SAP Datasphere tenant!

To increase or decrease the number of assigned replication threads, you need to go into the Data Builder application in SAP Datasphere and open your desired Replication Flow. The number of Replication Threads includes threads for:

  • Transferring data (= Data Transfer Threads*), per default each job has 5 data transfer threads
  • Housekeeping (= Housekeeping Threads), per default each job has 5 housekeeping threads

Here, you need to open the settings for both the source as well as the target connection to open the configuration dialog:

DanielIngenhaag_0-1730896020034.png

 

* Note: For transferring data, each replication flow job has different types of data transfer threads depending on the type of source system. That means for example each Replication Flow Job has:

  • 5 data transfer threads for database sources (e.g. using HANA, Azure SQL)
  • 5 data transfer threads for ABAP and object stores sources (e.g. S/4HANA & AWS S3)

This becomes especially important for the performance throughput if you use two different kinds of source systems in your Replication Flows, which can then share the same Replication Flow Job; ultimately this also depends on the overall number of available Replication Flow Jobs.

For the source system configuration, the dialog should look like this where we see the default assignment of 10 Replication Threads as limit:

 

DanielIngenhaag_1-1730896208638.png

For the target system configuration, the dialog should look like this where we see the default assignment of 10 Replication Threads as limit:

DanielIngenhaag_2-1730896208645.png

Ideally, both settings should have the same value assigned to allow optimal throughput. Entering different values for the source and target Replication Threads would very likely lead to a bottleneck for the end-to-end performance of your Replication Flow. Additionally, you might consider increasing or decreasing the number of Replication Threads in steps of 5 (e.g. 10, 15, 20) so that the thread limit maps evenly to a whole number of Replication Flow Jobs. The reason is that you typically have 5 data transfer threads per Replication Flow Job, and therefore you can derive the number of Replication Flow Jobs by dividing the number of Replication Threads by 5.
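Since each Replication Flow Job contributes roughly 5 data transfer threads, the relationship between the configured thread limit and the number of jobs a flow occupies can be sketched as a simple rule of thumb (this is the heuristic from the paragraph above, not an exact model of the scheduler):

```python
import math

DATA_TRANSFER_THREADS_PER_JOB = 5  # typical value per Replication Flow Job

def jobs_for_thread_limit(thread_limit):
    """Rule of thumb: Replication Flow Jobs occupied for a given thread limit."""
    return math.ceil(thread_limit / DATA_TRANSFER_THREADS_PER_JOB)

for limit in (10, 15, 20, 23):
    print(f"thread limit {limit:>2} -> ~{jobs_for_thread_limit(limit)} Replication Flow Jobs")
# A limit of 23 still occupies 5 jobs, which is why steps of 5 (10, 15, 20, ...) are suggested.
```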

Note: As previously mentioned, the overall number of threads available in your SAP Datasphere tenant is derived from the assigned number of “maximum parallel jobs” in your tenant configuration (link). For details, please check the information in part 4 of this blog series (link).
Furthermore, you need to make sure that your connected source and target systems also provide sufficient resources to replicate the data with the configured parallelization, e.g. the number of dialog processes in your SAP ABAP source system.

Assignment of Object Thread Count for Delta Loads:

This new Replication Flow configuration property allows you to define how many replication threads can be used for parallel delta transfers for a single Replication Object in case you replicate data from CDS Views out of SAP S/4HANA systems or from tables of SAP ABAP-based systems (leveraging SLT). The minimum number is 1, the maximum number is 10. In case you have assigned fewer than 10 threads for the initial load, the maximum value of the parameter is limited to the number of threads defined for the initial load.
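The bounds described above can be expressed in a tiny validation sketch; the limits (minimum 1, maximum 10, never more than the initial load thread limit) are taken from this paragraph.

```python
def effective_delta_thread_count(requested, initial_load_threads):
    """Clamp the requested "Object Thread Count for Delta Loads" to the documented bounds:
    at least 1, at most 10, and never more than the initial load thread limit."""
    upper_bound = min(10, initial_load_threads)
    return max(1, min(requested, upper_bound))

print(effective_delta_thread_count(requested=8, initial_load_threads=10))  # 8
print(effective_delta_thread_count(requested=8, initial_load_threads=5))   # 5 (capped)
print(effective_delta_thread_count(requested=0, initial_load_threads=10))  # 1 (minimum)
```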

DanielIngenhaag_3-1730896384047.png

If you want to adjust the number of partitions for delta load, e.g. in case you are expecting a high change data volume, you can use a dedicated parameter in the source system settings of your Replication flow for the cases where you either want to replicate CDS views or tables leveraging SLT. We will dive a little deeper into this topic in the following paragraphs when we go into some CDS & SLT specific considerations.

Number of Replication Flows incl. number of assigned Replication objects

As mentioned earlier in this blog, you can add up to 500 Replication Objects to a single Replication Flow; they all share the number of assigned replication threads during the initial as well as the delta load phase. In reality, you will probably not assign such a large number to a single Replication Flow; instead, you might assign data sets to a Replication Flow based on certain conditions (e.g. the volume of each data set, the desired frequency for delta loads etc.), which we will further describe at a later point in time.

If you want to achieve a certain performance for very large objects, where you also want to ensure that a certain number of replication threads is used exclusively for these Replication Objects, you can either divide the overall number of data sets that need to be replicated among several Replication Flows (so that more Replication Threads are available to replicate the data), or assign more Replication Threads to your existing Replication Flow to achieve a higher throughput.

The overall number of threads (derived from the assigned number of Replication Flow Jobs in the tenant configuration) set on the SAP Datasphere tenant level can be distributed to different replication flows based on your needs for throughput on the individual replication objects. For example, group a few objects with a high data volume for initial load into a replication flow having a high number of threads for a better initial load performance. Once the load is done, the number of threads can be reduced. Another example is grouping all tables with only little delta and assigning only a small number of threads to save resources.
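One possible way to plan such a grouping is to split the data sets by expected initial load volume and give the high-volume group a larger share of the tenant's thread budget. The sketch below is purely illustrative: the data set names (apart from the two that appear later in this blog), volumes, threshold and thread split are hypothetical choices, not SAP recommendations.

```python
# Hypothetical planning helper: split data sets into a "large" and a "small" Replication Flow
# and give them different shares of the tenant's replication thread budget.
datasets = {  # name -> estimated initial-load volume in million cells (made-up numbers)
    "I_GLACCOUNTLINEITEMRAWDATA": 9000,
    "I_BUSINESSPARTNER": 300,
    "I_SALESDOCUMENT": 4500,   # hypothetical example entry
    "I_COSTCENTER": 20,        # hypothetical example entry
}
VOLUME_THRESHOLD = 1000    # hypothetical cut-off in million cells
TENANT_THREAD_BUDGET = 20  # e.g. 4 Replication Flow Jobs * 5 data transfer threads

large_flow = [name for name, vol in datasets.items() if vol >= VOLUME_THRESHOLD]
small_flow = [name for name, vol in datasets.items() if vol < VOLUME_THRESHOLD]

# Give the high-volume flow the bigger share during the initial load phase;
# once the initial load is done, the thread assignment can be reduced again.
large_flow_threads = 15
small_flow_threads = TENANT_THREAD_BUDGET - large_flow_threads

print("Flow A (high volume):", large_flow, "->", large_flow_threads, "threads")
print("Flow B (low volume): ", small_flow, "->", small_flow_threads, "threads")
```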

All in all, both options depend on the assigned maximum number of Replication Flow Jobs via the max. parallel jobs in your SAP Datasphere tenant configuration. The following table explains two basic approaches including their consequences. In another chapter at the end of the blog we also go into a little more detail on possible performance improvements when replicating data via CDS Views or tables leveraging SLT.

 

Approach: Increase the number of replication flows. For example, don't replicate all tables as tasks within one single replication flow, but distribute the replication objects across multiple replication flows.
Consequences: Increasing the number of replication flows will consume additional background processes in the source / target system per replication flow. It requires more Replication Flow Jobs for each replication flow, limited by the number of overall available Replication Flow Jobs in the SAP Datasphere tenant incl. their existing utilization by other replication flows. Scale them carefully to not overload the source / target system with additional processes and impact existing replication flows, especially during the initial load phase.

Approach: Increase the number of threads within a replication flow to get more threads and, equivalently, more Replication Flow Jobs.
Consequences: With the start of multiple connections to the source / target system, background processes get blocked for the handling of the data packages. Scale the number of threads carefully to not block too many resources in the system, especially during the initial load phase of your replication.


Considerations when tuning performance of SAP ABAP-based sources

When looking into the integration of different SAP systems, there are different interfaces & data sets that can be replicated depending on which system we are talking about, like an SAP S/4HANA system, an SAP Business Suite system (e.g. SAP ECC) or an SAP Business Warehouse system (e.g. SAP BW/4HANA). First, let us take a look at some general aspects that are relevant for SAP ABAP-based source systems and then go into the individual aspects of the three different SAP ABAP integration scenarios (CDS, SLT and ODP). Some additional information about performance when using Replication Flows with SAP ABAP sources can also be found here: https://me.sap.com/notes/3360905/E.

In this SAP Note, the concept of the data buffer is being explained for cases where you want to extract data via CDS Views or table-based replication leveraging SLT (for ODP sources this concept is not applicable!).

In summary, the source system pushes the data into the data buffer, from which the Replication Flow picks it up. The main goal in this context is to achieve a resilient data transfer by using a commit mechanism that indicates whether a certain data package has been fully written to the target system by a Replication Flow or, in case of error scenarios, whether a data package needs to be sent to the target system again because it has not been fully transferred. The following picture illustrates the basic concept of the data buffer:

DanielIngenhaag_0-1730964221320.png

 

As the data buffer is a key component for transferring the data from the source to the target system, it is also an important factor when looking at the end-to-end performance of a Replication Flow in case you replicate data from an SAP ABAP source system.

Sample factors that influence the performance throughput (as explained in the SAP Note):

  • Transfer jobs in SAP ABAP writing data into the data buffer
  • Replication threads retrieving data from the data buffer
  • The size of the buffer table and its size limit
    (This has recently been optimized via an SAP Note to provide better automated handling of the buffer size without manual user intervention to increase the buffer size)

Note: The data buffer has a size limit to make sure the buffer tables do not consume too much storage / space and once the limit of a buffer is reached, there will be no more data pushed into the buffer until there is new space inside the buffer available.

The parameter “APE_MAX_SESSIONS” is an important aspect that needs to be considered for all three integration scenarios (CDS, SLT and ODP). This parameter was initially introduced for Data Flows in SAP Datasphere and Pipelines in SAP Data Intelligence. It acts like a “safety belt” through which you can control how many parallel sessions can be used by an integration tool (like a Data Flow, for example) to replicate data, depending on the available number of background and dialog processes in the SAP ABAP system.

In case your system is up to date with the latest SAP Notes, this parameter is disabled per default for Replication Flows only (= recommended approach) via one of the latest SAP Notes (link). In the past, this parameter was used with a default value of 10 for Data Flows as well as Replication Flows in SAP Datasphere, which is why we mention the old behaviour here.
You can find more information in the following SAP Note. In case you still want to use this parameter for Replication Flows, you can roughly calculate the number of sessions as 2.5 * the number of partitions you are using within your Replication Flow. Nowadays we typically recommend disabling this parameter, as the calculation of the expected sessions is not that easy and an incorrectly configured value also has an impact on the performance throughput.
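If you do keep the session limit enabled, the rough estimate mentioned above (about 2.5 sessions per partition) can be sketched as follows; treat the result as a ballpark figure only.

```python
import math

def estimated_ape_max_sessions(total_partitions, factor=2.5):
    """Rough estimate from this blog: ~2.5 sessions per partition used by the Replication Flow."""
    return math.ceil(total_partitions * factor)

# Example: 3 Replication Objects with 4 initial-load partitions each -> 12 partitions.
print(estimated_ape_max_sessions(3 * 4))  # -> 30 sessions as a rough upper bound
```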


DanielIngenhaag_5-1730896775332.png

Specific considerations for CDS View-based replication from SAP S/4HANA systems

There are different considerations in the context of replicating data via CDS Views from SAP S/4HANA based systems, which partially also differ depending on the deployment type of your system (SAP S/4HANA public cloud vs. SAP S/4HANA on-premise + SAP S/4HANA private cloud).

Access Plan Calculation

This is the phase in which the replication from CDS Views is prepared. The access plan in this context is created automatically by the SAP system using SAP-owned logic. There is currently no way for a user to influence the access plan calculation, neither in SAP S/4HANA on-premise nor in SAP S/4HANA Public Cloud systems.

Data Transfer Jobs

The data transfer jobs are responsible for loading the data into the data buffer in SAP S/4HANA, from which the data packages are picked up by the Replication Flow. By default, the minimum and maximum number of data transfer jobs is set to 1, which means 1 data transfer job is running per default. If this is not sufficient, a user can change the number of data transfer jobs using transaction SM30 and maintenance view “DHCDC_JOBSTG” (General Job Settings of the CDC Engine) or directly via transaction “DHCDCSTG”, following the instructions under this link for SAP S/4HANA on-premise based systems:

DanielIngenhaag_6-1730897406629.png

Partitioning

In the latest versions of SAP S/4HANA on-premise as well as in SAP S/4HANA public cloud, there is a logic to automatically calculate the number of partitions per CDS view depending on the size of each individual CDS View. Partitioning describes the mechanism to slice one data set into different smaller pieces, which can then be replicated to the target system with a certain degree of parallelization. The degree of parallelization for the data transfer is then mainly defined by the number of assigned “Replication Threads” in your Replication Flow settings in SAP Datasphere.

In case you do not want to use the default automated partitioning for CDS Views (the automated partitioning is the recommended way) and would rather define your own partition logic, you can do so by adjusting the parameter “RMS_MAX_PARTITIONS” (link). If you enter a value for this parameter, the automated partitioning is overwritten by your defined partition number (which is then the maximum number of partitions being used). Please note that this value of “RMS_MAX_PARTITIONS” is valid for all CDS Views in your Replication Flow and is applied during the deployment process of your Replication Flow.

Note: If you want to use different partition values for different CDS Views, you need to change the value multiple times.

Parallelization during delta load for CDS Views

In case the default of 1 partition = 1 replication thread per CDS View for the delta load phase is not sufficient to handle the delta volume in your scenario, you can configure more than 1 parallel delta partition for the delta load phase.

This setting can be changed in two ways:

  • By using the parameter “Object Thread Count for Delta Loads” in the SAP Datasphere Replication Flow UI (recommended way)

  • By using parameter in the SAP S/4HANA backend system

For way 1) please open the Replication Flow modelling UI in the Data Builder application in SAP Datasphere. Here, open the source system settings once you have defined your source & target connection as well as the required data sets.

DanielIngenhaag_1-1730964449188.png

Use the parameter “Object Thread Count for Delta Loads” to increase the number of partitions that should be used during the delta load phase (default: 1 partition). If you change the number of delta partitions here, it will be applied to all replication objects in your Replication Flow that use load type “initial + delta”, but you can also set an individual value per replication object (see the second screenshot below).
With the second parameter “Overwrite Source Settings at Object Level”, you can overwrite the settings that you defined per object with the global value that you defined in the source settings configuration.

DanielIngenhaag_2-1730964470051.png

You can access the replication object level configuration by selecting a replication object and then check the property panel on the right-hand side:

DanielIngenhaag_3-1730964470061.png

As mentioned earlier, you can also follow approach 2) by adjusting the parameter in the SAP S/4HANA backend system. To do so, open the maintenance view “DHCDC_RMSOBJSTG” using transaction SM30 (please check also the following link). Per default, the value 1 is used, which means 1 delta partition per Replication Object. You can increase the parameter in case you want to use more than 1 delta partition. In the maintenance view you can maintain individually, on single CDS View level, for which CDS Views you want to use a delta partition count greater than 1. For each CDS View with no entry in the table, the default value of 1 is used automatically.

Additionally, there is always the rule that configuration in the UI is the leading value and overwrites potential configurations in the source system!

In the following example we have increased the number of delta partitions for:

  • I_BUSINESSPARTNER to 2 and
  • I_GLACCOUNTLINEITEMRAWDATA to 3

DanielIngenhaag_4-1730964504166.png

At the moment you cannot dynamically change this parameter for a running Replication Flow; this is a topic for a future enhancement of the functionality to allow dynamic and flexible changes during runtime without resetting your replication.

Performance optimizations based on optimized data conversion

Recently, with a new SAP Datasphere release plus the implementation of an SAP Note in your SAP ABAP-based system (link), an improvement of the data conversion has been delivered, which can also improve the performance of your CDS View replication (especially for larger tables). Please note that the usage of this feature depends on your system, i.e. the SAP S/4HANA version plus the SAP Note linked above as well as a prerequisite in the SAP kernel to allow fast serialization (link). If your system supports it, it is used automatically; if not, the system falls back to the old data conversion without the fast serialization that provides better performance.

Below we are trying to illustrate the most important parameters including the relationship & dependency between the parameters in SAP S/4HANA and SAP Datasphere:

DanielIngenhaag_5-1730964540406.png

 

Note: The Axino component is a technical service running in SAP Datasphere which manages all requests in case you use any SAP ABAP-based connections in your Replication Flow or Data Flow. Whereas in SAP Data Intelligence Cloud administrators could influence the sizing & resources of this service, in SAP Datasphere the scaling of Axino is handled automatically by SAP.

Specific considerations for table-based replication from SAP Business Suite systems leveraging SLT:

There are different considerations in the context of replicating data via tables using SLT from SAP Business Suite (e.g. SAP ECC) based systems.

Access Plan Calculation

This is the phase in which the replication of the tables is prepared; it happens before the actual data replication starts. Per default, there is one access plan calculation running per source table, but there are options to parallelize the access plan calculation (SAP Note, documentation). This becomes especially important for large data sets, where the “single-threaded” access plan calculation without parallelization can take quite some time, e.g. 1 day.

Number of SLT jobs

When creating your SLT configuration, you need to specify a certain number of jobs for your data replication scenario including:

  • Data Transfer Jobs:
    You can specify the number of jobs that are used for the data transfer process in the SLT system

 

  • Number of Initial Load Jobs:
    Here you are able to configure the number of jobs that are used for the initial load in the SLT system

 

  • Number of Calculation Jobs:
    These are the jobs that are used to calculate the data packages as a preparation step before the actual data replication starts; they are used during the initial load phase. Depending on the reading type used, their technical function differs a bit.

In order to set an appropriate number of jobs for your scenario, please check out the existing and detailed SLT documentation (for example, the SAP LT Replication Server Performance Tuning Guide linked at the end of this blog).

Partitioning

In the latest versions of SAP Business Suite systems (e.g. SAP ECC) there is a logic to automatically calculate the number of partitions per table depending on the size of each individual table. Partitioning describes the mechanism to slice one data set into different smaller pieces, which can then be replicated to the target system with a certain degree of parallelization. The degree of parallelization for the data transfer is then mainly defined by number of assigned “Replication Threads” in your Replication Flow settings in SAP Datasphere.

In case you do not want to use the default automated partitioning for tables (the automated partitioning is the SAP-recommended way) and would rather define your own partition logic, you can do so by adjusting the parameter “RMS_MAX_PARTITIONS” (link). If you enter a value for this parameter, the automated partitioning is overwritten by your defined partition number (which is then the maximum number of partitions being used). Please note that this value of “RMS_MAX_PARTITIONS” is valid for all tables in your Replication Flow and is applied during the deployment process of your Replication Flow.

Parallelization during delta load

In case the default of 1 partition = 1 replication thread per table for the delta load phase is not sufficient to handle the delta volume in your scenario, you can configure more than 1 parallel delta partition for the delta load phase.

This setting can be changed in two ways:

  • By using the parameter “Object Thread Count for Delta Loads” in the SAP Datasphere Replication Flow UI (recommended way)

  • By using a parameter in the SAP ABAP backend system that is connected as your SLT system to SAP Datasphere

For way 1), please open the Replication Flow modelling UI in the Data Builder application in SAP Datasphere. Here, open the source system settings once you have defined your source & target connection as well as the required data sets.

DanielIngenhaag_6-1730964697163.png

Use the parameter “Object Thread Count for Delta Loads” to increase the number of partitions that should be used during the delta load phase (default: 1 partition). If you change the number of delta partitions here, it will be applied to all replication objects in your Replication Flow that use load type “initial + delta”, but you can also set an individual value per replication object (see the second screenshot below).
With the second parameter “Overwrite Source Settings at Object Level”, you can overwrite the settings that you defined per object with the global value that you defined in the source settings configuration.

DanielIngenhaag_7-1730964697165.png

You can access the replication object level configuration by selecting a replication object and then check the property panel on the right-hand side:

 DanielIngenhaag_8-1730964697181.png

As mentioned earlier, you can also follow approach 2) by using the SLT advanced replication settings in transaction LTRS with the configuration parameter “Number of Ranges”. This also allows a flexible configuration on table level for tables where a high delta change volume is expected and where therefore more than the default of 1 delta partition should be used.

DanielIngenhaag_9-1730964697187.png

Please keep in mind that this setting needs to be adjusted before the Replication Flow is started, otherwise the change will not be applied.  

Below we are trying to illustrate the most important parameters including the relationship & dependency between the parameters in the SAP Business Suite system and SAP Datasphere:

DanielIngenhaag_10-1730964755963.png

 

Specific considerations for ODP-based replication from SAP Business Warehouse systems & other SAP systems supporting ODP

There are different considerations in the context of replicating source data sets using Replication Flows via the ODP interface (ODP_BW and ODP_SAPI context) from SAP Business Warehouse systems as well as other SAP systems supporting the ODP interface.

ODP package Size specification

Similar to other SAP replication technologies, Replication Flows also offer users the ability to configure the ODP package size, which can also influence the performance of your Replication Flow. The default value for the package size is currently 100 MB, which can be decreased or increased by the user to tune the performance. Please be aware that you need to be careful when increasing this parameter, as it could lead to memory problems for your Replication Flow Job in SAP Datasphere.
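To get a feeling for what the package size means for a given data set, here is a back-of-the-envelope sketch. The 100 MB default is from this blog; the 25 GB data set size is a made-up example.

```python
import math

def odp_package_count(dataset_size_mb, package_size_mb=100.0):
    """Rough number of ODP packages needed to transfer a data set of a given size."""
    return math.ceil(dataset_size_mb / package_size_mb)

# Hypothetical 25 GB extract: larger packages mean fewer packages (and round trips),
# but each package needs more memory in the Replication Flow Job.
for size_mb in (100, 200, 500):
    print(f"package size {size_mb} MB -> {odp_package_count(25_000, size_mb)} packages")
```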

ODP Partition configuration

The partition handling when replicating data from ODP data sources is a little different compared to SLT and CDS, where improved automated handling has been provided in the past. For ODP partition handling, there is a separate parameter called “ODP_RMS_PARTITIONS_LOAD”. Per default, 3 partitions are set through which data can be loaded in parallel in your Replication Flow; this is applied to all ODP data sets in your Replication Flow during the deployment process. A user can change the default of 3 partitions to a higher value to chunk the data into more partitions. Please be aware that the partitioning only takes care of splitting the data; the end-to-end performance then primarily also depends on assigning an appropriate number of Replication Threads in your Replication Flow.

Both parameters can be set using transaction SM30 in your SAP system and the maintenance view “LTBAS_RUNTIME” for SAP Business Warehouse & SAP Business Suite systems as well as the maintenance view “DHBAS_RUNTIME” for SAP S/4HANA on-premise based systems:

DanielIngenhaag_11-1730964778000.png


Hints & Tips

Below we provide some hints and tips around performance, such as how you can identify a potential performance bottleneck in your replication scenario, as well as a summary table of actions you can perform in order to influence the performance of your Replication Flow.

Checklist in case you observe a slow performance for your data replication scenario:

  1. Are there additional replication flows running in your Datasphere tenant that are occupying replication flow jobs, especially during initial load?

  2. Do you have a sufficient amount of data integration blocks assigned to your Datasphere tenant in the Datasphere tenant configuration?

  3. In case you load from an SAP ABAP-based source system, have you implemented all known SAP Notes via the Note analyzer?

    Note: Certain improvements are delivered via SAP notes & also existing bugs might impact the performance, which is why it is always recommended to patch your system frequently.

  4. Is the source and / or target system showing any bottleneck situation due to other processes that occupy too many resources in the source / target system?

  5. When using SLT-table or CDS View replication, how does the data buffer look like? Is the data buffer empty or almost full all the time?

  6. When using SLT-table, CDS View or ODP based data extraction, how many processes does the system have available for working on data replication tasks?

  7. When using ODP as data source, what parallelization and package size did you define for data replication during initial load?

  8. In case you observe a bottleneck during the delta load phase for SLT-Table and CDS View data sources, have you assigned multiple partitions for the delta load in case you expect a large delta volume for certain data sets?

  9. In case you replicate data into SAP Datasphere as the target, are sufficient resources available in your SAP Datasphere instance to handle the workload? Is there any memory or CPU peak in your Datasphere instance or in the space where the replication flows are created?

 

How can I analyze where a potential bottleneck is existing when replicating from CDS or SLT?

When replicating data from CDS Views or from tables by leveraging SLT, a concept called “data buffer” is used in the connected SAP ABAP source system. This data buffer is mainly used to achieve resiliency during the replication process, allowing the source system to resend data packages that were not fully processed in case of an outage. The data buffer is the place where the Replication Flow in SAP Datasphere picks up the data packages during the replication process (initial as well as delta load phase). The data buffer is also a good place to monitor the performance throughput of the data replication and to identify whether the SAP ABAP system or the SAP Datasphere Replication Flow is the bottleneck, especially during the initial load of your replication. For ODP-based replication, other ODP-internal capabilities than the data buffer are used to achieve resiliency.

DanielIngenhaag_12-1730964828552.png

The “data buffer” is responsible for receiving & organizing the prepared data packages in the SAP ABAP source system, which are then picked up by the Replication Flow. Therefore, several numbers, such as the total number of packages as well as the number of packages that are “ready for pick-up”, are displayed in the data buffer. These numbers allow a user to perform a first check of where a potential performance bottleneck in the data replication could be located. As illustrated in the picture above, the following cases can occur when checking the data buffer:

  • Case A: The data buffer is almost empty, which means the SAP ABAP source system is not fast enough to produce the data packages. Here an indication would be that the SAP ABAP system itself is potentially the bottleneck and you would need to increase the number of S/4HANA data transfer jobs or SLT data load jobs to speed up the processing of data packages into the buffer.
    This scenario can for example happen in case you plan to replicate very large data sets or many data sets in parallel, where the default job settings are not sufficient.

  • Case B: The data buffer is almost full and the status of the packages is “ready”, which means that the Replication Flow is not picking up the data fast enough from the data buffer. Therefore, the potential bottleneck is not the SAP ABAP system but rather the Replication Flow in SAP Datasphere. A user can therefore check the Replication Flow settings in SAP Datasphere to increase the performance throughput (e.g. by increasing the number of Replication Threads, reducing the number of parallel running Replication Flows in the same tenant etc.). The following screenshot illustrates an example where the buffer is almost full:

    DanielIngenhaag_13-1730964859933.png
  • Case C : The data buffer is almost full and status of the packages is “In Process”. This means that the source ABAP system is pushing the data into the buffer and Replication Flows are directly & immediately picking up the data. In such a case, you can try to achieve a higher performance throughput by increasing the buffer size for this particular data set, where Replication Flows would allow a higher degree of parallelization.


    Note: Before explaining step by step how you can adjust the buffer size, please check also the implementation of the following SAP Note, where improvements for an automated adjustment of the buffer size have been provided by SAP for SLT-based replication scenarios (link). The release of the same functionality for CDS View-based replication from SAP S/4HANA will also be made available soon; I will keep the blog updated once it is available.

To adjust the buffer size, please follow the steps below:

  • Open transaction DHRDBMON / LTRDBMON and choose the Expert function

    DanielIngenhaag_14-1730964911915.png

  • Enter the name of the buffer table using F4 help function and choose “Change Settings” button

  • Now you can change the max. amount of records to be stored in the buffer table as well as the individual package size. The changes will be applied and take effect immediately!

    DanielIngenhaag_15-1730964911922.png
    Note: It is recommended not to change the individual package size and to choose a multiple of the package size for the buffer table size (link).

    Note: The default buffer size is three packages per partition for each data set.
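Putting the two notes above together, a simple sizing sketch for the buffer table could look as follows; the record numbers are illustrative, only the "three packages per partition" default is taken from the note above.

```python
def default_max_buffer_records(package_size_records, partitions, packages_per_partition=3):
    """Rule of thumb from the notes above: the buffer holds a multiple of the package size,
    by default three packages per partition of the data set."""
    return package_size_records * packages_per_partition * partitions

# Hypothetical example: packages of 50,000 records and 4 initial-load partitions.
print(default_max_buffer_records(package_size_records=50_000, partitions=4))  # 600,000 records
```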

The data buffer can be accessed via transaction “DHRDBMON” for SAP S/4HANA on-premise based source systems as well as transaction “LTRDBMON” for SLT based source systems where each source data set (e.g. a CDS view or a table) has its own entry in the data buffer:

DanielIngenhaag_16-1730964911944.png

To get a better understanding we have included a sample screenshot of transaction DHRDBMON:

DanielIngenhaag_17-1730964911984.png

Important Parameters in transaction DHRDBMON & LTRDBMON for analyzing the performance are:

  • Maximum Buffer Records
    Describes the maximum limit of records that can be stored in the buffer and is calculated automatically based on the size of the data set.

  • Package Size
    Describes how many records are stored in one package that will be replicated from source to target.

  • Current Number of Records
    Shows the actual number of records that are currently stored in the data buffer, which is an important parameter to check for a potential bottleneck, as described for Case A & Case B, by comparing it with the “Maximum Buffer Records” value.

  • Number of Packages Ready
    Displays the number of data packages that are ready to be picked up by the Replication Flow from the data buffer of this data set (here a CDS View).


Note: More information can be found using the in-product assistant within your SAP S/4HANA or SLT system to understand what each element in the transaction is displaying.
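The Case A / B / C reasoning above can be summarized as a simple heuristic on the monitor values (Maximum Buffer Records, Current Number of Records, package status). The fill-ratio thresholds in the sketch below are arbitrary illustrations, not SAP-defined values.

```python
def classify_buffer_bottleneck(current_records, max_buffer_records,
                               packages_ready, packages_in_process,
                               full_threshold=0.8, empty_threshold=0.2):
    """Illustrative heuristic for the Case A/B/C discussion above (thresholds are arbitrary)."""
    fill_ratio = current_records / max_buffer_records if max_buffer_records else 0.0
    if fill_ratio <= empty_threshold:
        return ("Case A: buffer nearly empty -> the ABAP source is likely the bottleneck "
                "(consider more data transfer / SLT jobs)")
    if fill_ratio >= full_threshold and packages_ready >= packages_in_process:
        return ("Case B: buffer full of ready packages -> the Replication Flow is likely the "
                "bottleneck (consider more replication threads)")
    if fill_ratio >= full_threshold:
        return "Case C: buffer full but packages in process -> consider increasing the buffer size"
    return "No obvious bottleneck visible from the buffer figures alone"

print(classify_buffer_bottleneck(current_records=50_000, max_buffer_records=600_000,
                                 packages_ready=0, packages_in_process=2))
```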

Approach & configurations: Increase the number of Replication Flows. For example, don't replicate all tables as tasks within one single Replication Flow, but distribute the tables (tasks) across multiple Replication Flows.
Information & links: Increasing the number of Replication Flows will consume additional background processes in the source / target system per Replication Flow. It requires more Datasphere node hours (Data Integration Blocks), as more Replication Flow Jobs are started for each replication. Scale them carefully to not overload the source / target system with background processes. (link)

Approach & configurations: Increase the number of replication threads within a Replication Flow to get more threads and, equivalently, more Replication Flow Jobs.
Information & links: With the start of multiple threads to the source / target system, background processes get blocked for the handling of the data packages. Scale the number of replication threads carefully to not block too many resources in the system. This change may affect the consumption of Datasphere node hours (Data Integration Blocks) as more Replication Flow Jobs are started. Please note that you need to set the threads in both the source and the target system configuration, ideally using the same number for both (link).

Approach & configurations: SAP ABAP-based configurations, SAP S/4HANA specific configurations for CDS View-based extraction.
Information & links: Make sure you are not hitting the session limit or disable the session limit (link). Check the configuration of available Data Transfer Jobs in your SAP S/4HANA system (link).

Approach & configurations: SAP ABAP-based configurations, SLT specific configurations for table-based extraction from SAP Business Suite systems.
Information & links: Make sure you are not hitting the session limit or disable the session limit (link). Check the configuration of your SLT mass transfer ID for the number of jobs as well as the configuration of a parallel access plan calculation (link + SLT Sizing Guide).

Approach & configurations: SAP ABAP-based configurations, specific configurations for ODP-based extraction.
Information & links: Make sure you are not hitting the session limit or disable the session limit (link). Check the configuration of the package size as well as the partitions being used for replicating data via ODP (link).

 

Conclusion  

That’s it! We hope that this blog helps you understand the topic of Replication Flow performance and provides you with some hints & tips on how you can influence the performance of your data replication using Replication Flows in SAP Datasphere. Please keep in mind that performance is usually a very diverse topic that varies depending on your use case as well as your infrastructure.

A big thank you to the colleagues from SAP Datasphere product & development team who helped in the creation of this blog.

Thank you very much for reading this blog and please feel free to provide feedback or questions into the comment section! 

 

Important Links

Central SAP Note for overview & pre-requisites of integrating SAP ABAP Systems in Replication Flows (Link)

SAP LT Replication Server Performance Tuning Guide (Link)

Important considerations and limitations using Replication Flows in SAP Datasphere (Link)

Replication Flows step by step in SAP Datasphere product documentation (Link)

SAP Datasphere Roadmap Explorer for upcoming innovations

SAP Datasphere Influence portal for submitting new improvement ideas

 
