This blog is part of a blog series from SAP Datasphere product management with the focus on the Replication Flow capabilities in SAP Datasphere:
Replication Flow Blog Series Part 1 – Overview | SAP Blogs
Replication Flow Blog Series Part 2 – Premium Outbound Integration | SAP Blogs
Replication Flows Blog Series Part 3 – Integration with Kafka
Replication Flows Blog Series Part 4 – Sizing
Replication Flows Blog Series Part 5 – Integration between SAP Datasphere and Databricks
Replication Flow Blog Series Part 6 – Confluent as Replication Target
Data Integration is an essential topic for an enabler of a Business Data Fabric like SAP Datasphere. Replication Flow is the cornerstone to fuel SAP Datasphere with data, especially from SAP ABAP-based sources. There is also a big need to move enriched data from SAP Datasphere into external environments to enable certain use cases.
In this part of our Replication Flow blog series, we dive deeper into the performance topic when using Replication Flows, including which factors influence the performance. Furthermore, I will explain the configurations that are available to a user to influence the performance throughput of a Replication Flow, including some specific aspects when using SAP ABAP-based source systems.
Important note: Before reading this blog, you need to understand the different terminologies that we described in part 4 of the blog series. If you have not yet read it, please do it before continuing with this blog!
Additionally, please also take a look at the known limitations & important considerations for your scenario, e.g. here for Replication Flow in general and here for SAP ABAP-based source systems.
Topics that will be covered in this blog:
In the table below, we provide a high-level overview via “quick sizer” based on the performance assumptions we provided in Part 4 of our blog series to illustrate the max. achievable throughput for three different configurations using the number of max. parallel jobs (= Replication Flow Jobs):
* Based on an average throughput of 520 million cells / hour per initial load replication thread. In the first example using 2 RF jobs: 520 mio cells / h * 10 replication threads = 5.200 mio cells / h
** Based on an average throughput of 260 mio cells / hour per delta replication thread. In the first example using 2 RF jobs: 260 mio cells / h * 10 replication threads = 2.600 mio cells / h
All numbers above are based on the scenario to replicate data via CDS Views from an SAP S/4HANA based source system into SAP Datasphere as target system and only illustrate sample values. Depending on your individual system landscape and system specific parameters, real numbers might deviate when you replicate the data in your own environment!
For the initial load phase, we assume an average throughput of 520 million cells / hour per initial load replication thread. In the first example above, this sums up to: 520 million cells / hour per replication thread * 10 replication threads = 5.200 mio cells / hour during the initial load phase, using 2 parallel jobs for a Replication Flow (= default configuration).
During the delta load phase, we assume an average throughput of 260 mio cells / hour per delta replication thread. In the first example above, this sums up to: 260 mio cells / h * 10 replication threads = 2.600 mio cells / h during the delta load phase, using 2 parallel jobs for a Replication Flow (= default configuration).
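To make this quick-sizer arithmetic explicit, here is a minimal Python sketch based on the throughput assumptions above (520 / 260 million cells per hour per thread) and the rule of thumb of roughly 5 data transfer threads per Replication Flow Job. The constants are sample values from this blog series, not guarantees for your landscape.

```python
# Quick-sizer sketch based on the sample assumptions from Part 4 of this blog series.
# All constants are illustrative averages, not guaranteed figures for your landscape.
INITIAL_CELLS_PER_THREAD_H = 520_000_000   # avg. cells/hour per initial load thread
DELTA_CELLS_PER_THREAD_H = 260_000_000     # avg. cells/hour per delta thread
THREADS_PER_RF_JOB = 5                     # rule of thumb: ~5 data transfer threads per job

def max_throughput(rf_jobs: int) -> tuple[int, int]:
    """Return (initial, delta) maximum cells/hour for a given number of Replication Flow Jobs."""
    threads = rf_jobs * THREADS_PER_RF_JOB
    return threads * INITIAL_CELLS_PER_THREAD_H, threads * DELTA_CELLS_PER_THREAD_H

for jobs in (2, 4, 8):
    initial, delta = max_throughput(jobs)
    print(f"{jobs} jobs: {initial / 1e6:,.0f} mio cells/h initial, {delta / 1e6:,.0f} mio cells/h delta")
# 2 jobs (default) -> 10 threads -> 5,200 mio cells/h initial, 2,600 mio cells/h delta
```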
The following table illustrates some additional average performance throughput figures depending on the source and target system combination; the numbers shown are the throughput in million cells per hour per replication thread. These benchmark figures were measured with the default assignment of 10 threads in total for the Replication Flow:
| Source Systems \ Target Systems | SAP HANA Cloud & SAP Datasphere | Google BigQuery | Object Store |
|---|---|---|---|
| SAP HANA Cloud | 840 million cells/h | 680 million cells/h | 720 million cells/h |
| SAP S/4HANA (CDS Views) | 670 million cells/h | 670 million cells/h | 680 million cells/h |
| SAP S/4HANA | 730 million cells/h | No figures available | No figures available |
Important Note: Especially when we talk about integration from SAP ABAP-based source systems, please make sure that your system is up-to-date by installing the latest version of the relevant software components as well as implementing all known SAP Notes (incl. TCI notes) for SAP S/4HANA systems (on-premise and private cloud edition), so that you can make use of the latest features including performance improvements. Please check our central SAP Note as well as the Note Analyzer to verify whether your system is up-to-date. We recommend checking for new SAP Notes regularly, e.g. every 2-3 weeks.
It is important to know that the performance of a Replication Flow defined by a user in SAP Datasphere heavily depends on the number of assigned Replication Threads and the overall availability of Replication Flow Jobs in your SAP Datasphere tenant, as well as on other factors that we describe in more detail in the next section.
As already briefly mentioned in the paragraph above, there are several aspects that can influence the performance throughput of data replication in general. We tried to group the different influencing factors into different categories. Please note that the list is not complete, but it highlights some of the important factors that are typically influencing the performance throughput of a data replication scenario.
Category 1: Infrastructure specific factors such as:
Category 2: System specific factors for source and target system(s) such as:
Category 3: User specific configurations for influencing the data replication such as:
Number of assigned data integration hours and max. parallel jobs in your SAP Datasphere tenant configuration
Note: The number of assigned max. parallel jobs is shared between all replication flows in the SAP Datasphere tenant.
In the following sections we will focus on category 3, i.e. the configurations that are available to a user to influence the performance. Furthermore, we will provide some detailed information for SAP ABAP-based source systems.
Before we now dive deeper into the user-specific configurations for influencing the performance of a Replication Flow, we will quickly describe important parameters and terminologies including the default settings for all involved components.
Important Note: Please note that the table below includes the latest default values, which is especially important for SAP ABAP-based source systems (incl. SAP S/4HANA, SAP Business Suite / SAP ECC or SAP Business Warehouse systems). There might be different default values depending on the individual release state of your SAP ABAP-based system. Thus, please make sure your SAP ABAP-based system is up-to-date and contains all latest SAP Notes (link)!
| Parameter | Default Value | Location of Setting |
|---|---|---|
| Replication Flows & Replication Objects | There is no default value for how many Replication Objects are created inside a Replication Flow. A user needs to add a minimum of 1 Replication Object to a Replication Flow, with an overall maximum of 500 Replication Objects per Replication Flow. | SAP Datasphere |
| Replication Flow Jobs | Per default, each SAP Datasphere tenant has a maximum of 2 Replication Flow Jobs assigned, which can be increased via the tenant configuration by assigning more data integration blocks (link). | SAP Datasphere |
| Replication Threads (Initial Load), also known as “Source Thread Limit for Initial Load” | Per default, 10 Replication Threads are assigned per Replication Flow for the initial load phase, which can be changed by a user. Please note that the replication threads need to be configured in the source as well as the target connection settings and that they are technically limited by the number of max. Replication Flow Jobs in the tenant. | SAP Datasphere |
| Data Transfer Threads (Delta Load), also known as “Object Thread Count for Delta Loads” | Per default, 1 Replication Thread is assigned per Replication Object for the delta load phase. For SAP S/4HANA on-premise using CDS view-based replication and for table-based replication from SAP source systems leveraging SLT, users can increase the number of partitions used during the delta load phase via the parameter “Object Thread Count for Delta Loads” in the Replication Flow modelling UI (link). | SAP Datasphere |
| Partitions | For SLT table-based & CDS View-based data replication, the number of partitions is calculated automatically for you depending on the size of the data set you want to replicate. For ODP-based data replication, a default value of 3 partitions is used, which a user can overwrite in the SAP ABAP source system to achieve a higher parallelization by configuring a parameter in the SAP ABAP backend (link). Partitioning for ODP is introduced via the following SAP Note (link). | SAP ABAP |
| Partitions | For table-based replication from supported database source systems (such as SAP HANA database or Azure SQL source systems), an automated partitioning is performed depending on the size of the data set you are replicating. There is currently no way for a user to change this behaviour for these source systems. | Azure SQL / HANA source |
| Package Size (SAP ABAP source system using ODP) | Per default, a package size of 100 MB is used when transferring data from ODP source data sets, which can be changed by a user via a parameter (link), assuming you have implemented the following SAP Note (link). Note: Without this SAP Note being implemented, do not increase the value manually, because it can lead to out-of-memory errors for your Replication Flow Jobs. | SAP ABAP |
| Maximum sessions allowed for SAP ABAP-based source systems | If your system is up-to-date, the session check is disabled (recommended approach) for Replication Flows (link). In the past, in case the session limit had not been disabled, the default value was a maximum of 10 sessions. | SAP ABAP |
Now it’s time to tackle the question “How can a user influence and optimize the throughput of a Replication Flow within SAP Datasphere?”
This question can be answered by four different actions that we will further outline below:
Configuration of Partitions
You may wonder how the partitions are determined for the Replication Flow that you define in the Data Builder.
Replication Flows show the number of partitions in the Data Integration Monitor in SAP Datasphere. Partitions are used by Replication Flows to divide the amount of data to be replicated into smaller chunks, which can then be replicated in parallel depending on the number of assigned replication threads for the initial load. The number of partitions for the initial load is automatically determined by the respective Replication Flow source connection. During the delta load, the number of (delta) partitions is per default set to 1 for all Replication Flow source connections and can be changed for specific sources (as will be explained later).
As mentioned earlier, in case you are replicating data from database-based source systems, the partition handling is automated based on the size of the data set and can currently not be influenced by a user.
For SAP ABAP-based sources, the partitions for CDS View-based as well as table-based extraction via SLT are also calculated automatically, based on the number of access plans that are available in the SAP ABAP-based source system. The number of partitions can be changed & limited by a user via the RMS_MAX_PARTITIONS parameter to use a custom partitioning. In case of using ODP data sources from SAP ABAP-based systems, you currently have a default partitioning of 3 partitions per data set, which can be changed by a user via a parameter in the connected SAP ABAP-based source system (see further details in the following paragraphs).
Assignment of Replication Threads for initial load via thread limit
The topic of replication threads is also linked to the partitions that we explained in the previous chapter. Ideally, Replication Flows would transfer every partition (initial or delta) concurrently; however, that can overwhelm the connected source and target systems if there are thousands of partitions. For example, assume a Replication Flow with 100 Replication Objects, each having 3 initial load partitions: this results in 300 partitions for this Replication Flow that can transfer data.
Therefore, the idea is to provide a configuration property for the end-user to control the concurrency, or in other words the degree of parallelization of the data replication, via the “source replication thread limit” and the “target replication thread limit”. The underlying service responsible for replicating the data limits the threads towards the source & target systems to the value defined by the user. These source & target threads are used not only for data transfers (initial and delta), but also for housekeeping activities (such as setup, initial partitioning, delta partitioning, teardown, etc.).
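As a simplified mental model (not the actual implementation of the replication service), the thread limit behaves like a worker pool that caps how many of the 300 partitions from the example above are in flight at the same time:

```python
from concurrent.futures import ThreadPoolExecutor

SOURCE_THREAD_LIMIT = 10  # "source thread limit for initial load" of the Replication Flow
# 100 Replication Objects with 3 initial load partitions each = 300 partitions
partitions = [f"object_{o}/partition_{p}" for o in range(100) for p in range(3)]

def transfer(partition: str) -> str:
    # Placeholder for reading one partition from the source and writing it to the target.
    return f"transferred {partition}"

# At most SOURCE_THREAD_LIMIT partitions are transferred concurrently,
# all remaining partitions wait in the queue until a thread becomes free.
with ThreadPoolExecutor(max_workers=SOURCE_THREAD_LIMIT) as pool:
    results = list(pool.map(transfer, partitions))

print(len(results), "partitions transferred")
```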
When creating your Replication Flow, you can set the number of replication threads (aka thread limit for initial load) to control the degree of parallelized data replication during the initial load phase for all Replication Objects that you have added to your Replication Flow. For those of you who already worked with Replication Flows in SAP Data Intelligence, this parameter corresponds to “Source Maximum Connections” and “Target Maximum Connections”.
Note: If you want to change the number of Replication Threads for existing Replication Flows, e.g. in order to fine tune the performance of the initial load time, you can do this also for running Replication Flows without restarting the load process.
Important: The available Replication Flow jobs that are assigned via Tenant Configuration are shared by all Replication Flows running across your entire SAP Datasphere tenant!
To increase or decrease the number of assigned replication threads, you need to go into the Data Builder application in SAP Datasphere and open your desired Replication Flow. The number of Replication Threads includes threads for:
Here, you need to open the settings for both the source as well as the target connection to open the configuration dialog:
* Note: For transferring data, each replication flow job has different types of data transfer threads depending on the type of source system. That means for example each Replication Flow Job has:
This becomes especially important for the performance throughput if you use two different kinds of source systems in your replication flows that share the same replication flow job, which in turn also depends on the overall number of available replication flow jobs.
For the source system configuration, the dialog should look like this where we see the default assignment of 10 Replication Threads as limit:
For the target system configuration, the dialog should look like this where we see the default assignment of 10 Replication Threads as limit:
Ideally, both settings should have the same value assigned to allow the optimal throughput. Entering different values for source and target Replication Threads would very likely lead to a bottleneck for the end-2-end performance of your Replication Flow. Additionally, you might also consider increasing or decreasing the number of Replication Threads in steps of “5”, e.g. 10, 15, 20 etc., to get an equal number of Replication Flow Jobs assigned to your Replication Flow. The reason for this is that you typically have 5 data transfer threads per Replication Flow Job and therefore you can get the number of Replication Flow jobs by dividing the Replication threads by 5.
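The rule of thumb above (roughly 5 data transfer threads per Replication Flow Job) can be written as a small helper, which also shows why steps of 5 are recommended; this is an approximation from this blog, not an exact formula:

```python
import math

def rf_jobs_needed(replication_threads: int, threads_per_job: int = 5) -> int:
    """Approximate number of Replication Flow Jobs occupied by a given thread limit."""
    return math.ceil(replication_threads / threads_per_job)

for threads in (10, 15, 20, 23):
    print(f"{threads} replication threads -> ~{rf_jobs_needed(threads)} Replication Flow Jobs")
# 23 threads still occupy 5 jobs, which is why increasing in steps of 5 is recommended.
```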
Note: As previously mentioned, the overall number of threads available in your SAP Datasphere tenant is derived from the assigned number of “maximum parallel jobs” in your tenant configuration (link). For details, please check out the information in part 4 of this blog series (link).
Furthermore, you need to make sure that also your connected source and target system provide sufficient resources to replicate the data in the configured parallelization, e.g. number of dialog processes in your SAP ABAP source system.
Assignment of Object Thread Count for Delta Loads:
This new Replication Flow configuration property allows you to define how many of the replication threads can be used for parallel delta transfers of a single Replication Object in case you replicate data from CDS Views out of SAP S/4HANA systems or tables from SAP ABAP-based systems (leveraging SLT). The minimum number is 1, the maximum number is 10. In case you have assigned fewer than 10 threads for the initial load, the maximum value of the parameter is limited to the number of threads defined for the initial load.
If you want to adjust the number of partitions for delta load, e.g. in case you are expecting a high change data volume, you can use a dedicated parameter in the source system settings of your Replication flow for the cases where you either want to replicate CDS views or tables leveraging SLT. We will dive a little deeper into this topic in the following paragraphs when we go into some CDS & SLT specific considerations.
Number of Replication Flows incl. number of assigned Replication objects
As mentioned earlier in this blog, you can add up to 500 Replication Objects in a single Replication Flow, which all share the number of assigned replication threads during the initial as well as delta load phase. But in reality, you probably will not assign such a large number inside a single replication flow and you might assign data sets to a replication flow based on certain conditions (e.g. volume of each data set, desired frequency for delta load etc.) which we will further describe at a later point in time.
If you want to achieve a certain performance for very large objects and ensure that a certain number of replication threads is used exclusively for these Replication Objects, you can either divide the data sets that need to be replicated among several Replication Flows (thereby gaining more Replication Threads overall), or assign more Replication Threads to your existing Replication Flow to achieve a higher throughput.
The overall number of threads (derived from the assigned number of Replication Flow Jobs in the tenant configuration) set on the SAP Datasphere tenant level can be distributed to different replication flows based on your needs for throughput on the individual replication objects. For example, group a few objects with a high data volume for initial load into a replication flow having a high number of threads for a better initial load performance. Once the load is done, the number of threads can be reduced. Another example is grouping all tables with only little delta and assigning only a small number of threads to save resources.
All in all, both options depend on the assigned maximum number of Replication Flow Jobs via the max. parallel jobs setting in your SAP Datasphere tenant configuration. The following table explains two basic approaches including their consequences. In another chapter at the end of the blog we also go a little more into detail on possible performance improvements when replicating data via CDS Views or tables leveraging SLT.
| Approach | Consequences |
|---|---|
| Increase the number of replication flows. | Increasing the number of replication flows will consume additional background processes in the source/target system per replication flow. Requires more Replication Flow Jobs for each replication flow, limited by the number of overall available Replication Flow Jobs in the SAP Datasphere tenant incl. their existing utilization by other Replication Flows. Scale them carefully to not overload the source/target system with additional processes and impact existing replication flows, especially during the initial load phase. |
| Increase the number of threads within a replication flow to get more threads and, equivalently, more Replication Flow Jobs. | With the start of multiple connections to the source/target system, background processes get blocked for the handling of the data packages. Scale the number of threads carefully to not block too many resources in the system, especially during the initial load phase of your replication. |
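To make the second approach in the table above more tangible, here is a small planning sketch that distributes a tenant-wide thread budget across replication flows proportionally to their expected data volume. The flow names and volumes are hypothetical, and the actual thread assignment is always done manually per Replication Flow in the Data Builder.

```python
def distribute_threads(flows: dict[str, int], total_threads: int) -> dict[str, int]:
    """Planning aid only: split a tenant-wide thread budget across replication flows
    proportionally to their expected data volume (in million cells), with at least
    1 thread per flow. Rounding may require a small manual adjustment afterwards."""
    total_volume = sum(flows.values())
    return {name: max(1, round(total_threads * volume / total_volume))
            for name, volume in flows.items()}

# Hypothetical replication flows and their expected volumes in million cells.
flows = {"RF_large_initial_load": 9_000, "RF_medium": 2_500, "RF_small_delta_only": 500}
print(distribute_threads(flows, total_threads=20))
# {'RF_large_initial_load': 15, 'RF_medium': 4, 'RF_small_delta_only': 1}
```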
Considerations when tuning performance of SAP ABAP-based sources
When looking into the integration of different SAP systems, you have different interfaces & data sets that can be replicated depending on which system we are talking about, such as an SAP S/4HANA system, an SAP Business Suite system (e.g. SAP ECC) or an SAP Business Warehouse system (e.g. SAP BW/4HANA). First, let us take a look at some general aspects that are relevant for SAP ABAP-based source systems and then go into the individual aspects of the three different SAP ABAP integration scenarios (CDS, SLT and ODP). Some additional information about performance when using Replication Flows with SAP ABAP sources can also be found here: https://me.sap.com/notes/3360905/E.
In this SAP Note, the concept of the data buffer is being explained for cases where you want to extract data via CDS Views or table-based replication leveraging SLT (for ODP sources this concept is not applicable!).
In summary, the source system pushes the data into the data buffer, and Replication Flows pick up the data from the data buffer. The main goal in this context is to achieve a resilient data transfer by using a commit mechanism that indicates whether a certain data package has been fully written to the target system by a Replication Flow or whether, in case of errors, a data package needs to be sent again to the target system because it has not been fully transferred. The following picture illustrates the base concept of the data buffer:
As the data buffer is a key component for transferring the data from the source to the target system, it is also an important factor when looking at the end-2-end performance of a replication flow in case you replicate data from an SAP ABAP source system.
Sample factors that influence the performance throughput (as explained in the SAP Note):
Note: The data buffer has a size limit to make sure the buffer tables do not consume too much storage / space; once the limit of a buffer is reached, no more data is pushed into the buffer until space inside the buffer becomes available again.
The parameter “APE_MAX_SESSIONS” is an important aspect that needs to be considered for all three integration scenarios (CDS, SLT and ODP). This parameter was initially introduced when using Data Flows in SAP Datasphere and Pipelines in SAP Data Intelligence. It acts like a “safety belt” through which you can control how many parallel sessions can be used by an integration tool (like a Data Flow, for example) to replicate data, depending on your available number of background and dialog processes in the SAP ABAP system.
In case your system is up to date with the latest SAP Notes, this parameter is disabled per default for Replication Flows only (= recommended approach) with one of the latest SAP Notes (link), but this parameter was used in the past with a default value of 10 for Data Flows as well as Replication Flows in SAP Datasphere and therefore we wanted to mention the old behaviour in this context.
You can find more information in the following SAP Note. In case you still want to use this parameter for Replication Flows, you can roughly calculate the number of sessions as 2.5 * the number of partitions you are using within your replication flow. Nowadays we typically recommend disabling this parameter, as the calculation of expected sessions is not that easy and an incorrectly configured value also has a negative impact on the performance throughput.
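If you keep the session check enabled, the rough estimate mentioned above can be scripted as follows (purely illustrative):

```python
import math

def required_sessions(partitions: int, factor: float = 2.5) -> int:
    """Rough estimate from this blog: sessions ≈ 2.5 × number of partitions."""
    return math.ceil(partitions * factor)

print(required_sessions(3))    # 8 sessions for 3 partitions
print(required_sessions(12))   # 30 sessions for 12 partitions
```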
Specific considerations for CDS View-based replication from SAP S/4HANA systems
There are different considerations in the context of replicating data via CDS Views from SAP S/4HANA based systems, which partially also differ depending on the deployment type of your system (SAP S/4HANA public cloud vs. SAP S/4HANA on-premise + SAP S/4HANA private cloud).
Access Plan Calculation
This is the phase in which the replication from CDS Views is prepared. The access plan calculation in this context is created automatically by the SAP system using an SAP-owned logic. There is currently no way for a user to influence the access plan calculation, neither in SAP S/4HANA on-premise nor in SAP S/4HANA Public Cloud systems.
Data Transfer Jobs
The data transfer jobs are responsible for loading the data into the data buffer in SAP S/4HANA, where the data packages are picked up by the Replication Flow. By default, the minimum and maximum number of data transfer jobs is set to 1, which means 1 data transfer job is running per default. If this is not sufficient, a user can change the number of data transfer jobs using transaction SM30 and maintenance view “DHCDC_JOBSTG” (General Job Settings of the CDC Engine), or directly via transaction “DHCDCSTG”, following the instructions under this link for SAP S/4HANA on-premise based systems:
Partitioning
In the latest versions of SAP S/4HANA on-premise as well as in SAP S/4HANA public cloud, there is a logic to automatically calculate the number of partitions per CDS view depending on the size of each individual CDS View. Partitioning describes the mechanism to slice one data set into different smaller pieces, which can then be replicated to the target system with a certain degree of parallelization. The degree of parallelization for the data transfer is then mainly defined by the number of assigned “Replication Threads” in your Replication Flow settings in SAP Datasphere.
In case you do not want to use the default automated partitioning for CDS Views (using the automated partitioning is the recommended way) and rather define your own partition logic, you can do that by adjusting the parameter “RMS_MAX_PARTITIONS” (link). If you enter a value for this parameter, the automated partitioning is not used and is overwritten by your defined partition number (which is the maximum number of partitions being used). Please note that this value of “RMS_MAX_PARTITIONS” is valid for all CDS Views in your Replication Flow and is applied during the deployment process of your Replication Flow.
Note: If you want to use different partition values for different CDS Views, you need to change the value multiple times.
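To illustrate how a user-defined RMS_MAX_PARTITIONS value relates to the automated partitioning, here is a heavily simplified sketch; the row threshold and the automated calculation shown here are purely hypothetical, and the real logic in the SAP ABAP backend is SAP-internal.

```python
import math

def planned_partitions(estimated_rows: int,
                       rows_per_partition: int = 5_000_000,
                       rms_max_partitions: int | None = None) -> int:
    """Simplified illustration only. `rows_per_partition` is a hypothetical chunk
    size for the automated calculation; the real SAP-internal logic may differ."""
    if rms_max_partitions is not None:
        # A user-defined RMS_MAX_PARTITIONS replaces the automated calculation
        # and acts as the maximum number of partitions used for all CDS Views.
        return rms_max_partitions
    return max(1, math.ceil(estimated_rows / rows_per_partition))

print(planned_partitions(120_000_000))                        # automated partitioning
print(planned_partitions(120_000_000, rms_max_partitions=8))  # user-defined maximum
```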
Parallelization during delta load for CDS Views
In case the default of 1 partition = 1 replication thread per CDS View for the delta load phase is not sufficient to handle the delta volume in your scenario, you can configure more than 1 parallel delta partition for the delta load phase.
This setting can be changed in two ways:
For way 1) please open the Replication Flow modelling UI in the Data Builder application in SAP Datasphere. Here, open the source system settings once you have defined your source & target connection as well as the required data sets.
Use the parameter “Object Thread Count for Delta Loads” to increase the number of partitions that should be used during the delta load phase (default: 1 partition). If you change the number of delta partitions here, it will be applied to all replication objects in your Replication Flow that use load type “initial+delta”, but you can also set an individual value per replication object (see second screenshot below).
With the second parameter “Overwrite Source Settings at Object level”, you can overwrite the settings that you defined per object with this global value that you defined in the source settings configuration.
You can access the replication object level configuration by selecting a replication object and then check the property panel on the right-hand side:
As mentioned earlier, you can also follow approach 2) by adjusting the parameter in the SAP S/4HANA backend system. To do so, open the maintenance view “DHCDC_RMSOBJSTG” using transaction SM30 (please check also the following link). Per default, the value 1 is used, which means 1 delta partition per Replication Object. You can increase the parameter in case you want to use more than 1 delta partition. In the maintenance view you can maintain, on the level of individual CDS Views, for which CDS Views you want to use more than 1 delta partition. For each CDS View with no entry in the table, the default value of 1 is used automatically.
Additionally, there is always the rule that configuration in the UI is the leading value and overwrites potential configurations in the source system!
In the following example we have increased the number of delta partitions for:
At the moment you cannot dynamically change this parameter for a running Replication Flow, which is a topic for a future enhancement of the functionality to allow dynamic and flexible changes during run-time without resetting your replication.
Performance optimizations based on optimized data conversion
Recently, with a new SAP Datasphere release plus the implementation of an SAP Note in your SAP ABAP-based system (link), an improvement of the data conversion has been introduced, which can also influence the performance of your CDS View replication (especially for larger tables). Please note that the usage of this feature depends on your system, i.e. the SAP S/4HANA version plus the SAP Note linked above, as well as a prerequisite in the SAP kernel to allow fast serialization (link). If your system supports it, fast serialization is used automatically; if it is not supported, the old data conversion is used without the better performance that fast serialization provides.
Below we are trying to illustrate the most important parameters including the relationship & dependency between the parameters in SAP S/4HANA and SAP Datasphere:
Note: The Axino component is a technical service running in SAP Datasphere which manages all requests when you use SAP ABAP-based connections in your Replication Flow or Data Flow. Whereas in SAP Data Intelligence Cloud administrators could influence the sizing & resources of this service, in SAP Datasphere the scaling of Axino is handled automatically by SAP.
Specific considerations for table-based replication from SAP Business Suite systems leveraging SLT:
There are different considerations in the context of replicating data via tables using SLT from SAP Business Suite (e.g. SAP ECC) based systems.
Access Plan Calculation
This is the phase in which the replication from the tables is prepared; it happens before the actual data replication starts. Per default, one access plan calculation runs per source table, but there are options to parallelize the access plan calculation (SAP Note + documentation). This becomes especially important for large data sets, where a “single threaded” access plan calculation without parallelization can take a long time, e.g. 1 day.
Number of SLT jobs
When creating your SLT configuration, you need to specify a certain number of jobs for your data replication scenario including:
In order to set an appropriate number of jobs for your scenario, please check out the existing and detailed SLT documentations:
Partitioning
In the latest versions of SAP Business Suite systems (e.g. SAP ECC) there is a logic to automatically calculate the number of partitions per table depending on the size of each individual table. Partitioning describes the mechanism to slice one data set into smaller pieces, which can then be replicated to the target system with a certain degree of parallelization. The degree of parallelization for the data transfer is then mainly defined by the number of assigned “Replication Threads” in your Replication Flow settings in SAP Datasphere.
In case you do not want to use the default automated partitioning for tables (automated partitioning is the SAP recommended way) and rather define your own partition logic, you can do that by adjusting the parameter “RMS_MAX_PARTITIONS” (link). If you enter a value for this parameter, the automated partitioning is not used and is overwritten by your defined partition number (which is the maximum number of partitions being used). Please note that this value of “RMS_MAX_PARTITIONS” is valid for all tables in your Replication Flow and is applied during the deployment process of your Replication Flow.
Parallelization during delta load
In case the default of 1 partition = 1 replication thread per table for the delta load phase is not sufficient to handle the delta volume in your scenario, you can configure more than 1 parallel delta partition for the delta load phase.
This setting needs to be configured in the following way:
For way 1) please open the Replication Flow modelling UI in the Data Builder application in SAP Datasphere. Here, open the source system settings once you have defined your source & target connection as well as the required data sets.
Use the parameter “Object Thread Count for Delta Loads” to increase the number of partitions that should be used during the delta load phase (default: 1 partition). If you change the number of delta partitions here, it will be applied to all replication objects in your Replication Flow that use load type “initial+delta”, but you can also set an individual value per replication object (see second screenshot below).
With the second parameter “Overwrite Source Settings at Object level”, you can overwrite the settings that you defined per object with this global value that you defined in the source settings configuration.
You can access the replication object level configuration by selecting a replication object and then check the property panel on the right-hand side:
As mentioned earlier, you can also follow approach 2) using the SLT advanced replication settings in transaction LTRS using the configuration parameter “Number of Ranges”. This also allows a flexible configuration on table-level for tables where a high delta change volume is expected and therefore more than the default of 1 delta partition should be used.
Please keep in mind that this setting needs to be adjusted before the Replication Flow is started, otherwise the change will not be applied.
Below we are trying to illustrate the most important parameters including the relationship & dependency between the parameters in the SAP Business Suite system and SAP Datasphere:
Specific considerations for ODP-based replication from SAP Business Warehouse systems & other SAP systems supporting ODP
There are different considerations in the context of replicating source data sets using Replication Flows via the ODP interface (ODP_BW and ODP_SAPI context) from SAP Business Warehouse systems as well as other SAP systems supporting the ODP interface.
ODP package Size specification
Similar to other replication technologies from SAP, Replication Flows allow users to configure the ODP package size, which can also influence the performance of your Replication Flow. The default value for the package size is currently 100 MB, which can be decreased or increased by the user to tune the performance. Please be aware that you need to be careful when increasing this parameter, as it could lead to memory problems for your Replication Flow Job in SAP Datasphere.
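A quick way to reason about the package size is to estimate how many records fit into one ODP package for a given average record width; the record width used below is a made-up example, not a recommendation.

```python
def records_per_package(package_size_mb: int = 100, avg_record_bytes: int = 512) -> int:
    """Rough estimate of how many records fit into one ODP package.
    avg_record_bytes is a hypothetical average record width."""
    return (package_size_mb * 1024 * 1024) // avg_record_bytes

print(records_per_package())                     # default 100 MB package
print(records_per_package(package_size_mb=200))  # larger package -> more memory per transfer
```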
ODP Partition configuration
The partition handling when replicating data from ODP data sources is a little bit different compared to SLT and CDS, where an improved automated handling has been provided in the past. For ODP partition handling, there is a separate parameter called “ODP_RMS_PARTITIONS_LOAD”. Per default, 3 partitions are set, through which data can be loaded in parallel in your Replication Flow; this value is applied to all ODP data sets in your Replication Flow during the deployment process. A user can change the default of 3 partitions to a higher value to chunk the data into more partitions. Please be aware that this parameter only controls how the data is partitioned; the end-2-end performance primarily also depends on assigning an appropriate number of Replication Threads in your Replication Flow.
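Keep in mind that the effective parallelism per ODP data set is bounded by both values, roughly the smaller of the configured partitions and the replication threads that are currently free for this object; a simplified illustration:

```python
def parallel_odp_streams(odp_partitions: int, free_replication_threads: int) -> int:
    """Simplified illustration: the data set is split into `odp_partitions` chunks,
    but only as many chunks are transferred in parallel as there are free
    replication threads available in the Replication Flow."""
    return min(odp_partitions, free_replication_threads)

print(parallel_odp_streams(odp_partitions=3, free_replication_threads=10))   # default: 3
print(parallel_odp_streams(odp_partitions=12, free_replication_threads=10))  # limited by threads
```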
Both parameters can be set using transaction SM30 in your SAP system and the maintenance view “LTBAS_RUNTIME” for SAP Business Warehouse & SAP Business Suite systems as well as the maintenance view “DHBAS_RUNTIME” for SAP S/4HANA on-premise based systems:
Below we provide some hints and tips around performance, such as how to find a potential performance bottleneck in your replication scenario, as well as a summary table of actions you can perform to influence the performance of your Replication Flow.
Checklist in case you observe a slow performance for your data replication scenario:
How can I analyze where a potential bottleneck exists when replicating from CDS or SLT?
When replicating data from CDS Views or via tables leveraging SLT, a concept called “data buffer” is used in the connected SAP ABAP source system. This data buffer is mainly used to achieve resiliency during the replication process by allowing the source system to resend data packages that were not fully processed in case of an outage. The data buffer is the place where the Replication Flow in SAP Datasphere picks up the data packages during the replication process (initial as well as delta load phase). The data buffer is also a good place to monitor the performance throughput of the data replication and to identify whether the SAP ABAP system or the SAP Datasphere Replication Flow is the bottleneck, especially during the initial load phase of your replication. For ODP-based replication, other ODP-internal capabilities than the data buffer are used to achieve resiliency.
The “data buffer” is responsible for receiving & organizing the prepared data packages in the SAP ABAP source system, which are then picked up by the Replication Flow. Therefore, several numbers, such as the total number of packages as well as the number of packages “ready for pick-up”, are displayed in the data buffer. These numbers allow a user to perform a first check of where a potential performance bottleneck in the data replication could be located. As illustrated in the picture above, there could be two possible cases when checking the data buffer:
Case C: The data buffer is almost full and the status of the packages is “In Process”. This means that the source ABAP system is pushing the data into the buffer and Replication Flows are directly & immediately picking up the data. In such a case, you can try to achieve a higher performance throughput by increasing the buffer size for this particular data set, so that Replication Flows can use a higher degree of parallelization.
To adjust the buffer size, please follow the steps below:
The data buffer can be accessed via transaction “DHRDBMON” for SAP S/4HANA on-premise based source systems as well as transaction “LTRDBMON” for SLT based source systems where each source data set (e.g. a CDS view or a table) has its own entry in the data buffer:
To get a better understanding we have included a sample screenshot of transaction DHRDBMON:
Important Parameters in transaction DHRDBMON & LTRDBMON for analyzing the performance are:
Note: More information can be found using the in-product assistant within your SAP S/4HANA or SLT system to understand what each component in the transaction is displaying.
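As a rough mental model for interpreting the buffer figures described above, the following sketch shows one possible way to turn them into a first bottleneck hypothesis; the field names and thresholds are illustrative and do not correspond 1:1 to the monitor columns.

```python
def buffer_diagnosis(fill_pct: float, ready_for_pickup: int, in_process: int) -> str:
    """Very simplified heuristic for reading the data buffer monitor
    (DHRDBMON / LTRDBMON). Thresholds and field names are illustrative only."""
    if fill_pct > 90 and ready_for_pickup > in_process:
        # Packages pile up waiting for pick-up: consider more replication
        # threads / Replication Flow Jobs on the SAP Datasphere side.
        return "Replication Flow pick-up is likely the bottleneck"
    if fill_pct > 90:
        # Packages are picked up immediately but the buffer itself is the limit
        # (Case C above): consider increasing the buffer size for this data set.
        return "buffer size is likely the limiting factor"
    if fill_pct < 10:
        # Packages are consumed as soon as they arrive: consider more data
        # transfer jobs or parallel access plan calculation in the ABAP system.
        return "source ABAP system is likely the bottleneck"
    return "no obvious bottleneck from the buffer figures alone"

print(buffer_diagnosis(fill_pct=95, ready_for_pickup=40, in_process=5))
```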
| Approach & Configurations | Information & links |
|---|---|
| Increase the number of Replication Flows. | Increasing the number of Replication Flows will consume additional background processes in the source/target system per Replication Flow. Requires more Datasphere node hours (Data Integration Blocks) as more Replication Flow Jobs are started for each Replication Flow. Scale them carefully to not overload the source/target system with background processes. (link) |
| Increase the number of replication threads within a Replication Flow to get more threads and, equivalently, more Replication Flow Jobs. | With the start of multiple threads to the source/target system, background processes get blocked for the handling of the data packages. Scale the number of replication threads carefully to not block too many resources in the system. This change may affect the consumption of Datasphere node hours (Data Integration Blocks) as more Replication Flow Jobs are started. |
| SAP ABAP-based configurations (CDS Views) | Make sure you are not hitting the session limit or disable the session limit (link). Check the configuration of available Data Transfer Jobs in your SAP S/4HANA system (link). |
| SAP ABAP-based configurations (SLT) | Make sure you are not hitting the session limit or disable the session limit (link). Check the configuration of your SLT mass transfer ID for the number of jobs as well as the configuration of a parallel access plan calculation (link + SLT Sizing Guide). |
| SAP ABAP-based configurations (ODP) | Make sure you are not hitting the session limit or disable the session limit (link). Check the configuration of the package size as well as the partitions being used for replicating data via ODP (link). |
That’s it! We hope that this blog is helpful for you to understand the topic of Replication Flow performance and provides you with some hints & tips on how you can influence the performance of your data replication using Replication Flows in SAP Datasphere. Please keep in mind that performance is usually a very diverse topic that varies depending on your use case as well as your infrastructure.
A big thank you to the colleagues from SAP Datasphere product & development team who helped in the creation of this blog.
Thank you very much for reading this blog and please feel free to provide feedback or questions into the comment section!
Central SAP Note for overview & pre-requisites of integrating SAP ABAP Systems in Replication Flows (Link)
SAP LT Replication Server Performance Tuning Guide (Link)
Important considerations and limitations using Replication Flows in SAP Datasphere (Link)
Replication Flows step by step in SAP Datasphere product documentation (Link)
SAP Datasphere Roadmap Explorer for upcoming innovations
SAP Datasphere Influence portal for submitting new improvement ideas