In my current project, we are syncing two systems using SAP Cloud Integration. Records from 25 tables are sent from the Sender system to the Receiver system. Some of these tables are huge, and the initial full load for them is time-consuming. In this blog, I'll share the performance improvements that helped reduce the time needed to load the data into the Receiver system.
As-Is Design
Sender System
The sender system holds records in tables. Each record is assigned a country code and a region code.
For example, the Customer table has records like these:
| Id | Country Code | Region Code | Column 3...n |
|---|---|---|---|
| 1 | GB | UKC | ... |
| 2 | GB | UKD | ... |
| 3 | GB | UKE | ... |
| 4 | GB | UKF | ... |
| 5 | IN | MH | ... |
| 6 | IN | GJ | ... |
| 7 | IN | PB | ... |
| 8 | IN | DL | ... |
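For the sketches later in this blog, such a record can be represented by a simple value type. This is illustrative only; the field names are assumptions, not the actual table definition:

```java
// Hypothetical shape of a synced record; field names are illustrative only.
public record SyncRecord(int id, String countryCode, String regionCode, String otherColumns) {}
```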
Limitations of Sender System
The sender system had these limitations:
- The API only allowed 2000 records in a single synchronous call.
- The sender adapter created only one message at a time, i.e., only after the current 2000 records had been fully processed would the next batch of 2000 records become available for processing.
Receiver System
The receiver system is a file system with a directory for each country code; under every country-code directory there are directories for the region codes, and each region's directory contains one file per object.
For example, the GB and IN directories represent the United Kingdom and India. GB has further directories representing each region, such as UKC for North East England and UKD for North West England. Each regional directory contains object/table files such as Customers.csv, Invoices.csv, Orders.csv, and so on.

Figure: Directory structure of the receiver system
Limitations of Receiver System
The receiver system also has its limitations:
- For a given region, only one connection is allowed at a time, because the regional business users work on the same system and data load activities should not affect their experience.
The Flow
In the As-Is Design, we pick up the records from the tables in batches of 2000. Each record carries a country code and a region code. The second step is to group the records that belong to the same country and region; for each group, a separate message is created, so that every record in a single message belongs to one country and one region. Finally, we determine the file path based on the country and region and append the records to the corresponding file.
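To make the grouping step concrete, here is a minimal sketch in plain Java. In SAP Cloud Integration this logic would typically live in a Groovy script; the class and field names below are assumptions for this blog, not the real implementation:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByCountryRegion {

    // Illustrative record shape; field names are assumptions for this sketch.
    record SyncRecord(int id, String countryCode, String regionCode) {}

    public static void main(String[] args) {
        // One incoming batch (Batch 1 from the example below).
        List<SyncRecord> batch = List.of(
                new SyncRecord(1, "GB", "UKC"),
                new SyncRecord(2, "GB", "UKC"),
                new SyncRecord(3, "IN", "GJ"),
                new SyncRecord(4, "GB", "UKC"),
                new SyncRecord(5, "IN", "GJ"));

        // One "message" per country/region combination within the current batch.
        Map<String, List<SyncRecord>> messages = batch.stream()
                .collect(Collectors.groupingBy(r -> r.countryCode() + "/" + r.regionCode()));

        // The grouping key doubles as the target path, e.g. /GB/UKC/Customers.csv.
        messages.forEach((path, records) ->
                System.out.println("/" + path + "/Customers.csv <- " + records.size() + " records"));
    }
}
```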
Example
Let's consider these 10 records. I'll use a batch size of 5 for the demonstration.
Step 1: The Records
| Id | Country Code | Region Code | Column 3...n |
|---|---|---|---|
| 1 | GB | UKC | ... |
| 2 | GB | UKC | ... |
| 3 | IN | GJ | ... |
| 4 | GB | UKC | ... |
| 5 | IN | GJ | ... |
| 6 | GB | UKC | ... |
| 7 | GB | UKC | ... |
| 8 | IN | GJ | ... |
| 9 | GB | UKC | ... |
| 10 | IN | GJ | ... |
Step 2: Fetch the Records in batches of 5 sequentially
Here, the records are fetched in batches of 5. Batch 1 arrives first in a message; only once Batch 1 has been processed completely does Batch 2 become available.
| Batch | Id | Country Code | Region Code | Column 3...n |
|---|---|---|---|---|
| Batch 1 | 1 | GB | UKC | ... |
| | 2 | GB | UKC | ... |
| | 3 | IN | GJ | ... |
| | 4 | GB | UKC | ... |
| | 5 | IN | GJ | ... |
| Batch 2 | 6 | GB | UKC | ... |
| | 7 | GB | UKC | ... |
| | 8 | IN | GJ | ... |
| | 9 | GB | UKC | ... |
| | 10 | IN | GJ | ... |
Step 3: Group the records by country and region and create a separate message per country-region combination for the current batch
| Batch | Message | Id | Country Code | Region Code | Column 3...n |
|---|---|---|---|---|---|
| Batch 1 | Message 1 | 1 | GB | UKC | ... |
| | | 2 | GB | UKC | ... |
| | | 4 | GB | UKC | ... |
| | Message 2 | 3 | IN | GJ | ... |
| | | 5 | IN | GJ | ... |
Step 4: Write records in a given message to the correct folder
Since Message 1 of Batch 1 contains only records for the UKC region of GB, we can write those records to the Customers.csv file in the /GB/UKC directory. A sketch of this write step follows the table.
| Batch | Message | Id | Country Code | Region Code | Column 3...n |
|---|---|---|---|---|---|
| Batch 1 | Message 1 | 1 | GB | UKC | ... |
| | | 2 | GB | UKC | ... |
| | | 4 | GB | UKC | ... |
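Here is a minimal sketch of this write step, assuming a local directory tree and the file name Customers.csv. In the real scenario the receiver is reached through an adapter, so this is illustrative only:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class AppendToRegionFile {

    // Append the CSV lines of one message to /<country>/<region>/Customers.csv.
    // The base directory and file name are assumptions for this sketch.
    static void appendMessage(Path baseDir, String country, String region,
                              List<String> csvLines) throws IOException {
        Path target = baseDir.resolve(country).resolve(region).resolve("Customers.csv");
        Files.createDirectories(target.getParent());
        Files.write(target, csvLines, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // Message 1 of Batch 1: records 1, 2 and 4 for GB/UKC.
        appendMessage(Path.of("receiver-root"), "GB", "UKC",
                List.of("1,GB,UKC,...", "2,GB,UKC,...", "4,GB,UKC,..."));
    }
}
```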
Step 5: Perform Step 4 for all messages in a given batch
Step 6: Perform Steps 3 and 4 for all batches sequentially
Step 7: Done
In the end, we make 4 calls to the receiver system in total:
| Batch | Message | Id | Country Code | Region Code | Column 3...n |
|---|---|---|---|---|---|
| Batch 1 | Message 1 | 1 | GB | UKC | ... |
| | | 2 | GB | UKC | ... |
| | | 4 | GB | UKC | ... |
| | Message 2 | 3 | IN | GJ | ... |
| | | 5 | IN | GJ | ... |
| Batch 2 | Message 3 | 6 | GB | UKC | ... |
| | | 7 | GB | UKC | ... |
| | | 9 | GB | UKC | ... |
| | Message 4 | 8 | IN | GJ | ... |
| | | 10 | IN | GJ | ... |
The Bottleneck and the Solution
The bottleneck in this design starts with the limitation of the receiver system: we can make only one connection at a time for a given regional file. On top of that, the sender system provides only 2000 records at a time in a single synchronous call, and the records can only be written to the files sequentially.
💡 We need to make as few connections as possible for a given batch of 2000 records. This can be achieved if the records are sorted by country and region before the batches are formed. However, sorting takes time, and a synchronous call cannot hold the connection open while the sender system finishes sorting, so we switched to an asynchronous call. Fortunately, the asynchronous call also has a higher limit of 10000 records per message. Yes! Another win :).
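The effect of sorting before batching can be sketched as follows. This is plain Java over the example data, purely for illustration; in the real scenario the sorting happens on the sender side before the asynchronous call:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortThenBatch {

    // Illustrative record shape; field names are assumptions for this sketch.
    record SyncRecord(int id, String countryCode, String regionCode) {}

    public static void main(String[] args) {
        List<SyncRecord> all = new ArrayList<>(List.of(
                new SyncRecord(1, "GB", "UKC"), new SyncRecord(2, "GB", "UKC"),
                new SyncRecord(3, "IN", "GJ"),  new SyncRecord(4, "GB", "UKC"),
                new SyncRecord(5, "IN", "GJ"),  new SyncRecord(6, "GB", "UKC"),
                new SyncRecord(7, "GB", "UKC"), new SyncRecord(8, "IN", "GJ"),
                new SyncRecord(9, "GB", "UKC"), new SyncRecord(10, "IN", "GJ")));

        // Sort by country, then region, so records of the same region end up next to each other.
        all.sort(Comparator.comparing(SyncRecord::countryCode)
                           .thenComparing(SyncRecord::regionCode));

        int batchSize = 5;
        long calls = 0;
        for (int start = 0; start < all.size(); start += batchSize) {
            List<SyncRecord> batch = all.subList(start, Math.min(start + batchSize, all.size()));
            // Each distinct country/region combination within a batch is one call to the receiver.
            calls += batch.stream()
                          .map(r -> r.countryCode() + "/" + r.regionCode())
                          .distinct()
                          .count();
        }
        System.out.println("Receiver calls with sorted input: " + calls); // prints 3 for this data
    }
}
```

With unsorted data, the same loop yields 4 calls for this data set; sorting brings it down to 3.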
To-Be Design
The To-Be Design simply sorts the records before forming batches. Everything else remains the same.
Example
Let's run through our example records now. Step 1b is new and does the magic.
Step 1: The Records
| Id | Country Code | Region Code | Column 3...n |
|---|---|---|---|
| 1 | GB | UKC | ... |
| 2 | GB | UKC | ... |
| 3 | IN | GJ | ... |
| 4 | GB | UKC | ... |
| 5 | IN | GJ | ... |
| 6 | GB | UKC | ... |
| 7 | GB | UKC | ... |
| 8 | IN | GJ | ... |
| 9 | GB | UKC | ... |
| 10 | IN | GJ | ... |
Step 1b: Sort the records by Country Code and Region Code
| Id | Country Code | Region Code | Column 3...n |
|---|---|---|---|
| 1 | GB | UKC | ... |
| 2 | GB | UKC | ... |
| 4 | GB | UKC | ... |
| 6 | GB | UKC | ... |
| 7 | GB | UKC | ... |
| 9 | GB | UKC | ... |
| 3 | IN | GJ | ... |
| 5 | IN | GJ | ... |
| 8 | IN | GJ | ... |
| 10 | IN | GJ | ... |
Step 2: Fetch the Records in batches of 5 sequentially
Here, the records are fetched in batches of 5. Batch 1 arrives first in a message; only once Batch 1 has been processed completely does Batch 2 become available.
| Batch | Id | Country Code | Region Code | Column 3...n |
|---|---|---|---|---|
| Batch 1 | 1 | GB | UKC | ... |
| | 2 | GB | UKC | ... |
| | 4 | GB | UKC | ... |
| | 6 | GB | UKC | ... |
| | 7 | GB | UKC | ... |
| Batch 2 | 9 | GB | UKC | ... |
| | 3 | IN | GJ | ... |
| | 5 | IN | GJ | ... |
| | 8 | IN | GJ | ... |
| | 10 | IN | GJ | ... |
Step 3: Group the records by country and region and create a separate message per country-region combination for the current batch
| Batch | Message | Id | Country Code | Region Code | Column 3...n |
|---|---|---|---|---|---|
| Batch 1 | Message 1 | 1 | GB | UKC | ... |
| | | 2 | GB | UKC | ... |
| | | 4 | GB | UKC | ... |
| | | 6 | GB | UKC | ... |
| | | 7 | GB | UKC | ... |
Step 4: Write records in a given message to the correct folder
Since Message 1 of Batch 1 contains only records for the UKC region of GB, we can write those records to the Customers.csv file in the /GB/UKC directory.
| Batch | Message | Id | Country Code | Region Code | Column 3...n |
|---|---|---|---|---|---|
| Batch 1 | Message 1 | 1 | GB | UKC | ... |
| | | 2 | GB | UKC | ... |
| | | 4 | GB | UKC | ... |
| | | 6 | GB | UKC | ... |
| | | 7 | GB | UKC | ... |
Step 5: Perform Step 4 for all messages in a given batch
Step 6: Perform Steps 3 and 4 for all batches sequentially
Step 7: Done
In the end, we make 3 calls instead of 4 to the receiver system in total:
| Batch | Message | Id | Country Code | Region Code | Column 3...n |
|---|---|---|---|---|---|
| Batch 1 | Message 1 | 1 | GB | UKC | ... |
| | | 2 | GB | UKC | ... |
| | | 4 | GB | UKC | ... |
| | | 6 | GB | UKC | ... |
| | | 7 | GB | UKC | ... |
| Batch 2 | Message 2 | 9 | GB | UKC | ... |
| | Message 3 | 3 | IN | GJ | ... |
| | | 5 | IN | GJ | ... |
| | | 8 | IN | GJ | ... |
| | | 10 | IN | GJ | ... |
If this blog helped you, please leave a like and share it with your network. Thank you.
Results and Other Improvements
With the As-Is Design, the biggest table for a country with 45 regions took approx. 8 hours to load. Yes, 8 hours end to end. With this small tweak in the design, the To-Be Design took only approx. 37 minutes: sorting took approx. 12 minutes and the rest of the process took 25 minutes.
This particular test case was an extreme example. With 45 regions, in the worst case a batch can contain records from all 45 regions, which means up to 45 calls per batch to the receiver system. With the solution to the bottleneck, in the best case you make only as many calls as there are batches, or 45 calls (one per region), whichever is greater, assuming each region has at least one record.
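As a rough back-of-the-envelope check of these bounds (the number of batches below is purely illustrative; it depends on the table size and the 10000-record message limit):

```java
public class CallCountBounds {
    public static void main(String[] args) {
        int regions = 45;         // regions of the country in this test case
        int numBatches = 50;      // purely illustrative; depends on table size and message limit
        int batchSize = 10_000;   // asynchronous message limit

        // Unsorted worst case: every batch can contain records from every region.
        long worstCaseUnsorted = (long) numBatches * Math.min(regions, batchSize);

        // Sorted best case: one call per batch, but never fewer than one call per region.
        long bestCaseSorted = Math.max(numBatches, regions);

        System.out.println("Worst case (unsorted): " + worstCaseUnsorted + " calls");
        System.out.println("Best case (sorted):    " + bestCaseSorted + " calls");
    }
}
```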
While analysing the bottleneck and applying the solution, we came across these other improvements that are more general and applicable in many other scenarios:
- Indexing the source tables
This ensured that sorting the records before forming the batches takes as little time as possible.
- Enabling the HTTP Session Reuse at Integration Flow Level
The receiver system has a directory-like structure, but it is accessed through an HTTP receiver adapter, and the logic that connects to the receiver system and appends to the files lives in its own Integration Flow. This Integration Flow receives messages from multiple countries and regions for writing to the files. Enabling HTTP Session Reuse at the Integration Flow level ensured that the login call is made only when the session has expired.
- Using GZip compression
Previously, we used application/octet-stream and sent the data uncompressed. Compressing the data reduced the time consumed by about 16%. A minimal sketch of the compression step follows this list.
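Here is a minimal sketch of GZip-compressing a payload before sending it. This is plain Java for illustration; in the integration flow the compression would typically be done with an encoder step or a small script, and the payload shown is made up:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipPayload {

    // Compress a CSV payload with GZip before sending it to the receiver.
    // Remember to also set the Content-Encoding: gzip header on the HTTP call.
    static byte[] gzip(String payload) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String csv = "1,GB,UKC,...\n2,GB,UKC,...\n4,GB,UKC,...";
        byte[] compressed = gzip(csv);
        System.out.println(csv.length() + " chars -> " + compressed.length + " bytes after GZip");
    }
}
```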
Summary
Performance improvement requires identifying bottlenecks and then applying targeted solutions. In the scenario discussed in this blog, sorting the data before forming batches improved the performance, and indexing the source tables helped sort the data quickly. On the receiver side, reusing the HTTP session and compressing the data helped further.
Hope this helps,
Bala