This blog post demonstrates how to improve the performance of a DI pipeline by grouping multiple messages into a batch rather than sending a single message at a time.
I had a requirement to expose some ECC data to an enterprise Kafka topic using SAP Data Intelligence. I developed a custom ABAP operator to read data from an ABAP function module and pass it to Kafka in JSON format.
DI version: SAP Data Intelligence 3.0
My initial build: Single JSON message out
In this build, I had a function module (FM) export a table of JSON messages and a record count. The custom ABAP operator looped over the table and sent one message at a time to DI.
Fig 1: Build table of JSON messages in the ABAP FM
Fig 2: Custom operator code – loop through the table and send one message out at a time
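For readers who want the shape of the code behind Figs 1 and 2, here is a minimal sketch of this first design. The function module name Z_GET_JSON_MESSAGES, its parameters, and the helper get_source_records( ) are hypothetical stand-ins, and the operator snippet assumes the standard custom ABAP operator pattern, where a class derived from cl_dhape_graph_oper_abstract writes to its string outport via mo_out->write_copy( ).

```abap
" Hypothetical FM: builds a table of JSON messages plus a record count.
FUNCTION z_get_json_messages.
*"  EXPORTING
*"     VALUE(ET_JSON)  TYPE STRING_TABLE
*"     VALUE(EV_COUNT) TYPE I

  DATA(lt_data) = get_source_records( ).  " read the ECC data (placeholder)

  LOOP AT lt_data ASSIGNING FIELD-SYMBOL(<ls_data>).
    " Serialize each record to JSON; one table row per message.
    DATA(lv_json) = /ui2/cl_json=>serialize( data = <ls_data> ).
    APPEND lv_json TO et_json.
  ENDLOOP.
  ev_count = lines( et_json ).
ENDFUNCTION.
```

```abap
" Inside the custom operator (e.g. the STEP method of a class inheriting
" cl_dhape_graph_oper_abstract): one outport write per table row.
CALL FUNCTION 'Z_GET_JSON_MESSAGES'
  IMPORTING
    et_json  = DATA(lt_json)
    ev_count = DATA(lv_count).

LOOP AT lt_json INTO DATA(lv_msg).
  mo_out->write_copy( lv_msg ).  " one DI message per record -> slow
ENDLOOP.
```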
Main components of this pipeline are:
- Custom ABAP operator to read data from FM
- Kafka Producer
Fig 3: SAP DI graph with the ABAP operator (outport of type string) and the Kafka Producer
Fig 4: Output shown in Wiretap. Each record from the table is read as one message by DI; shown here are three messages with three timestamps.
This design worked correctly and messages were posted to Kafka as expected. However, throughput was very low, at about 20 msg/sec.
The bottleneck was the communication between the ABAP and DI layers: the ABAP code finished in seconds, but it took 24 minutes for the full payload of 24K messages to appear in the DI layer.
How I improved the performance: Batch multiple JSON messages into a stream output, enabling DI to consume more data with each read
DI interprets a series of strings separated by a newline (\n) as a stream, and it has operators to convert ‘String to Stream’ and ‘Stream to String’. I made the following changes to extract a stream of data from the ABAP layer:
- Update the ABAP function module to concatenate the messages into one long string separated by ‘\n’, so the function module now exports a long string instead of a table of messages. Take care to limit your string length if you have a large payload; I limited mine to 1M characters. (A sketch of this change follows the list.)
- Pass this output into the ‘StringToStream’ operator, so DI sees it as a stream.
- Then connect it to the ‘StreamToString’ operator to split the stream back into individual messages for Kafka.
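Here is a minimal sketch of the changed function module, again with hypothetical names. cl_abap_char_utilities=>newline supplies the ‘\n’ separator, and a simple length guard keeps the exported string under the 1M-character cap:

```abap
" Hypothetical FM: exports one long newline-separated string
" instead of a table of messages.
FUNCTION z_get_json_stream.
*"  EXPORTING
*"     VALUE(EV_STREAM) TYPE STRING

  CONSTANTS lc_max_len TYPE i VALUE 1000000.  " cap the payload at 1M chars

  DATA(lt_data) = get_source_records( ).      " read the ECC data (placeholder)

  LOOP AT lt_data ASSIGNING FIELD-SYMBOL(<ls_data>).
    DATA(lv_json) = /ui2/cl_json=>serialize( data = <ls_data> ).

    " Stop batching before the string would exceed the limit;
    " remaining records would go into a subsequent batch.
    IF strlen( ev_stream ) + strlen( lv_json ) + 1 > lc_max_len.
      EXIT.
    ENDIF.

    IF ev_stream IS INITIAL.
      ev_stream = lv_json.
    ELSE.
      ev_stream = ev_stream && cl_abap_char_utilities=>newline && lv_json.
    ENDIF.
  ENDLOOP.
ENDFUNCTION.
```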
Now the ABAP operator in DI does not have to read many individual strings from the backend, only one long output. Once the data is in the DI layer, the pipeline completes very fast, improving throughput by 1400%, from 20 msg/sec to 300 msg/sec.
Fig 5: Concatenate single messages into a long string separated by newline
Fig 6: Custom operator reads the long string and sends it to DI in one go
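The corresponding operator change is the simpler half: instead of looping, it pushes the whole batch through the outport in a single write (same hypothetical names and assumed operator pattern as in the first sketch):

```abap
" Inside the custom operator's STEP method: one outport write for the
" whole newline-separated batch.
CALL FUNCTION 'Z_GET_JSON_STREAM'
  IMPORTING
    ev_stream = DATA(lv_stream).

" Single write: DI receives the full batch as one message, which the
" StringToStream / StreamToString operators split downstream.
mo_out->write_copy( lv_stream ).
```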
Main components in this pipeline are:
- Custom ABAP operator to read data from FM
- ‘StringToStream’ Operator
- ‘StreamToString’ Operator
- Kafka Producer
Fig 7: New pipeline to break down a message stream into individual messages before sending to Kafka
Fig 8: Output from the ABAP operator is one long string separated by ‘\n’. Wiretap shows only a few lines and truncates the rest of the message.
Fig 9: Output after the string is passed through the ‘StreamToString’ operator. The string from Fig 8 is split into multiple messages, each with its own timestamp.
I hope this helps you tune your pipelines. I would love to hear about other techniques you have used to improve pipeline performance.