The motivation behind this blog is to give an overview of best practices for real-time job processing in SAP Data Services. For batch jobs, plenty of performance tuning guidance and best practices are readily available and can be applied directly from the references. The real-time job optimization practices here, in contrast, have been collected from real-world experience and from researching various SAP resources.
Before we get into the actual best practices, tips, and tricks, let's take a brief look at how SAP Data Services real-time jobs work, their components, job design, and models.
SAP Data Services can receive messages from ERP systems or web applications and respond almost immediately with data from a data cache or another application.
Processing data in a real-time job can require a data cache. The real-time job can either add data from the data cache to the incoming message or load the message data into the data cache.
Role of the Access Server in Real-time Jobs:
When a message arrives for a real-time job, the Access Server routes it to a waiting process, which then performs a set of operations based on the message type.
When the real-time job returns a response, the Access Server sends the reply back to the upstream application.
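To make the routing idea concrete, here is a minimal Python sketch of the request/reply pattern described above. It is only a toy model and not the Access Server itself: the class ToyAccessServer, the service functions, and the message fields are all hypothetical names made up for illustration.

```python
# Toy model of message routing. Nothing here is a Data Services API;
# every name below is a hypothetical stand-in.

def order_status_service(message):
    """Stand-in for a real-time service that handles 'OrderStatus' messages."""
    return {"type": "OrderStatus", "order_id": message["order_id"], "status": "SHIPPED"}


def customer_lookup_service(message):
    """Stand-in for a real-time service that handles 'CustomerLookup' messages."""
    return {"type": "CustomerLookup", "customer_id": message["customer_id"], "name": "ACME"}


class ToyAccessServer:
    """Routes each incoming message to the service registered for its type."""

    def __init__(self):
        self.services = {}

    def register(self, message_type, service):
        self.services[message_type] = service

    def handle(self, message):
        service = self.services.get(message["type"])
        if service is None:
            return {"type": message["type"], "error": "no service registered"}
        # The waiting service processes the message and its reply is
        # handed back to the calling (upstream) application.
        return service(message)


server = ToyAccessServer()
server.register("OrderStatus", order_status_service)
server.register("CustomerLookup", customer_lookup_service)

reply = server.handle({"type": "OrderStatus", "order_id": 42})
print(reply)
```

The point of the sketch is simply that the caller never talks to a service directly: it sends a typed message and gets the reply back through the same router.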
Please refer to the following diagram for a better understanding of how Data Services real-time jobs work:
Real-time Job working
Real-time Job Design:
A real-time job may contain a single data flow or multiple data flows, along with workflows, scripts, conditionals, while loops, etc.
Both single and multiple data flows can contain the following objects:
A single real-time source and target: XML message (required).
Sources and targets: Files, XML files, and tables, including SAP tables and template tables.
Transforms and queries.
With multiple data flows, one additional object can be included:
Memory tables: these can be used as staging tables to move data to the subsequent data flow in the job.
Additionally, IDocs can also be used to create real-time jobs.
Real-time Job Models:
Single Data Flow model:
As the name suggests, this model lets us create a real-time job using a single data flow. It includes only a single message source and a single message target.
Single Data Flow model
Multiple data flow model
The multiple data flow model enables you to create a real-time job that uses more than one data flow for real-time processing.
Multiple Data Flow model
With multiple data flows, the data in each message is processed completely in the initial data flow before processing starts in the subsequent data flows.
For example, if the data has 10 items, all 10 must pass through the first data flow into either a staging table or a memory table before being passed to the subsequent data flow. This gives you more control and lets you collect all the data at any point in time.
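The 10-item example can be sketched in a few lines of Python. This is only an illustration of the staging idea: the functions first_data_flow and second_data_flow, the list used as a "memory table", and the message fields are all assumptions made up for this sketch, not Data Services objects.

```python
def first_data_flow(message):
    """Stage every item of the message before anything downstream runs."""
    memory_table = []  # stand-in for a memory table or staging table
    for item in message["items"]:
        memory_table.append(
            {
                "sku": item["sku"],
                "qty": item["qty"],
                "line_total": item["qty"] * item["price"],
            }
        )
    return memory_table


def second_data_flow(memory_table):
    """Runs only after the first flow has staged all items, so it sees the full set."""
    return {
        "item_count": len(memory_table),
        "order_total": sum(row["line_total"] for row in memory_table),
    }


# Hypothetical message with 10 items.
message = {"items": [{"sku": f"SKU-{i}", "qty": 1, "price": 10} for i in range(10)]}

staged = first_data_flow(message)      # all 10 items are staged first
response = second_data_flow(staged)    # then the next flow works on the complete set
print(response)
```

Because the second step receives the complete staged set, it can compute values such as totals that depend on every item in the message.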
Now, you might ask why Real-time instead of Batch Job processing?
You need a response back to the source system when data is moved to the data warehouse, so that the next steps can be carried out depending on the business use case.
There is no fixed time slot in which jobs can be scheduled to match the incoming data in third-party applications.
You don't want to wait for an internal trigger or a schedule, and want the job to execute faster (depending on the amount of data), since it is only waiting for a response from the dedicated Access Server.
Now let's look into some of the best practices with real-time jobs:
SAP Data Services Real-time Job Best Practices:
1. Usage of Memory Datastore for real-time jobs
Memory datastores are advantageous for processing real-time jobs that handle small amounts of data as they allow for instant access to the data.
Memory tables serve as blueprints for temporarily storing intermediate data. They cache data from hierarchical data files and relational database tables. SAP Data Services stores this cached data in memory, ensuring immediate access without the need to access the original source data.
The repository stores memory table schemas in a memory datastore, which differs from a typical database datastore that connects to an adapter, database, or an application.
Advantages of Memory Datastore:
By caching intermediate data and allowing data flows to access it from the cache instead of the remote database, memory tables can enhance job performance, especially for jobs with multiple data flows. It's recommended to use memory tables when dealing with small amounts of data to achieve the best performance.
In addition to enhancing job performance, memory tables also improve function performance in transforms. Functions such as Lookup_Ext that don't require database operations can access data from memory directly, eliminating the need to read it from a remote database.
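The benefit of looking up values from memory instead of the remote database can be illustrated with a small Python sketch. It is not how memory tables or Lookup_Ext are implemented; the class MemoryLookup, the function read_country_table, and the sample rows are hypothetical and only mimic the idea of loading a small table once and serving every lookup from the cache.

```python
# Counts how often the (simulated) remote table is read.
stats = {"remote_reads": 0}


def read_country_table():
    """Stand-in for an expensive read from a remote database table."""
    stats["remote_reads"] += 1
    return [
        {"country_code": "DE", "country_name": "Germany"},
        {"country_code": "IN", "country_name": "India"},
    ]


class MemoryLookup:
    """Loads a small reference table once and answers lookups from memory."""

    def __init__(self, load_rows):
        # The remote source is read exactly once, when the cache is built.
        self.rows_by_key = {row["country_code"]: row for row in load_rows()}

    def lookup(self, code, default=None):
        row = self.rows_by_key.get(code)
        return row["country_name"] if row else default


lookup = MemoryLookup(read_country_table)
names = [lookup.lookup(code, "UNKNOWN") for code in ["DE", "IN", "DE", "XX"]]
print(names, stats["remote_reads"])
```

Four lookups are answered with a single read of the source, which is the effect we are after for small reference data.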
2. Utilizing separate Job servers for real-time and Batch jobs
Whenever a web service request is made, it is forwarded over HTTP/SOAP to the Data Services web service layer, which routes it to the Access Server. However, when the Message Client API is used, requests are sent directly to the Access Server, bypassing the web service layer and thereby improving performance.
It is important that real-time services are set up properly: for each service, define the job servers that will handle the requests and the number of service providers (AL_ENGINE processes) that will be created on each job server to service them.
Proper management of the minimum and maximum instances is necessary for the service to scale on each job server. Note that the degree of parallelism (DOP) does not scale real-time jobs, but the number of service providers (processes) does. Therefore, it is crucial to balance the load against memory consumption per physical deployment and the number of services supported by each job server.
It is suggested to use separate job servers (physical or virtual) for batch and real-time processing to make sure that SLAs for real-time requests are met.
3. Real time services Performance tuning in SAP Data Services
The number of Min/Max instances you set when creating real-time services determines how the service will scale up. Each instance creates one AL_ENGINE process on the server, with each process handling one real-time message request at a time. To optimize performance, it's important to adjust the Min/Max instances based on the number of service providers you're setting up on each job server and the number of CPUs/cores available. Setting Min/Max instances to '2' may be a good place to start.
Monitoring the statistics in the Management Console, such as average response times and queue lengths, is important to ensure that requests are not piling up in queues and that response times meet the required SLA. It's worth noting that the first message request through a given AL_ENGINE process may take longer than subsequent requests due to initialization, which occurs only on the first request. This may skew the request times shown in the Management Console.
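To see how much a single slow first request can skew an average, here is a quick Python calculation. The response times are invented numbers for one hypothetical service provider, not measurements from a real system.

```python
from statistics import mean

# Hypothetical response times in milliseconds for one AL_ENGINE process.
# The first value includes the one-time initialization cost.
response_ms = [1800, 40, 45, 38, 42]

avg_all = mean(response_ms)        # average including the first request
avg_warm = mean(response_ms[1:])   # average of the requests after initialization

print(f"average incl. first request: {avg_all} ms")
print(f"average excl. first request: {avg_warm} ms")
```

When comparing against an SLA, it can therefore help to look at the numbers after the processes have handled their first request.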
In preparation for going live, it's crucial to have a landscape that closely resembles PROD and has a PROD-level volume, which will allow us to review performance metrics.
Keep an eye on the memory consumption on the server, as each AL_ENGINE process for real-time jobs can utilize a substantial amount of memory, even for simple jobs.
Ensure that you have enough memory for all services and the intended number of service providers (Max/Min settings) on a given job server.
For load balancing, we should have two servers, and we can increase the servers' capacity if we encounter memory or CPU problems. If necessary, we can adjust job level parameters or offload more processing to the database by examining the job design.
For sizing the engines, the following formula can be used:
Let m = number of engines:
m*memory consumption per engine = total RAM consumption
m*CPU utilization per engine = total CPU consumption
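The two formulas above can be turned into a small Python helper. The function names and all numbers (500 MB and 10% CPU per engine, 8000 MB and 80% CPU available) are assumptions for illustration only; the per-engine values should come from measuring your own jobs.

```python
def size_engines(engines, memory_mb_per_engine, cpu_pct_per_engine):
    """Apply the formulas: total = m * consumption per engine."""
    return {
        "total_memory_mb": engines * memory_mb_per_engine,
        "total_cpu_pct": engines * cpu_pct_per_engine,
    }


def max_engines(available_memory_mb, available_cpu_pct,
                memory_mb_per_engine, cpu_pct_per_engine):
    """Largest m that fits both the memory and the CPU budget."""
    by_memory = available_memory_mb // memory_mb_per_engine
    by_cpu = available_cpu_pct // cpu_pct_per_engine
    return min(by_memory, by_cpu)


# Hypothetical figures: 4 engines at 500 MB and 10% CPU each.
totals = size_engines(engines=4, memory_mb_per_engine=500, cpu_pct_per_engine=10)
print(totals)

# Hypothetical budget: 8000 MB of RAM and 80% CPU reserved for real-time engines.
limit = max_engines(8000, 80, 500, 10)
print(limit)
```

In this made-up example the CPU budget is the limiting factor, which shows why both formulas need to be checked and not only the memory one.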
4. Set the array fetch size based on your network speed
Consider the following factors to determine the optimal value for the array fetch size:
The size of the table rows in the source (including the number and type of columns).
The time taken for a network round trip of database requests and responses.
If your computing environment is powerful, i.e., the machines running the Job Server, related databases, and connections are fast, try increasing the array fetch size. However, it is recommended to test the performance of your jobs to find the best setting.
It is also important to keep in mind that a higher array fetch setting will consume more processing memory proportionally to the length of the data in each row and the number of rows in each fetch.
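The trade-off between round trips and memory can be estimated with a rough Python model. This is a simplification under stated assumptions: every row has the same size, each fetch costs one network round trip, and the buffer holds one full fetch. The function fetch_estimate and the numbers (100,000 rows of 400 bytes) are hypothetical.

```python
import math


def fetch_estimate(total_rows, row_bytes, array_fetch_size):
    """Return (round trips needed, bytes buffered per fetch) for a given fetch size."""
    round_trips = math.ceil(total_rows / array_fetch_size)
    buffer_bytes = row_bytes * array_fetch_size
    return round_trips, buffer_bytes


# Compare a few candidate settings for a hypothetical source table.
for size in (100, 1000, 5000):
    trips, buffer_bytes = fetch_estimate(total_rows=100_000, row_bytes=400,
                                         array_fetch_size=size)
    print(f"fetch size {size}: {trips} round trips, {buffer_bytes} bytes per fetch")
```

A larger fetch size reduces the number of round trips but grows the buffer proportionally, so the best value still has to be confirmed by testing the actual job.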
5. General SAP Data Services Real-time Jobs Design Tips
When creating data flows for real-time jobs, it's important to follow certain guidelines.
Firstly, if you're joining a real-time source with a table, the real-time data will be included as the outer loop of the join. When joining more than one supplementary source, you can determine which table is included in the next outer-most loop using join ranks.
Secondly, avoid caching data from secondary sources in real-time jobs unless it's static, as the data is read when the job starts and is not updated during runtime.
Additionally, if no rows are passed to the XML target, the job returns an empty response to the Access Server, and you should provide instructions to the user to handle this scenario.
Finally, to avoid discarding rows, structure the message source and target formats so that one "row" is equal to one message, utilizing the Nested Relational Data Model (NRDM) to structure any amount of data into a single "row" by incorporating other tables within columns.
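The first and last guidelines can be pictured together in a short Python sketch: one message is one "row" whose items column holds a nested table, and the real-time data drives the outer loop of the join against a supplementary table. The message layout, price_table, and the function enrich are hypothetical and only mimic the idea; they are not NRDM syntax or Data Services code.

```python
# One message equals one "row"; the line items are a nested table in a column.
order_message = {
    "order_id": 1001,
    "customer_id": "C1",
    "items": [
        {"sku": "A", "qty": 2},
        {"sku": "B", "qty": 1},
    ],
}

# Hypothetical supplementary source table.
price_table = [
    {"sku": "A", "price": 5},
    {"sku": "B", "price": 7},
    {"sku": "C", "price": 9},
]


def enrich(message, prices):
    """Join the message items with the price table; the message is the outer loop."""
    enriched_items = []
    for item in message["items"]:      # outer loop: real-time data
        for price_row in prices:       # inner loop: supplementary table
            if price_row["sku"] == item["sku"]:
                enriched_items.append({**item, "price": price_row["price"]})
    # The reply is again a single "row" that carries a nested table.
    return {"order_id": message["order_id"], "items": enriched_items}


reply = enrich(order_message, price_table)
print(reply)
```

Because everything belonging to the order travels inside one row, the whole message goes in and the whole reply comes out as a single unit.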
Thank you for reading this far. I hope the blog was informative and will be useful to anyone who needs these Data Services real-time job best practices.
Feel free to ask any questions or let me know in the comments if it helped. I'll try my best to respond.