Technology Blog Posts by Members
stevang

A system is as strong as its weakest link – this is a well-known fact.

In the integration domain we connect applications and data into an integrated system capable of seamlessly performing business processes across the connected applications, without interruption and without a human in the loop (popular wording in the age of AI). Of course, this is not an official definition – and there could be many of those, more or less formal.

The important thing is to understand that an integrated system of applications will connect many applications and various data flows, usually through various middleware IT components.

Why Performance Testing for integrations?

We do run Unit Testing, Functional Testing or System Integration Testing on our applications and connected applications – but is this enough?

For a new application, it is not enough to test only whether it fulfils functional requirements – in the same way, for an integrated system with several applications and one or more middleware IT components, it is not enough to test only whether the integration works. We need to understand whether our application or integrated system can meet specific non-functional requirements. Can it perform? And what are the limits it can sustain?

Yes, I am talking about Performance Testing, Load Testing, Stress Testing and more…

We do those things with applications, but are we following the same route with integrations and integrated systems?

We should…

But let me first go through a general intro – what are the different types of Performance Testing, and what testing methodology is appropriate to apply – regardless of whether we are testing individual applications or integrated systems with middleware flows.

Types of Performance Testing

I am not going to go through definitions of Unit Testing, Functional Testing or System Integration Testing; let me focus only on the family of Performance Testing.

While there are many definitions of how to split Performance Testing into several distinct types[1][2][3][4], I will stick to my usual habits and stay with the traditional one from IBM[5].

  • Load Testing indicates how the system performs under normal and expected loads. We take into consideration the average number of concurrent users with an average system load (operations performed) – e.g. an average number of concurrent users placing orders of average size (number of items).
  • Scalability Testing is more than Load Testing and less than Stress Testing: we test how the system performs when scaling to the boundary conditions, either current or expected – e.g. the peak number of concurrent users (in the peak season, peak hours) placing large orders (number of items).
  • Spike Testing creates a very sudden increase in user traffic, producing sharp spikes in system activity – e.g. a sudden burst of orders caused by synchronizing offline devices; or, more commonly, by a stopped service being restored and now sending a huge number of orders at once.
  • Volume Testing differs from Spike Testing in that its focus is primarily on increased data volumes and how the system manages them – e.g. loading large payloads (like orders with an extensive number of items) into database tables or queues etc.
  • Endurance (or Soak) Testing differs from both Spike Testing and Volume Testing: we are not only testing an increase in traffic or data volumes, but rather how the system manages load over a longer period, and whether there is any degradation in service – e.g. a continuous load of incoming payloads over several hours or more.
  • Stress Testing pushes the system beyond its operational limits, finding its breaking point and the weakest link (where it will break first) – e.g. for order taking, incrementally increase the load in number of concurrent users and order size until the system “breaks”.

On top of these testing types, it is worth mentioning that, as part of Stress Testing, we also perform Reliability Testing, verifying how the system recovers from the “break” situation – e.g. if a specific service goes down, we do not want to lose any messages in between.

Methodology

Testing methodology depends on the overall development approach; the most common approaches are:

  • Waterfall follows sequential testing, which occurs after full development.
  • Agile practices iterative testing of small parts within sprints.
  • V-Model stands for Verification & Validation, where testing is linked to development phases.
  • Spiral combines the best of both Waterfall and Agile, making it ideal for large and complex projects; testing is risk-driven and integrated into each iterative cycle (spiral)[6].

However, no matter which development approach we practice, the key testing goals always stay the same:

  1. Ensure the solution meets functional requirements.
  2. Make sure the solution also meets non-functional requirements like overall quality, performance, and security.

Now, I have deliberately avoided saying that only functional requirements are business requirements. In fact, non-functional requirements for performance can very often be very much business relevant, as business can set some clear, business-driven SLAs.

  • SLA (Service Level Agreement) is a formal, often contractual promise of service quality, defining the guaranteed level of service.
  • KPI (Key Performance Indicator) is an internal measurement of how well SLA goals are met. These measurements are usually very operational.

SLA examples and corresponding KPI examples:

  • SLA: 4s average response time → KPI: last month we achieved a 4.87s average response time.
  • SLA: 99% of orders must be received and processed within 8s → KPI: last quarter we had 97.4% of orders processed within 8s.
  • SLA: 1 000 000 orders per day without degradation of service → KPI: yesterday we processed 1 002 158 orders, with all SLAs kept.

What does this tell us?

By clearly understanding SLAs, we can define appropriate Performance Testing measurements (and scripts) – what we want to test and what level of service we need to achieve with our new application or integrated system.
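To make that mapping concrete, SLAs can be encoded directly as thresholds that each measurement period must satisfy. A minimal Python sketch, using the example SLA and KPI figures from the table above (the dictionary layout is purely illustrative):

```python
# SLA thresholds taken from the example table above (illustrative).
SLAS = {
    "avg_response_s": 4.0,      # average response time up to 4 s
    "pct_within_8s": 99.0,      # 99% of orders processed within 8 s
    "daily_orders": 1_000_000,  # orders per day without degradation
}

def evaluate(kpis: dict) -> dict:
    """Return pass/fail per SLA for one measurement period."""
    return {
        "avg_response_s": kpis["avg_response_s"] <= SLAS["avg_response_s"],
        "pct_within_8s": kpis["pct_within_8s"] >= SLAS["pct_within_8s"],
        "daily_orders": kpis["daily_orders"] >= SLAS["daily_orders"],
    }

# Example KPIs from the table: two SLAs missed, the volume SLA met.
result = evaluate({"avg_response_s": 4.87,
                   "pct_within_8s": 97.4,
                   "daily_orders": 1_002_158})
print(result)
```

The same thresholds can later drive automated pass/fail evaluation of load-test runs, instead of reading aggregate reports by eye.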

Please note, the focus here is on Performance Testing, so I am not addressing other non-functional requirements, although we may apply a similar approach for them as well (but the actual testing or verification may be significantly different).

Doing it right…

Requirements Gathering & Planning:

For Performance Testing in integration (or in general), the first step is, of course, gathering all non-functional requirements, like SLAs indicating e.g. response time, error rate etc. This is the moment to collect all:

  • Defined SLAs
  • Volumes (e.g. hourly, daily) including expected growth
  • Business patterns (e.g. patterns or spikes during business hours, or during the season)
  • Systems under test (i.e. which systems, integration flows or IT components are under test?)
  • Users or user groups (e.g. integrations are usually built using technical users, but this should be re-confirmed)

Test Design

What do we test?

Let’s have one thing clear – when testing integration, we are testing inbound and outbound, complex multi-system flows and the respective endpoints of the Provider and Consumer(s). But we are not testing the actual business process within the Provider and Consumer(s) – this should be covered by the relevant application testing.

What types of testing?

Let us plan what types of Performance Testing we need to execute (Load Testing, Scalability Testing, Spike Testing, Volume Testing, Endurance (or Soak) Testing and/or Stress Testing) and create realistic workload models.

In realistic terms, for integrations, we may stick with only a few types of testing, combining the necessary testing requirements:

  • Load Testing based on specific SLAs.
  • Scalability Testing, combined with Spike Testing, Volume Testing and Endurance (or Soak) Testing (with increased user loads and data volumes, running for a reasonably longer period – e.g. 50% more load than the maximum expected).
  • Stress Testing, also covering Reliability Testing (to understand the limits and the recovery process).

What kind of integration are we testing?

There is a difference if we are testing a Sync API or flow versus an Async API or flow[7][8].

Figure 1. Sync vs. Async

All clear, but how does this impact our Test Design for Performance Testing? Let's dig deeper...

Figure 2. Example of Sync flow with SAP Integration Suite (CPI and API-M)

Sync Integration Execution is single-threaded – only one operation will run at a time.

Sync means the Sync Request-Reply pattern. As long as the Sender is waiting for a response from the Receiver (either directly or indirectly) to finalize a specific operation, this is considered Sync processing. We may even have some queueing with retry logic in between (e.g. within an SAP Integration Suite CPI flow), but if the Sender is waiting for the final response, this is still Sync processing.

  • When testing a Sync API or flow, e.g. a RESTful API or OData API – no matter if we have some CPI flow or API-M in between – the testing tool (e.g. JMeter) can immediately get the appropriate response, either success (e.g. HTTP 200) or some error (e.g. HTTP 4xx, HTTP 5xx).

While we may collect additional logging from the Receiver and the middleware IT component(s), this is more relevant from the perspective of monitoring and observability, not Performance Testing itself.
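As a minimal sketch of what the testing tool does for a Sync flow: time each request and record the returned status code. The endpoint below is simulated by a local function; in a real test this would be an HTTP call (e.g. via JMeter, or a Python HTTP client) to the actual API endpoint:

```python
import random
import time

def call_order_api(payload):
    # Simulated synchronous Receiver – stands in for a real HTTP call
    # to the order-taking endpoint; always answers HTTP 200 here.
    time.sleep(random.uniform(0.001, 0.005))
    return 200

samples = []  # (status, response_time_s) for each request
for _ in range(20):
    start = time.perf_counter()
    status = call_order_api({"items": 10})
    samples.append((status, time.perf_counter() - start))

errors = [s for s, _ in samples if s >= 400]  # HTTP 4xx/5xx
print(f"requests={len(samples)} errors={len(errors)}")
```

The key point: for Sync flows the success/error signal and the response time are both available to the caller immediately, so no Receiver-side logs are needed for the measurement itself.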

Figure 3. Example of Async flow with SAP Advanced Event Mesh

Async Integration Execution is multi-threaded – multiple operations can run in parallel.

Async means decoupled, and it may be the PubSub pattern or the Async Request-Reply pattern. Here the situation is a bit more complex, as we need to collect and compare relevant logs.

In the PubSub pattern, individual IT components may or may not be set to send appropriate responses or acknowledgements (HTTP, ACK/NACK, QoS), but those responses or acknowledgements are not (by default) propagated from the Receiver(s) back to the Sender – if set, a response or acknowledgement only indicates that the next component in the flow has received the message.

With the Async Request-Reply pattern, the Receiver sends back a separate response message for the received messages. However, this message is also sent as an Async API call, usually after some processing has been done in the Receiver system. Implementation of the Async Request-Reply pattern is a separate topic not covered in this article – but in general it can be a completely separate Async flow, or it can be built using correlation IDs (e.g. using SAP Advanced Event Mesh and CPI[9], or Solace PubSub+ SolClient Asynchronous Callbacks[10]).

In both Async patterns:

  • When testing an Async API or flow, e.g. an Event API – the actual success or error can be seen only in the Receiver system log, to which the testing tool (e.g. JMeter) may not have direct access.
  • If we also have an Async return flow, depending on the SLA set, we may have to measure success rate, response time etc. for the full round trip.
  • Async response messages will carry either a confirmation or an error message back to the Sender. While an error message is not an integration performance issue, the SLA may still require that we measure those errors as well.

Again, we may collect additional logging from the middleware IT component(s), but this is more relevant from the perspective of monitoring and observability, not Performance Testing itself.
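One hedged way to measure an Async flow end to end is to join the Sender's send log with the Receiver's processing log on a correlation ID, deriving per-message round-trip time, errors, and lost messages. The log record layouts below are invented for illustration; real systems export different formats:

```python
# Sender log: (correlation_id, send_time_s) – from the testing tool.
sender_log = [("c1", 10.00), ("c2", 10.05), ("c3", 10.10)]

# Receiver log: (correlation_id, processed_time_s, status) – exported
# from the Receiver system, which the testing tool cannot see directly.
receiver_log = [("c1", 10.80, "OK"), ("c3", 12.40, "ERROR")]

received = {cid: (t, status) for cid, t, status in receiver_log}
results = []
for cid, sent in sender_log:
    if cid in received:
        done, status = received[cid]
        results.append((cid, status, done - sent))  # round-trip time
    else:
        results.append((cid, "LOST", None))  # never reached the Receiver

for cid, status, latency in results:
    print(cid, status, latency)
```

Note that "c2" never shows up in the Receiver log – exactly the kind of message loss that Reliability Testing is meant to catch.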

What is the scope?

Do we test only inbound flow and Receiver endpoint, or do we need to emulate specific processes and actions in the Sender application as well?

Figure 4. What is the scope of testing, what do we script?

Ordinarily we measure performance for the inbound flow and endpoint of the Receiver. If the Receiver application is also the Provider, this also gives us the Baseline Performance of the specific Integration Service.

  • Our primary concern is to understand how the Receiver (no matter whether it is the Provider or a Consumer) can perform – e.g. how many new/modified order requests SAP S/4HANA can handle, or how many new/modified customer replications SAP Commerce Cloud can handle.
  • The secondary goal is to understand how the middleware IT components in front of the Receiver are performing – e.g. API-M including all policies, or a CPI flow including value mappings, or Advanced Event Mesh including microservices etc.

However, if we are introducing a new Sender application, the SLAs may require that we perform Performance Testing on the full process, starting from the Sender application.

  • We need to script the appropriate actions within the Sender application, invoking APIs, and measure the full response from the moment the action is invoked until the specific results of the operations are recorded – e.g. an order has been placed in SAP Commerce Cloud, the API invoked toward SAP S/4HANA, the response received, and the status saved and/or visualized (where, in this example, SAP Commerce Cloud would not normally save the order, but only visualize the status).

Do we test all Consumers at once, or do we break up the flows?

Let’s also understand, depending on the specific integration flow:

  • The Provider can be either a Receiver (inbound endpoint receiving messages) or a Sender (outbound endpoint sending messages).
  • In the same way, a Consumer can be either a Receiver (inbound endpoint receiving messages) or a Sender (outbound endpoint sending messages).
  • Finally, in each integration flow there is only one Provider, and there can be one or more Consumer(s).

The question is, in case of multiple Consumers as Receivers, do we test them all at once?

Figure 5. Test each Receiver separately

The recommended approach is to test separately for each Receiver. This gives us a clear picture of the boundaries for each individual Receiver system.

But can we still have multiple Providers?

Yes and no… In fact, it is possible to have different technical backend systems providing a specific Integration Service – e.g. for order taking, we can have two or more SAP S/4HANA backend systems, each servicing different countries or regions, with routing done in CPI or API-M. But if this is the same Integration Service provided by the same application (even though there are two or more technical systems behind it), we will consider SAP S/4HANA as one Provider for the order-taking Integration Service.

Payloads

We need sample payloads, but we also need to ensure all necessary Master Data and Organizational Data exist and are appropriately configured – e.g. if we are creating orders we need to have an existing SoldToParty, Product, OrderType, SalesOrganization, PricingCondition (or Promo) etc.

But it’s not only the payload itself; we also need to understand whether we need to set specific attributes with the API call (e.g. within the HTTP header). This also needs to be defined upfront.

In some cases, certain attributes (e.g. in the HTTP body) trigger specific processing in the Receiver application – for order taking, the OrderType can invoke different standard/custom function modules/processing in SAP S/4HANA. All of this needs to be defined upfront.

Environment Setup

Most test environments are not sized like productive environments, and for e.g. Functional Testing this is perfectly fine, but for Performance Testing this may give a considerably wrong picture. The general recommendation is to run Performance Testing in a test environment (or QA environment) that closely mirrors production, including all IT components and software versions.

How do we do this?

This all depends on the applications and IT components we need to configure. In some cases it might be rather easy, while in others it might be more challenging. For SAP Integration Suite (CPI or API-M), a very common scenario is to have separate tenants but with the same/comparable configuration. For Azure Integration Services (e.g. Service Bus, Functions), it is fairly easy to temporarily change the licensing model of the test environment/subscription and assign it the same power as the productive environment/subscription. It is similar for most SaaS applications in general – it’s all about temporarily configuring the subscription, and if we keep the time window for Performance Testing rather narrow, this will not significantly increase the subscription costs.

But in some cases, this may not be so simple. In case of SAP Advanced Event Mesh, it all depends on the deployment strategy of the broker:

  • If we use the same tenant for both non-productive (test) and productive environments (separated e.g. by Application Domains hosting productive and non-productive Applications, respectively), then no action is needed for conducting Performance Testing, since the test and productive environments are the same.
  • If we use separate tenants for the test and productive environments with the same T-shirt size (the ideal, but more expensive deployment approach), then again, there is no issue in proceeding with Performance Testing.
  • If we use separate tenants for the test and productive environments but with different T-shirt sizes (the more common scenario), we can temporarily deploy the non-productive Event flow in the productive environment but connect it temporarily to the test Application (Publisher/Subscriber) endpoints. A word of caution here though – if we use the SAP Event Add-on (e.g. on our SAP S/4HANA test and production environments), please make sure you follow the specific licensing guidelines to understand in which scenarios it may impact the licensing costs (please check this article from @KStrothmann[11]).
  • Finally, if we do use separate tenants for the test and productive environments, but with different T-shirt sizes, and we do not want to do any temporary deployments in the production environment – it is always possible to temporarily change the T-shirt size of our test environment to match the productive environment. Here we should be very careful about whether this impacts any micro integrations we have, especially when downgrading back after the test is done.

While for most iPaaS or SaaS applications and IT components there is a way to (at least) temporarily configure the test environment to match the productive one, in some cases it might simply not be feasible – especially for on-prem system deployments.

What do we do?

There is no golden rule – but there are some workaround steps we could take.

  1. Measure system performance
  Measure the performance of similar services in the test and productive environments – e.g. in an SAP environment, use the Workload Monitor (ST03/ST03N) to measure the response time distribution for various task types (like dialog, background).

  2. Measure program runtime performance (optional)
  Optionally, run a detailed analysis of specific programs – e.g. in an SAP environment, use Runtime Analysis (SE30/SAT) for ABAP programs to measure the execution time of individual statements, function modules, and database calls.

  3. Measure database performance (optional)
  Optionally, run a trace on specific performance-related SQL activities – e.g. in an SAP environment, use Performance Analysis (ST05) to measure where time is spent and on which activities.

  4. Calculate productive vs. test environment processing power
  Use all measurements to calculate the realistic processing power of your test and production environments – the Workload Monitor, Runtime Analysis and Performance Analysis will give somewhat different values, showing that the productive system is faster.
  Example:
  Workload Monitor response time is 1.4 times faster in the productive environment than in test;
  Runtime Analysis shows the program executes 1.6 times faster in the productive environment than in test;
  Performance Analysis shows the database operation performs 1.1 times faster in the productive environment than in test;
  Extrapolate using weight factors (this is just an example): 0.6*1.4 + 0.2*1.6 + 0.2*1.1 = 1.38;
  The final calculation says the productive environment is 1.38 times more performant than the test environment.

  5. Extrapolate and adjust the results obtained on the test environment
  Extrapolate all Performance Testing results obtained on the test environment with the calculated factors.
  Example:
  If the average response time of a specific S/4HANA-hosted Integration Service is 4s in the test environment, we expect it to be 1.38 times more performant in the productive environment, i.e. an expected average response time of 2.9s in production.
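The weighted extrapolation from steps 4 and 5 can be written out as a small calculation; the weights and measured ratios below are just the example values above, not a fixed rule:

```python
# Measured production/test ratios from the three tools (example values).
ratios = {"workload_monitor": 1.4, "runtime_analysis": 1.6, "db_trace": 1.1}
# Example weight factors – choose your own, based on which measurement
# best represents your integration workload.
weights = {"workload_monitor": 0.6, "runtime_analysis": 0.2, "db_trace": 0.2}

factor = sum(weights[k] * ratios[k] for k in ratios)
print(f"production is ~{factor:.2f}x more performant than test")

# Adjust a test-environment measurement to a production estimate:
test_avg_response_s = 4.0
prod_estimate_s = test_avg_response_s / factor
print(f"expected production response time ~{prod_estimate_s:.1f}s")
```

The division in the last step is the whole trick: a test-environment measurement is scaled down by the combined factor to estimate the production figure.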

Script Development

Now we need to start using specific testing tools like JMeter, Azure Load Testing (which also uses JMeter scripts) or LoadRunner. The goal is to create scripts that simulate the specific actions and interactions we want to test.

Let’s go through the inputs we have collected:

  1. SLAs
  For the order-taking API: average order response time is up to 4s; 99% of orders are created within up to 8s; this is valid for any Customer, any SalesOrganization, the default OrderType, and standard on-invoice PricingConditions.

  2. Volumes
  Annual average of 160 000 orders per working day; maximum (peak) season of 270 000 orders per working day; expected annual growth of 10%; an average order has 10 items, and orders normally do not contain more than 50 items.

  3. Business patterns
  80% of orders are created during extended working hours from 10:00-22:00, out of which half are created in the evening 19:00-21:00.

  4. Systems under test
  S/4HANA API_SALES_ORDER_SRV Sales Order (A2X), single cluster, no policy routing; API-M SalesOrder with policies, excluding the CSRF token.

  5. Users
  No business users, testing integration only.

Based on these inputs, we will build an appropriate Load Testing script:

  1. Target Test Results
  Response time percentile 50 should be below 4s; response time percentile 99 should be below 8s.

  2. Capturing Test Results
  Capture information about sent messages on the Sender side (i.e. in the testing tool): number of messages sent, start time (sending) and stop time (sending). In case of an Async API, capture the overall status from the Receiver system logs on successfully received/processed messages: number of messages received, overall timing from start to end, and the response status as well, if a response/acknowledgement is enabled.

  3. Number of Threads
  We count the maximum daily volume + 5 years of growth + 50% margin:
  270 000*1.1*1.1*1.1*1.1*1.1*1.5 = 652 257 orders per peak day;
  But this volume is not evenly distributed over 24h; 40% falls within 2h only:
  652 257*0.4 / 2 = 130 451 orders in the peak hour;
  Or this is 130 451 / 3600 = 36 orders per second;
  As we have already included a safety margin, we are okay to set Number of Threads = 36.

  4. Ramp-up period
  We could use 4s, as this is the desired average response time, but we will use the default 1s for all tests.

  5. Loop Count
  For Load Testing there is no need to loop payloads more than 20-50 times.

  6. Payloads
  Create payloads:
  ideally using different Customers,
  ideally covering all (or the majority of) SalesOrganizations,
  where each uses the default OrderType,
  where each uses only standard PricingConditions;
  Distribute the number of items in the payloads:
  80% average or around average, i.e. 10 items,
  5% lower boundary, i.e. between 1-5 items,
  10% upper boundary, i.e. between 15-35 items,
  5% upper extreme, i.e. between 40-50 items;
  In total we may have 20-50 payloads or more.

  7. Endpoint
  API-M SalesOrder endpoint.

  8. Users
  No business users; JMeter will authenticate and obtain a key as a client application, through a VPN tunnel.
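The Number of Threads derivation from row 3 can be reproduced as a quick calculation; the growth rate, safety margin, and peak-hour share are the example assumptions from the workload model above, not fixed rules:

```python
# Workload-model assumptions (example values from the table above).
peak_daily_orders = 270_000   # current peak-season daily volume
annual_growth = 0.10          # 10% expected annual growth
years = 5                     # planning horizon
safety_margin = 1.5           # +50% margin

future_daily = peak_daily_orders * (1 + annual_growth) ** years * safety_margin
print(f"{future_daily:,.0f} orders per peak day")   # ~652,257

# 40% of the daily volume arrives within the 2-hour evening peak:
peak_hour_orders = future_daily * 0.4 / 2
orders_per_second = peak_hour_orders / 3600
print(f"{orders_per_second:.0f} orders per second")  # ~36
```

Rounding to 36 orders per second is what justifies the Number of Threads = 36 chosen above, given the safety margin is already baked into the volume.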

What does this look like in practice, and what does it mean?

Figure 6. JMeter configuration example

In this example, I use JMeter[12] as a testing tool of choice:

  1. Number of Threads simulates the number of concurrent users or concurrent requests at the same time. Obviously, the higher the number, the bigger the load/spike.
  2. Ramp-up period simulates how often we send the next batch of requests. The default value of 1s already makes for a very aggressive ramp-up.
  3. Loop Count will randomly take the payloads we have prepared and loop them the indicated number of times. Obviously, the higher the number, the longer the soak.
  4. Payloads are sample messages, and generally they will never be the same. It depends on the specific API and business process, but in most cases, the bigger the payload (more items, or more segments), the higher the data volume.

So, for combined Scalability Testing, we may adjust the script and just increase Number of Threads (to simulate spikes), increase percentage of payloads with extreme number of items (to simulate data volumes), and increase Loop Count (to simulate soak).

However, for combined Stress Testing we should gradually increase the Number of Threads, while keeping all other parameters steady (as in Load Testing) – to see when it breaks (errors, what kind of errors). A second test would gradually increase the number of items in the payload, while keeping all other parameters steady (as in Load Testing) – again to see when it breaks. Further investigation of the errors and system behavior is needed to verify integration reliability, but this will depend very much on the specific integration flow – e.g. Async flows should normally be decoupled with queues and built-in retry resilience, while Sync flows normally receive an error response and the application/user decides the next action.
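The gradual-increase procedure can be sketched as a loop that raises the thread count until the error rate crosses a threshold. The system under test is only simulated here (a function that starts failing above an assumed capacity); in a real Stress Test each load step would drive the actual endpoint:

```python
import random

CAPACITY = 500  # assumed breaking point of the simulated system

def run_load(threads: int, requests: int = 200) -> float:
    """Simulate one load step and return the observed error rate."""
    overload = max(0.0, (threads - CAPACITY) / CAPACITY)
    errors = sum(random.random() < overload for _ in range(requests))
    return errors / requests

random.seed(42)  # make the simulation reproducible
threads = 50
while run_load(threads) < 0.05:  # stop once errors reach 5%
    threads *= 2                 # gradually increase the load
print(f"system broke somewhere below {threads} concurrent threads")
```

In practice the step size would be finer than doubling, and the same loop can be run over payload size instead of thread count for the second test.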

Test Execution

We have designed and created scripts for Load Testing, combined Scalability Testing and combined Stress Testing.

As we are simulating real-life scenarios, tests should be performed under realistic conditions:

  1. Applications
  No other users should execute the same integration flow that is under test; all other (background) jobs and operations should stay as-is (keep it normal).

  2. IT components
  No other users should execute the same integration flow that is under test; all other (background) jobs and operations should stay as-is (keep it normal).

  3. Execution timetable
  Tests should respect the business pattern of operations. Why? Because on different days, or during different hours within the day, there might be other (background) jobs or operations impacting overall system performance, and we want to simulate all operations as realistically as possible.
  We have three distinct business patterns, and we should run all tests during each of them:
  business day, non-working hours 22:00-10:00 next morning;
  business day, normal working hours 10:00-19:00 or 21:00-22:00;
  business day, peak working hours 19:00-21:00.

Test Results

After executing all tests, we have to conduct an appropriate evaluation of the results:

  1. Load Testing
  As per the SLAs, evaluate the actual percentile 50 and 99 for all Test Executions (and we have at least 3 runs, one for each business pattern).

  2. Scalability Testing
  Analyze percentile 50 and 99 for all Test Executions (and there could be many runs).
  The observations will be used to define the behavior pattern of the overall integration flow, e.g.:
  if we increase the Number of Threads by 100%, response time will increase by 40%;
  if we increase the order size by 50%, response time will increase by 30%;
  if we soak for 2h, the aggregated response time will increase by 10%.

  3. Stress Testing
  Monitor the Test Executions (and there could be many runs). This will help us define the behavior pattern of the overall integration flow.
  The observations will be used to define the boundaries of the integration flow, e.g.:
  the system cannot sustain more than 1000 concurrent requests of average size,
  or the system cannot sustain more than 150 items in an order.

For percentiles, we can use aggregated Test Results report.

Figure 7. Percentiles example

This example graph shows that the median response time, or percentile 50, is around 3s, while percentile 99 is around 4.7s.

JMeter provides a number of possibilities to calculate percentiles from the aggregate reports, or we may simply go for one of the add-on graph reports and include it in the Test Plan[13].
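If raw response-time samples are available (e.g. parsed from a JMeter .jtl results file), percentile 50 and 99 can also be computed directly. A minimal nearest-rank sketch with made-up sample values:

```python
def percentile(samples, pct):
    """Nearest-rank percentile over a list of numeric samples."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Made-up response times in milliseconds, standing in for real samples.
response_ms = [2800, 3100, 2950, 3300, 4700, 3050, 2990, 3200, 3150, 4100]
print("p50 =", percentile(response_ms, 50), "ms")
print("p99 =", percentile(response_ms, 99), "ms")
```

Computing percentiles yourself is useful when combining results from multiple runs or tools into one evaluation, instead of reading each aggregate report separately.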

Next steps…

Remediations?

Of course, most likely your first round of Performance Testing will not provide fully satisfactory results. The next steps mostly consist of identifying what optimization potential there is, working on it, and then re-running the tests. The good thing is – all the scripts are already there, so there is no need to redo everything from scratch.

However, if (for whatever reason) the SLAs are re-negotiated and changed – in that case, the scripts will also have to be adjusted.

Conclusions

Why am I writing this article?

While most Project Management and Test Management routines are highly regulated, project teams might (often?) face a lack of specific guidelines on how to test integrations, especially their performance – as integrations are, let's be honest, a rather specific area…

Well, this is at least my view…

In this article, I have used examples with SAP S/4HANA, SAP Integration Suite (CPI and API-M), SAP Advanced Event Mesh and JMeter – but the principles are basically the same, no matter whether we use SAP or non-SAP applications and IT components.

Anyway, as already indicated – there is no golden rule – this is just one possible approach to organizing Performance Testing for integration. Of course, this is not a rule book, and things should be adjusted to specific needs. As always, this is just a potential guideline – nothing is carved in stone.

Acknowledgment

*) Intro photo by Adi Goldstein on Unsplash

**) This article uses SAP Business Technology Platform Solution Diagrams & Icons as per SAP Terms of Use governing the use of these SAP Materials (please note, newer version of the Solution Diagrams & Icons, as well as Terms of Use, might be in place after the publication of this article).

More guidelines on Solution Diagrams & Icons can be found in this article by Bertram Ganz.

References

[1] Queue IT: https://queue-it.com/blog/types-of-performance-testing/

[2] Microsoft Learn: https://microsoft.github.io/code-with-engineering-playbook/automated-testing/performance-testing/

[3] Microsoft Learn: https://learn.microsoft.com/en-us/azure/well-architected/performance-efficiency/performance-test

[4] F22 Labs (JMeter guide): https://www.f22labs.com/blogs/mastering-performance-testing-with-jmeter-a-comprehensive-guide/

[5] IBM: https://www.ibm.com/think/topics/performance-testing

[6] Spiral model: https://en.wikipedia.org/wiki/Spiral_model

[7] How to build an Integration Architecture for the Intelligent Enterprise: Part 1

[8] How to build an Integration Architecture for the Intelligent Enterprise: Part 2

[9] SAP AEM Async Request-Reply: https://community.sap.com/t5/technology-blog-posts-by-members/implement-request-reply-integration-pa...

[10] Solace PubSub+ Async Request-Reply: https://tutorials.solace.dev/c/request-reply/

[11] SAP Event Add-on: https://community.sap.com/t5/technology-blog-posts-by-sap/cheaper-than-you-think-the-commercial-mode...

[12] Apache JMeter: https://jmeter.apache.org/

[13] Apache JMeter Test Plan: https://jmeter.apache.org/usermanual/build-test-plan.html

 
