The aim of stress testing is to identify possible failures or malfunctions of a system under heavy load and concurrent use of an application.

This document sets out the scope of stress testing carried out at a SIPSA customer site and presents a summary of the main findings identified during the tests.

Every organization should regard performance testing as an integral part of its test process. If a business continually fails to test the performance of its apps, software, or APIs, it runs the risk of those products crashing, and by extension the business may well crash and burn, especially in today's economic climate shaped by the current pandemic, with businesses needing to cope with additional online demand.

The phrase "performance testing" says it all: it keeps track of how your product is behaving and how it is performing. This is important whether you have many customers or just a few.

If you only have a few customers, can you afford to let load testing go? Possibly, but that is not really a good option, is it? What if one of those valuable customers, one you cannot afford to lose, suddenly increases its usage?

It will not be only that customer who suffers the crash: all your customers will. The reality is that your business is too important not to carry out load tests. No matter what you predict for traffic and usage rates, this must be a case of preparing for the worst-case scenario while hoping for the best outcome.

Load testing and performance testing should be seen as an extension of your customer service and product design, and should be carried out right from the embryonic stage of your product. They help determine viability and need, and build reliable performance into the design and structure of the system, until the product reaches end of life and needs to be replaced.

It is of crucial importance that you automate testing throughout the product lifecycle, so as to assure the quality of your code, so that it can adapt and grow, and so that your resources can adjust to meet client and server demand, whether it contracts or expands.

It is widely accepted that it does not matter how feature-rich an application may be: if users perceive its performance as poor, because it crashes or suffers delays in loading, rejection levels will be high. Any application should deliver acceptable performance even when withstanding peak usage loads, drops in network bandwidth, and otherwise less than optimal conditions.

Benefits of using TAST as a load testing tool.

  1. Quality assurance of production systems.
  2. Efficiency. The only way to run load tests is through automation; otherwise you would have to coordinate the work of many testers, which is unfeasible if not done automatically. All load testing tools run automatically, but with TAST we also achieve efficiency: TAST is the tool used to define functional and system tests, and by extension all of these tests can be reused in load situations.
  3. Load tests with TAST are run in a distributed way. This avoids security problems when the systems are accessed from different IPs.
  4. Automated load testing is fast. With TAST we can schedule periodic and continuous load test slots, aligned with functional modifications, since we reuse the test diagrams already created in previous phases (integration, functional, and acceptance tests).
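To illustrate the automation argument above, here is a minimal sketch of a concurrent load run in Python. The test function and its behaviour are hypothetical stand-ins, not part of the TAST API; a real run would drive the front end or call the system under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def e2e_navigation_test(user_id: int) -> float:
    """Stand-in for a reusable E2E test step (hypothetical);
    a real version would navigate the front end or call an API."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulated request latency
    return time.perf_counter() - start

def run_load(virtual_users: int, iterations: int) -> list[float]:
    """Run the same test concurrently, as an automated load slot would."""
    with ThreadPoolExecutor(max_workers=virtual_users) as pool:
        futures = [pool.submit(e2e_navigation_test, u)
                   for u in range(virtual_users)
                   for _ in range(iterations)]
        return [f.result() for f in futures]

timings = run_load(virtual_users=20, iterations=5)
print(len(timings), round(max(timings), 3))
```

Coordinating 20 "virtual users" by hand would be impractical; here the thread pool does the coordination, which is the point made in item 2 above.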

Load test with TAST: a success case.

The test was executed in different time slots based on two different scenarios:
* Concurrent test: execution of front-end navigation, with several users accessing the application at the same time.

* Stress test: stress testing based on API load injection, adjusting the infrastructure used to the amount of load injected.

The test was carried out at three levels to review the system's behaviour under navigation-only load and under load involving data processing between internal systems. Levels are organized by the type of load generated across the entire system, not just the front end.

Level 1: 100% navigation.
Level 2: 90% navigation, 10% data recording and processing.
Level 3: 50% navigation, 50% data registration and processing.
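The three load mixes above can be expressed as a weighted choice between navigation and write operations. A sketch follows; the percentages come from the levels above, while the operation names and the selection mechanism are illustrative assumptions:

```python
import random

# Load mix per level: probability of navigation vs. data recording/processing,
# taken from the three levels described above.
LEVELS = {
    1: {"navigation": 1.00, "write": 0.00},
    2: {"navigation": 0.90, "write": 0.10},
    3: {"navigation": 0.50, "write": 0.50},
}

def pick_operation(level: int, rng: random.Random) -> str:
    """Choose the next operation type according to the level's load mix."""
    mix = LEVELS[level]
    return rng.choices(list(mix), weights=list(mix.values()))[0]

rng = random.Random(42)
sample = [pick_operation(3, rng) for _ in range(1000)]
print(round(sample.count("write") / len(sample), 2))  # close to 0.5 at level 3
```

A generator like this lets the same test diagrams be replayed at any of the three levels by changing a single parameter.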

Based on these executions, the results chapter presents findings and response times, clearly showing both the cases where the system did not behave as expected and the system response times for the different calls, for both "searching/read" and "saving/write" requests.

The findings are categorized in two ways:

* Failures: detection of errors. The severity of these findings is high.
* Increase of response times: comparing the times observed in the stress test with the standard load times. In this case the severity of the findings is considered medium or low, since the system behaves as expected.
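One possible way to automate this two-way categorization is sketched below. The thresholds are illustrative assumptions, not figures taken from the report:

```python
def categorize(baseline_ms: float, stress_ms: float, failed: bool) -> str:
    """Assign a severity following the two categories above:
    failures are high; response-time increases are medium or low."""
    if failed:
        return "high"
    ratio = stress_ms / baseline_ms
    # Illustrative threshold: more than 2x the baseline is medium, else low.
    return "medium" if ratio > 2.0 else "low"

print(categorize(120, 500, failed=False))  # large increase over baseline
print(categorize(120, 150, failed=False))  # small increase over baseline
print(categorize(120, 130, failed=True))   # an error was detected
```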

The execution of the Stress test has been conducted by using two main tools:

* TAST (Test Automation System Tool): tool used to automatically run End-To-End system test cases.
* AI (Application Insights): tool used to monitor infrastructure behaviour, as well as to measure response times and debug the logs of failed requests between systems.

During the five-hour session of loading and stressing the system using the previous scenarios and load levels, the following behaviours were observed:

* The front-end gateway showed non-linear performance, and the limit of its infrastructure dimension was detected.
* The core system (pre-production environment) did not reach any infrastructure failure, but its response times were observed to grow beyond the external system timeout.
* On the interface with a middleware system, a limit of 4 requests per minute on average was detected.
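The middleware limit above can be respected on the client side with a simple rolling-window throttle. This is a sketch of the general technique, not part of the tested system; only the 4-per-minute figure comes from the finding above:

```python
class MinuteThrottle:
    """Allow at most `limit` requests in any rolling 60-second window."""

    def __init__(self, limit: int = 4):
        self.limit = limit
        self.sent: list[float] = []  # timestamps of recent requests

    def allow(self, now: float) -> bool:
        # Drop timestamps older than the 60-second window, then decide.
        self.sent = [t for t in self.sent if now - t < 60]
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False

t = MinuteThrottle(limit=4)
results = [t.allow(now=i) for i in range(6)]  # six requests in six seconds
print(results)  # only the first four pass within the window
```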

The results of the stress and load tests require the following actions:

Correction of the issues with high severity:

  1. F1: Requests to the DB not working properly under concurrent access.
  2. F2: Timeout of the API to load the first page of the front-end.

Analyse infrastructure resources considering the expected load in the production environment.

Analysis and sizing of load in the production environment to avoid the findings with medium and low severity, following this advice:

  1. Scale the deployment horizontally to increase the number of pods supporting both the front-end gateway and the core system.
  2. Increase the response timeout threshold of the system to avoid rejecting requests under load, provided this is accepted by the final users.
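A back-of-the-envelope sizing check in the spirit of recommendation 1 might look as follows. The per-pod capacity, expected request rate, and headroom factor are all hypothetical figures, not measurements from this test:

```python
import math

def pods_needed(expected_rps: float, rps_per_pod: float,
                headroom: float = 0.3) -> int:
    """Horizontal sizing: pods required to serve the expected request
    rate with some headroom for peaks (headroom is an assumption)."""
    return math.ceil(expected_rps * (1 + headroom) / rps_per_pod)

# Hypothetical figures for the front-end gateway.
print(pods_needed(expected_rps=400, rps_per_pod=60))
```

The per-pod capacity figure would in practice come from the stress-test measurements themselves, which is one reason to repeat the test after resizing.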

It is recommended to carry out a second stress test once the corrections and recommendations have been applied to the system, to verify whether behaviour improves.

The average stress test times have been collected from AI.

Conclusions

The tool for automated load testing: TAST.

TAST runs load tests reusing the design of the E2E system tests.

The load tests in TAST make it possible to detect:

  1. Bad dimensioning of the system infra
  2. Increases in system response times
  3. Database management errors
  4. Limitations of the system interfaces

The execution of load tests with TAST is possible both in waterfall environments (where load testing is usually done only once, prior to going into production) and in agile environments (where load tests are necessary whenever there is new infrastructure or a major functional or architectural change).