This is the first of a series of articles regarding automated testing. To start, I'd like to discuss the different ways engineers can stress test an app.

Wikipedia defines stress testing as:

Stress testing is a form of deliberately intense or thorough testing used to determine the stability of a given system, critical infrastructure, or entity. It involves testing beyond standard operational capacity, often to a breaking point, to observe the results. Reasons can include:

  • to determine breaking points or safe usage limits
  • to confirm the mathematical model is accurate enough in predicting breaking points or safe usage limits
  • to confirm intended specifications are being met
  • to determine modes of failure (how exactly a system fails)
  • to test stable operation of a part or system outside standard usage

Stress tests were re-popularized in the U.S. during the 2008 financial meltdown, when everyone from public officials to the U.S. Treasury Secretary was trying to save the economy from collapse. A more relevant example was the failed launch of www.HealthCare.gov, when 250k users visited the website within 2 hours and caused it to crash. Although many more issues revealed themselves over time, better stress tests could have shown that the website could not handle even 1% of the U.S. population.

Below are various stress tests you can apply before launching a new app or a new feature. To help you understand what to focus on, I've included a few noteworthy statistics for each.

Smoke Load Test

Description: Verify that your UI test and server work as expected before running more complex load tests.

Noteworthy Statistics

  • Time Taken - Average
  • Assertion Failures - Per Second
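
As a rough illustration, a smoke run can be as small as the sketch below. It assumes a hypothetical `send_request` callable (not part of any particular tool) that performs one request and returns True when its assertions pass. The sketch reports the two statistics listed above.

```python
import time
from typing import Callable


def smoke_test(send_request: Callable[[], bool], iterations: int = 5) -> dict:
    """Run a handful of requests and report average time and failures per second.

    send_request is a hypothetical callable: it performs one request and
    returns True if every assertion on the response passed.
    """
    durations = []
    failures = 0
    start = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        ok = send_request()
        durations.append(time.perf_counter() - t0)
        if not ok:
            failures += 1
    elapsed = time.perf_counter() - start
    return {
        "iterations": iterations,
        "avg_time_s": sum(durations) / len(durations),
        "failures": failures,
        "failures_per_s": failures / elapsed if elapsed > 0 else 0.0,
    }
```

If this tiny run already reports failures, fix the test script or the server before spending time on the heavier tests below.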

Baseline

Description: Test your server performance under the expected load and create a baseline for later load tests. Establishing benchmarks is helpful for new project planning, new feature work/experimentation, operational readiness, costing, and monitoring.

Example Use Case: The use case I find most helpful is establishing alerts and monitors to detect anomalies and other unforeseen issues. For example, suppose the load balancer on your server fails, and your team has yet to implement automatic fault recovery for high availability. A monitor that detects changes in network traffic beyond a certain threshold (e.g., a ~10% spike or dip from the baseline) alerts your team so it can react quickly and minimize the impact.
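
A minimal sketch of such a check is shown here. The function name, the requests-per-second inputs, and the 10% default are my own assumptions for illustration; a real monitoring product would express this as an alert rule rather than code.

```python
def traffic_anomaly(baseline_rps: float, observed_rps: float,
                    threshold: float = 0.10) -> bool:
    """Return True when observed traffic deviates from the baseline by more
    than the given fraction (a spike or a dip).

    baseline_rps comes from the baseline test; threshold=0.10 mirrors the
    ~10% figure used as an example above.
    """
    if baseline_rps <= 0:
        raise ValueError("baseline_rps must be positive")
    deviation = abs(observed_rps - baseline_rps) / baseline_rps
    return deviation > threshold
```

With a baseline of 200 requests per second, an observed 210 stays quiet, while 230 (a spike) or 0 (the failed load balancer) would trigger the alert.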

Noteworthy Statistics

  • Transactions Per Second
  • Time Taken - Average
  • Time Taken - 95th Percentile (P95)
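
To make these three numbers concrete, here is a small sketch that derives them from raw response times. The function and field names are assumptions of mine, and P95 is computed with the simple nearest-rank method; load-testing tools may interpolate differently.

```python
from typing import Sequence


def summarize(durations_ms: Sequence[float], window_s: float) -> dict:
    """Summarize one test window: transactions per second, average, and P95.

    durations_ms: response time of every completed transaction, in ms.
    window_s: length of the measurement window, in seconds.
    """
    if not durations_ms or window_s <= 0:
        raise ValueError("need at least one sample and a positive window")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank P95: the smallest value with at least 95% of samples at or
    # below it. Integer math avoids floating-point rounding in the rank.
    rank = (95 * n + 99) // 100
    return {
        "tps": n / window_s,
        "avg_ms": sum(ordered) / n,
        "p95_ms": ordered[rank - 1],
    }
```

The average hides slow outliers, which is why I like to track P95 next to it: it shows what the slowest 5% of users experience.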

Peak

Description: Test your server under a heavier-than-expected load. See how the server performs when the load suddenly increases.

Noteworthy Statistics

  • Time Taken - Max
  • Time Taken - Average
  • Assertion Failures - Per Second

Stress

Description: Find the server's breaking point, i.e., the load at which errors or application not responding (ANR) events appear, or at which the response time no longer meets the service-level agreement (SLA).

Noteworthy Statistics

  • Time Taken - Average
  • Assertion Failures - Per Second
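
One way to look for that point is to ramp the load in steps until a limit is crossed. The sketch below assumes a hypothetical `measure(users)` callable that runs one load step and returns the average response time in ms and the error rate; the step sizes and the 1% error limit are illustrative defaults, not recommendations.

```python
from typing import Callable, Optional, Tuple


def find_breaking_point(
    measure: Callable[[int], Tuple[float, float]],
    sla_ms: float,
    max_error_rate: float = 0.01,
    start_users: int = 50,
    step: int = 50,
    max_users: int = 5000,
) -> Optional[int]:
    """Increase concurrent users step by step and return the first level at
    which the SLA or the error limit is violated, or None if none is found.

    measure is a hypothetical hook: it applies the given number of users and
    returns (average_response_ms, error_rate) for that step.
    """
    users = start_users
    while users <= max_users:
        avg_ms, error_rate = measure(users)
        if avg_ms > sla_ms or error_rate > max_error_rate:
            return users
        users += step
    return None
```

The returned value is only as precise as the step size, so a second pass with smaller steps around the reported level can narrow it down.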

Soak

Description: Test the performance of your server under a sustained load for a long time. Find memory leaks and other issues that only appear over time.

Noteworthy Statistics

  • Transactions Per Second
  • Time Taken - Median
  • Time Taken - 95th Percentile (P95)
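
For the memory-leak part, one simple signal is the trend of memory usage over the soak run. The sketch below fits a least-squares line through hypothetical (elapsed hours, memory in MB) samples; how you collect those samples depends on your own monitoring setup.

```python
from typing import Sequence, Tuple


def memory_growth_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Return the least-squares slope of memory usage in MB per hour.

    samples: (elapsed_hours, memory_mb) pairs gathered during the soak test.
    A slope that stays clearly positive under a constant load hints at a leak.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    covariance = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must span more than one point in time")
    return covariance / variance
```

For example, samples of 500, 512, 524, and 536 MB taken one hour apart give a slope of 12 MB per hour, which would be worth investigating if the load was constant.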

Spike

Description: Test your server under sudden spikes in load and see how the server recovers.

Noteworthy Statistics

  • Time Taken - Average
  • Time Taken - Max
  • Assertion Failures - Per Second
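
Recovery can be quantified as the time it takes for response times to return to normal once the spike ends. This is a rough sketch under my own assumptions: samples are (seconds, response ms) pairs sorted by time, and "normal" means within 10% of the baseline average.

```python
from typing import Optional, Sequence, Tuple


def recovery_time_s(
    samples: Sequence[Tuple[float, float]],
    spike_end_s: float,
    baseline_ms: float,
    tolerance: float = 0.10,
) -> Optional[float]:
    """Return seconds from the end of the spike until the first sample whose
    response time is back within tolerance of the baseline, or None if the
    server never recovers in the recorded window.

    samples: (timestamp_s, response_ms) pairs, assumed sorted by timestamp.
    """
    limit_ms = baseline_ms * (1 + tolerance)
    for timestamp_s, response_ms in samples:
        if timestamp_s >= spike_end_s and response_ms <= limit_ms:
            return timestamp_s - spike_end_s
    return None
```

A single good sample can be a fluke, so a stricter version might require several consecutive samples under the limit before calling the server recovered.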

Tools

  • Pingdom - Pingdom is a low-cost tool for staying on top of network outages and assessing them afterward. It also provides a waterfall view of the test that clearly shows which elements are causing the page to load slowly.
  • Apache Benchmark - Apache Benchmark (ab) is especially helpful because it provides some statistical analysis, reporting values such as the mean, median, and standard deviation.