Performance tests (5): Environmental factors

If you do sports like cycling or running, you know how important environmental factors are. If you run a championship race under severe conditions like heavy rain, you know up front that a new world record is very unlikely. The conditions reduce the performance of all runners. On the other hand, if you want to swim across the English Channel, it is not sufficient to train only in your indoor swimming pool. You need to train under more severe conditions.

It’s the same with performance tests. They are affected by the environment of the systems they run on. If your test needs to copy a large amount of data to another system over the network, the result depends heavily on the available bandwidth and the latency between these systems. If you compare two runs of this test, one on a loaded network connection (another application is also copying data) and one on an unloaded connection, you’ll see a huge gap, even though you ran the same test case on the same systems. So the environment influences our tests and the test results.

So we always have to consider the impact of environmental factors. They can occur anywhere at any time, and we cannot influence them directly.

  • Other applications add load to our systems, so we don’t have the same amount of resources available for the test execution. This is especially a problem on shared hardware (virtualization); although virtualization on commodity x86-64 hardware gets better and better, you might still be affected by the load of other virtual machines on the same host.
  • Regular processes like backups or the TarPM optimizer might interfere with your performance test. Of course these processes have to run, but you should know about them and be prepared for them. If they run without your knowledge, the test results differ from what you expect, and you need to research the reasons and rerun the tests. That is a lot of lost time.
  • The network might have reduced performance or availability due to maintenance or other tests.
  • The same applies to all connected backend systems (LDAP server, shop backend, …).

Of course, severe conditions can be found on production systems as well, like a bulk data transfer slowing down the network connections. But on most production systems such activities are planned much more carefully than on “just a test system”, where such things are more likely to happen unannounced and with less planning.

There are 2 ways to mitigate such problems:

  1. Keep a list of the systems and applications which might affect your results during the performance test. Be careful here, because each of them can create outliers in your test results, which are hard to explain when you are not aware of them.
  2. Open communication. When you plan a series of performance tests, communicate this to all the parties maintaining systems on this list. That way you can resolve conflicting dates (two parties doing tests on the same day) before you get incorrect results.
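
The two points above can be combined into a small check before you schedule a test. The following Python sketch is only an illustration: the `Activity` record, the `find_conflicts` helper and all entries in `KNOWN_ACTIVITIES` are made-up assumptions, and a real list would come from the teams maintaining the systems.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Activity:
    """One planned activity on a system that may affect the test (hypothetical record)."""
    system: str
    owner: str
    start: datetime
    end: datetime


def find_conflicts(test_start, test_end, activities):
    """Return all activities whose time window overlaps the planned test window."""
    return [
        a for a in activities
        if test_start < a.end and a.start < test_end
    ]


# Illustrative entries only; systems, owners and times are invented.
KNOWN_ACTIVITIES = [
    Activity("backup", "ops team",
             datetime(2024, 1, 10, 1, 0), datetime(2024, 1, 10, 3, 0)),
    Activity("network maintenance", "network team",
             datetime(2024, 1, 10, 13, 0), datetime(2024, 1, 10, 14, 0)),
    Activity("LDAP server load test", "identity team",
             datetime(2024, 1, 11, 9, 0), datetime(2024, 1, 11, 12, 0)),
]

# Planned performance test: 10 January, 12:00 to 16:00.
conflicts = find_conflicts(
    datetime(2024, 1, 10, 12, 0), datetime(2024, 1, 10, 16, 0), KNOWN_ACTIVITIES
)
for c in conflicts:
    print(f"Conflict with {c.system}: talk to {c.owner}")
```

A check like this does not replace talking to the other parties; it only tells you whom to talk to first.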

The basic approach is this: you cannot eliminate severe environmental impacts. Your production system needs to deal with them as well, therefore encountering them during testing can be good preparation. But when you test, you should be able to state clearly what is caused by your test scenario and what is caused by the environment.
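
One possible way to spot runs that were probably disturbed by the environment is to repeat the same test several times and flag the runs that deviate strongly from the rest. The Python sketch below uses the median and the median absolute deviation (MAD) with the commonly used modified z-score cutoff of 3.5. This is just one heuristic among many; the `flag_outliers` helper and the sample durations are invented for illustration.

```python
from statistics import median


def flag_outliers(durations, threshold=3.5):
    """Return the indices of runs whose modified z-score exceeds the threshold."""
    med = median(durations)
    deviations = [abs(d - med) for d in durations]
    mad = median(deviations)
    if mad == 0:
        # More than half of the runs are identical; this heuristic cannot judge the rest.
        return []
    return [
        i for i, dev in enumerate(deviations)
        if 0.6745 * dev / mad > threshold
    ]


# Invented durations (in seconds) of six runs of the same test case.
durations = [41.8, 42.3, 41.9, 42.1, 63.0, 42.0]

for i in flag_outliers(durations):
    print(f"Run {i} took {durations[i]} s: check what else was running at that time")
```

A flagged run is not automatically an environmental effect. It is a hint to look at your list of systems and activities for that time window before you blame the test scenario.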

A nice example of how such tests can be designed to improve resilience against unwanted problems in production is Netflix’s Simian Army. They validate their infrastructure, their application and their processes by turning off systems at random (Chaos Monkey) or even all their systems in a datacenter or Amazon AWS availability zone, just to validate that they can still deliver their service. And, as the last Amazon outages showed, it’s useful: Netflix was less affected than other services relying on the same Amazon infrastructure, because they had already had the fire drill, the adjusted application design and the required processes to cope with such situations.