Testing in Production

Many test practitioners know that sinking feeling: you are running a stress test in a ‘fully isolated’ test environment, and then you are informed of a critical Production Outage, which coincidentally began when the stress test started.

It is a little like taking a walk in the bush (as I did this morning, while holidaying near Uluru), believing I was safely away from everyone, only to stumble on a sign that says “Caution: Spear and Boomerang Throwing Area”. Or worse still, imagine admiring the detailed shapes of craters on a barren landscape before realising you are on a live-fire bombing range. Sometimes, you don’t realise you are in the ‘danger zone’ until it is too late.

While testing in production is generally frowned upon, it is often an important component of a rigorous performance testing strategy. Before a system goes live for the first time, there is usually plenty of opportunity to run performance, load and stress tests in the target ‘Production’ environment.  However, for significant upgrades, there is often no obvious opportunity to run a performance or load test in the production environment.

Load and performance tests are generally executed within ‘Test’ environments that may or may not be sized, architected and configured like production. For this reason, it is desirable to ‘calibrate’ the performance of a test environment against the performance observed in the production system, which can mean running some tests in production. There are two main reasons for an aversion to such an approach. The first is data leakage (or corruption), where data from testing activities leaks into production. This can take the form of test data that persists in the production environment or, more embarrassingly, data that leaks out of the production system into integrated systems such as email or other downstream systems. The second concern relates to possible adverse impacts on the production system itself. While understandable from the perspective of ‘protecting production’, this concern is often a subtle sign that the production environment is perceived as not being robust or resilient, and that it could be vulnerable to failure even at low levels of activity.
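To make the idea of ‘calibrating’ a test environment a little more concrete, here is a minimal sketch in Python. It assumes you have response-time samples (in milliseconds) captured from both environments under a comparable workload, and it compares their 95th percentiles. The function names, the choice of the 95th percentile and the assumption that results scale linearly between environments are mine, for illustration only; real systems rarely scale this neatly.

```python
from statistics import quantiles


def p95(samples):
    """Return the 95th percentile of a list of response times (ms)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(samples, n=100, method="inclusive")[94]


def calibration_factor(prod_samples, test_samples):
    """Ratio of production p95 to test p95 under a comparable workload.

    Hypothetical helper: a factor below 1.0 means production was faster
    than the test environment for this workload.
    """
    return p95(prod_samples) / p95(test_samples)


def projected_prod_p95(test_samples, factor):
    """Scale a later test-environment result by the calibration factor.

    Assumes linear scaling, which is a simplification.
    """
    return p95(test_samples) * factor
```

A factor calculated once, at one load level, is only a rough guide. It is worth recalculating at several load levels, because the relationship between the two environments may change as load increases.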

An alternative perspective on Testing in Production comes from our view of road safety. The only things separating two high-speed vehicles from a head-on collision on a highway are the double-line road markings and the road rules that enforce the safe coexistence of cars travelling towards each other at a combined speed of 200 km/h. If one car were to cross those double lines at just the wrong time, the consequences would probably be fatal. However, our system of road rules actually encourages ‘Learner Drivers’ to drive in such situations, sharing the road with experienced drivers. By exposing learner drivers to real-life situations, under the guidance of more experienced drivers, those learners are prepared and then able to be tested (on the ‘Production’ road network) before obtaining a licence to drive on their own.

The type of testing that does represent a real danger to production systems is testing where one is reassured that the test environment is isolated from production, when it actually shares critical components with production. These can include authentication systems, the database server that supports the authentication system, security and network appliances, switches, routers and even SAN storage. It is these shared components that call for a careful load and stress testing approach, where load on the Test system is initially applied under the watchful eye of relevant specialists, to ensure that such tests will not adversely impact production.
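As a rough illustration of ‘knowing your environment’, the sketch below compares two component inventories to list anything that appears in both, and then builds a stepped ramp so that load can be increased gradually while specialists watch the shared components. The inventory format, the component names and both function names are hypothetical; in practice the inventory might come from a CMDB or from architecture diagrams.

```python
def shared_components(test_env, prod_env):
    """Return the components that appear in both inventories, sorted."""
    return sorted(set(test_env) & set(prod_env))


def ramp_steps(target_users, steps):
    """Return evenly spaced load levels, ending at target_users."""
    return [round(target_users * i / steps) for i in range(1, steps + 1)]


# Hypothetical inventories: each entry names one infrastructure component.
test_env = ["app-test-01", "db-test-01", "ldap-01", "san-01"]
prod_env = ["app-prod-01", "db-prod-01", "ldap-01", "san-01"]

shared = shared_components(test_env, prod_env)
if shared:
    # Shared components found: ramp up in stages rather than going
    # straight to full load, pausing at each stage to check their health.
    plan = ramp_steps(target_users=1000, steps=4)
else:
    plan = [1000]
```

With these example inventories, the authentication server and the SAN show up as shared, so the plan is a staged ramp rather than a single step to full load.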

Testing in a ‘Test Environment’ that shares components with the Production Environment can be a little like a person wandering into a target area for spear-throwing enthusiasts. The first sign of a potential problem, like the sign in the photo in this post, is only evident when one is already in the danger zone. The moral of this story: know your environment.
