Save the (performance test) environment!

Performance testing is more important than ever. As we increasingly expose our IT systems to our customers, we intertwine the performance of those systems with our brand reputation, our customer experience, and ultimately our revenue.

Good performance testing is only part of the equation. Where we run our tests dictates whether our results provide any real understanding about how our software will perform in the real world.

Production-like

In an ideal world we would performance test in production. I'll talk a bit more about where this can actually work later on, but in general this is not possible due to the following risks:

  • Compromising the security of customer data
  • Load tests impacting real customers
  • An inability to create or update records (real customer data), meaning our tests do not reflect a real workload
  • The time between developing code and deploying it to production being too long (making it too costly to go back and fix issues efficiently)
  • Integrations between the system under test and other systems (unexpectedly impacting or even bringing down another part of your organisation)

The obvious solution when faced with these risks is to build an environment which is identical to production - or as 'production-like' as possible.

So what does it mean when I say 'production-like'? 

  • Hardware: The hardware should have the same spec (or VM configuration) as production wherever possible (particularly CPU, memory, and disks). This also includes the networks between the components, and the number of servers.
  • The application configuration should match production. This is everything from Java heap sizes and garbage collection algorithms to database connection pools and web server caching.
  • The application version should also match production or future production (whatever you are testing).
  • The database should contain a realistic volume of data. Empty databases perform faster than full ones. How much data you have depends on what you want to understand - you may need to think about how performance will change in the future as the database fills up. The test data available should also have realistic variations.
  • The external integrations are less black and white. It depends on the scope of your testing, and whether there is an available production-like environment for these integrations. It may be acceptable to stub these, provided everyone is aware of the impact this has on the results.
  • Any background jobs that occur in production should also occur in your performance testing environment. These jobs could impact the user experience and change the overall behaviour of your solution.
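One way to keep a test environment honest against this checklist is to compare its configuration to production systematically rather than by memory. Here's a minimal sketch of that idea - the setting names and values below are invented for illustration, not taken from any real system:

```python
# Hypothetical sketch: compare key settings between a performance test
# environment and production to spot "production-like" gaps.
# All setting names and values here are invented for illustration.

PROD = {
    "jvm_heap_max": "8g",
    "gc_algorithm": "G1GC",
    "db_pool_size": 50,
    "web_cache_enabled": True,
    "app_servers": 4,
}

PERF_TEST = {
    "jvm_heap_max": "4g",
    "gc_algorithm": "G1GC",
    "db_pool_size": 20,
    "web_cache_enabled": True,
    "app_servers": 2,
}

def config_gaps(prod, test):
    """Return settings where the test environment differs from production."""
    return {
        key: (prod[key], test.get(key))
        for key in prod
        if test.get(key) != prod[key]
    }

gaps = config_gaps(PROD, PERF_TEST)
for setting, (prod_value, test_value) in sorted(gaps.items()):
    print(f"{setting}: prod={prod_value} test={test_value}")
```

Even a crude diff like this makes the gaps explicit, so everyone interpreting the test results knows exactly where the environment deviates from production.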

Building and maintaining such an environment is not only important for performance testing but can also serve a cross-purpose for security testing, functional testing, and disaster recovery.

Not having a production-like environment is not necessarily the end of the road. For example, if the hardware is lower spec in your performance testing environment but the other conditions are identical, it would be accurate to say that production will perform "as well or better" if you run your tests at the full workload expected in production.

The challenge of Shift Left

So what about the move towards performance testing earlier in the life-cycle? It's no secret I am not sold on the concept of "continuous performance testing" - it has potential, but there are a lot of caveats and considerations which often mean it's not worth the effort. The environments we use for this kind of testing are part of the challenge.

Say we run a component (or even integrated) test each time we deploy. What environment are we deploying to? Is it production-like? Is it integrated within itself and to external systems?

Most of the time the answer is no - we are testing in a scaled back and isolated environment. So what does our load testing really tell us? What it won't tell us is how the system will perform in the real world. The best we can hope for is to track performance over time relative to previous builds (e.g. response time, server resource consumption).
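That relative comparison can still be automated. The sketch below flags metrics that have degraded beyond a tolerance compared to the previous build - the metric names, builds, and threshold are all invented for illustration:

```python
# Hypothetical sketch: track performance relative to the previous build
# rather than against absolute production numbers. Metric names, build
# results, and the tolerance threshold are invented for illustration.

def regressions(previous, current, tolerance=0.10):
    """Flag metrics that degraded by more than `tolerance` vs the last build.

    Assumes higher values are worse (e.g. response time, CPU usage).
    """
    flagged = {}
    for metric, prev_value in previous.items():
        curr_value = current.get(metric)
        if curr_value is None:
            continue
        change = (curr_value - prev_value) / prev_value
        if change > tolerance:
            flagged[metric] = round(change, 3)
    return flagged

build_41 = {"p95_response_ms": 220.0, "cpu_percent": 55.0}
build_42 = {"p95_response_ms": 260.0, "cpu_percent": 56.0}

print(regressions(build_41, build_42))
```

A check like this won't tell you how the system will behave in production, but run on every build it can catch the moment a change makes things measurably worse than the build before.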

The challenge of Shift Right

If you haven't heard of it yet, "Shift Right" is the opposite of Shift Left. The idea is to continually and rapidly deploy to production. We couple this with detailed application and server monitoring (e.g. APM tools) and the ability to roll back quickly. This means we are always measuring performance metrics in a real production environment, so we get better value out of all our testing and monitoring.

In many ways Shift Right solves the environment issues I mentioned earlier because we are using production as a test environment and the feedback loop is fast enough to resolve issues cost effectively. There are still other considerations:

  • Can we apply synthetic load?
  • If we just use real users to load the system, can we account for peak load or future load?
  • Can we deploy new code to a sub-set of our users to minimise the impact if things go wrong?
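On that last point, one common approach is stable hash-based bucketing: the same user always lands on the same version, and only a fixed percentage see the canary. A minimal sketch (the percentages and user IDs are invented for illustration):

```python
# Hypothetical sketch: route a fixed percentage of users to a canary
# release using stable hash-based bucketing, so each user consistently
# sees the same version. User IDs and percentages are illustrative.

import hashlib

def in_canary(user_id: str, percent: int) -> bool:
    """Deterministically assign a user to the canary `percent` of traffic."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

users = [f"user-{n}" for n in range(1000)]
canary_users = [u for u in users if in_canary(u, 5)]
print(f"{len(canary_users)} of {len(users)} users on the canary")
```

Because the assignment is deterministic, rolling back is just a matter of dropping the percentage to zero - no user flip-flops between versions mid-session, which keeps the blast radius small if things go wrong.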

And then there's the fundamental issue - if you are building a new system with a big bang go-live you still need to understand its performance before you go live. That requires a more traditional performance testing approach. Shift Right works best with incremental change to an existing solution.

Utopia: Infrastructure as Code

I've read about being able to spin up and collapse full production-like test environments at will. I'm yet to see this in practice, but it sounds promising.

I would love to hear your experiences in implementing IaC and whether it helped you implement more accurate performance testing with less effort.

Closing

I think there will always be challenges with production-like environments, particularly given the increasingly distributed nature of our systems which often rely on components all over the world.

As always, it's about thinking pragmatically about your situation. The bottom line is - make sure your test environments facilitate accurate performance tests which provide meaningful insight into the performance of your systems.
