A Performance Mindset
I’ve recently found myself on the wrong side of conversations about the importance of avoiding unnecessary performance delays in software.
Just the other day I had a discussion with colleagues I highly respect about whether 20 ms matters. My unequivocal answer was “yes.” After the laughing stopped, I decided to write this post.
I’m not saying we should never add 20 ms, or even much more, to an interface as a tradeoff against features, complexity, development time, and so on. I’m saying you should never add 20 ms to an interface because “it doesn’t matter” or “we don’t know if it matters.” Another thing I don’t like to hear is “we’ll fix it later if it becomes a problem,” which implies there was no value in avoiding the latency in the first place.
This type of mindset treats software performance as a YAGNI (“you aren’t gonna need it”), and IMO there is always value in better performance. Higher-performance components leave more options open.
As the literature on “perceived performance” observes, society is moving ever faster toward an ever-higher expectation of instant gratification: “The ability to multi-task has also shortened our attention span as we’ll tend to switch to something else if a web page is taking too long to load.” So what’s too long? Obviously it depends on the task and whether or not you have a choice….
Also obvious, I hope, is that software customers should never be in a position to ask themselves that question: should I go somewhere else, or can I find a faster way to work? It’s well established that most people begin to perceive a system as delayed once they have to wait 100 ms. In fact, people start forming judgments about your site within the first 50 ms, or perhaps as little as 13 ms. Even if your customers finish their transactions, their loyalty is inversely proportional to page latency [1][2]: “slower user experience affects long term behavior.” So how long will they keep coming back for more… waiting… when a competitor feels faster?

Your customers will compare you to your competitors, even if subconsciously, and search engines will rank you explicitly. In nature, only the swift (or very clever) survive. To provide the best user experience, and to avoid worrying about competitors beating you on speed, you’ve got to stay under 100 ms. I switched from PC to Mac, and paid 3x as much for five laptops at home, principally because I got tired of my PCs slowing down over time.
At this point, I hope you can see that every 20 ms does matter. However, let me continue:
One way to look at it: in the future, I will want to perform X operations for my user, where X depends on the set of features we’re trying to deliver to the user and to other stakeholders, such as metrics systems, advertisers, etc. To accomplish those X operations, I will want to use Y microservices, given the well-documented benefits of highly componentized code. The number of services will be in the dozens, but let’s assume I can parallelize most of the calls without too much effort or complexity. Even so, I’ll be left with an execution graph that is some number of services deep, i.e., executing serially. In my experience, the execution graph for a web page of medium complexity is at least four services deep, and only the fastest networked services respond in under 20 ms. So, conservatively, that’s 4 × 20 ms to ready the first byte, plus 20 ms to send it back over the wire to the user: 100 ms at a minimum. If I add an “insignificant” 20 ms to each service, those four serial hops now cost 4 × 40 ms, and we’re at 180 ms, which is 80% slower than optimal from a user-experience POV.
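To make the depth argument concrete, here’s a minimal sketch of that calculation. The service names and the call graph are hypothetical, invented purely for illustration; the point is that parallel siblings overlap, so time-to-first-byte is governed by the critical path, i.e., the slowest serial chain.

```python
# Each service maps to (its own latency in ms, the services it must call first).
# This graph is a made-up example: four services deep on its longest chain.
GRAPH = {
    "page":    (20, ["layout", "ads"]),
    "layout":  (20, ["user", "catalog"]),
    "ads":     (20, []),
    "user":    (20, ["auth"]),
    "catalog": (20, []),
    "auth":    (20, []),
}

def critical_path_ms(service: str, graph: dict) -> int:
    """Latency of `service` plus the slowest of its dependency chains."""
    own_ms, deps = graph[service]
    return own_ms + max((critical_path_ms(d, graph) for d in deps), default=0)

NETWORK_MS = 20  # one hop back over the wire to the user

baseline = critical_path_ms("page", GRAPH) + NETWORK_MS
padded_graph = {name: (ms + 20, deps) for name, (ms, deps) in GRAPH.items()}
padded = critical_path_ms("page", padded_graph) + NETWORK_MS

print(f"baseline: {baseline} ms")               # 4 serial hops + wire -> 100 ms
print(f"with +20 ms per service: {padded} ms")  # 180 ms, 80% slower
```

Note that the “insignificant” 20 ms is paid once per level of the critical path, not once per page, which is exactly why it compounds.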
There are also aggregate effects that convert latency into throughput constraints. Extra latency increases the number of connections that must remain open to all downstream dependencies, including the client, so a 20 ms delay in my service can quickly multiply into 100 ms of delay once a chain of downstream dependencies is factored in. Those open connections usually carry significant memory overhead for the processes or threads waiting on them (assuming most clients are not running asynchronous stacks). Multiply that by hundreds or thousands of requests per second, and it adds up quickly. Systems usually have a bottleneck that can’t scale, and we don’t want to burn any more “headroom” than we have to. But even if your entire system scales perfectly, imagine adding just 20 ms to a 200 ms service: by Little’s Law, concurrency equals throughput times latency, so sustaining the same request rate now requires 10% more in-flight requests, i.e., roughly 10% more hardware. That might be fine by you, but it’s certainly not insignificant, and that’s for a very small delay. The bigger problem is the mentality that lets 20 ms accumulate here and 20 ms there, or maybe more, until the system is just barely tolerable, leading to a massive refactoring effort or a shopping trip for better software.
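Here’s a minimal sketch of that Little’s Law arithmetic. The throughput and per-thread memory numbers are illustrative assumptions, not measurements from any real system:

```python
# Little's Law: average concurrency = throughput x latency. Added latency
# therefore translates directly into more in-flight requests and the memory
# those waiting threads/connections pin down.

REQUESTS_PER_SEC = 1_000   # assumed steady throughput
THREAD_STACK_MB = 1.0      # assumed cost of one blocked thread/connection

def in_flight(requests_per_sec: float, latency_ms: float) -> float:
    """Average concurrent requests needed to sustain the given throughput."""
    return requests_per_sec * (latency_ms / 1000.0)

for latency_ms in (200, 220):
    n = in_flight(REQUESTS_PER_SEC, latency_ms)
    print(f"{latency_ms} ms -> {n:.0f} concurrent requests, "
          f"~{n * THREAD_STACK_MB:.0f} MB held by waiting threads")

# 200 ms -> 200 concurrent requests; 220 ms -> 220.
# A 10% capacity increase from an "insignificant" 20 ms.
```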
While researching this post, I found an article I really like by one of the early engineers at Apple, Randall Hyde. It’s called “The Fallacy of Premature Optimization.” I hope you’ll enjoy it as much as I did!