Bottleneck Analysis
Many Lean methodologies teach us the value of bottleneck analysis. It’s a staple of Performance Engineering and process analysis, and it belongs in the toolbox of many other disciplines, including Enterprise Architecture.
Without getting into specific code reviews or even high-level solution architecture, it’s possible to surface potential bottlenecks. No “feeds and speeds” information is necessary.
Single points of failure are not automatically bottlenecks, but in practice they often turn out to be. What drove this particular element to be architected as a single point of failure? Frequently the same constraints also limit how far that element can scale up or out.
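As a rough illustration of spotting single points of failure from structure alone, the sketch below models an architecture as a graph and asks which components, if removed, cut every path from the users to the data. The component names and the topology are hypothetical, and a simple remove-and-search check stands in for more formal graph analysis.

```python
from collections import deque

# Hypothetical architecture: each key is a component, each value lists
# the components it can call directly. All names are illustrative.
ARCHITECTURE = {
    "users": ["lb"],
    "lb": ["web1", "web2"],
    "web1": ["auth", "db"],
    "web2": ["auth", "db"],
    "auth": ["db"],
    "db": [],
}


def reachable(graph, start, goal, removed):
    """Breadth-first search from start to goal, skipping the removed node."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, start, goal):
    """Return components whose removal cuts every path from start to goal."""
    candidates = [n for n in graph if n not in (start, goal)]
    return sorted(n for n in candidates if not reachable(graph, start, goal, n))


spofs = single_points_of_failure(ARCHITECTURE, "users", "db")
print(spofs)  # ['lb']
```

In this made-up topology only the load balancer is flagged among the intermediate components. The endpoints are excluded from the check by construction, so a lone database still deserves the same question about why it is single.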
When I look at a system, I imagine what it will look like in a test-to-failure scenario, rather than a test-to-spec one. What does it look like with 10x more users, or 100x? What happens if it is deployed to a global user base? What if various parts of the code were moved to or from the cloud, or onto an edge device such as an IoT sensor? Do transients like intermittent connectivity introduce bottlenecks in resynchronization that don’t appear in steady state?
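One way to run that thought experiment without any capacity figures is to reason only about how each component’s work grows with the number of users. The sketch below is a minimal example under that assumption: the components and their growth shapes (flat, linear, quadratic) are hypothetical and chosen purely for illustration.

```python
# Hypothetical components and how their work grows with the user count n.
# The growth shapes are assumptions for illustration, not measurements.
GROWTH = {
    "static content": lambda n: 1,      # cached, roughly flat
    "session store": lambda n: n,       # one entry per user
    "peer resync": lambda n: n * n,     # every user reconciles with every other
}


def relative_load(growth, scale):
    """Return how many times more work each component does at `scale` times the users."""
    return {name: f(scale) / f(1) for name, f in growth.items()}


def fastest_growing(growth, scale):
    """Name the component whose relative load grows the most at the given scale."""
    loads = relative_load(growth, scale)
    return max(loads, key=loads.get)


for scale in (10, 100):
    print(scale, relative_load(GROWTH, scale), fastest_growing(GROWTH, scale))
```

Under these assumed shapes, the resynchronization path does 100 times the work at 10x users and 10,000 times at 100x, which makes it the first place to look even before any measurements exist.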
Some experience with how past systems have failed can be useful. A list of past failures caused by bottlenecks makes a quick cross-check against an architecture: is it vulnerable to failing in the same way?
Bottleneck analysis from an architectural perspective is also very handy if we happen to get pulled out of the EA space and into a real-world incident. A lack of detailed data from local agents or monitoring points may make bottom-up failure analysis and mitigation difficult. But a top-down architectural view may provide quick insight into what has to break sooner or later, and the problem space for the incident narrows to which of those things broke this time.
Bottleneck analysis is often one of those quick, iterative exercises where it’s not only easy to add value almost immediately, but also easy to explain the problem discovered and the alternative solutions.