Guide Incremental Change with Observability
Most businesses are keen to evolve their digital products at a fast pace so that they can provide better digital experiences to their customers and stay ahead of their competitors. They initiate digital transformation projects to change the way they build and deliver software.
As product teams move away from waterfall development practices to iterate on products faster, legacy governance models tend to slow the pace at which products are delivered.
I do not mean to say that there is no room for governance in the Agile world. It is quite the opposite.
In the waterfall world, the governance model is static and is mostly applied before and after development: pre-defined contracts via architecture diagrams, architecture reviews, and dev-ops tasks after the product is built.
In the agile software development world, long-term planning is impractical since things constantly change in unexpected ways. Just as the software architecture evolves to meet business demands, the governance model needs to evolve as well. Agile governance should be automated and baked into the software development process from "day 1".
In the book "Building Evolutionary Architectures", Rebecca Parsons, Pat Kua and Neal Ford define evolutionary architecture as follows:
"An evolutionary architecture supports guided incremental change across multiple dimensions."
They discuss a collection of "-ities" (reliability, scalability, etc.) that characterise a software architecture, and how to measure them automatically with architecture fitness functions.
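To make the idea concrete, here is a minimal sketch of an architecture fitness function: an automated check that measures one "-ity" and fails when it degrades. The latency samples and the 300 ms budget are illustrative, not taken from the book.

```python
# A minimal architecture fitness function: an automated test that
# measures one "-ity" (here, performance) and reports when it degrades.
# The samples and the 300 ms budget below are purely illustrative.

def p95(samples):
    """Return the 95th-percentile value of a list of numbers."""
    ordered = sorted(samples)
    index = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[index]

def check_latency_fitness(samples_ms, budget_ms=300):
    """Fitness function: p95 latency must stay within the budget."""
    observed = p95(samples_ms)
    return observed <= budget_ms, observed

if __name__ == "__main__":
    # e.g. latency samples collected from an APM system
    samples = [120, 135, 150, 160, 180, 210, 240, 280, 290, 310]
    ok, observed = check_latency_fitness(samples)
    print(f"p95={observed} ms, within budget: {ok}")
```

Wired into a CI pipeline, a check like this turns a governance rule ("responses must stay fast") into an executable gate rather than a review-meeting item.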
In this blog post, I will share some insights into how Observability can help in visualising these "-ities" and guide incremental change in software architecture.
What is Observability?
Observability is defined as the ability to understand and explain the internal state of a system by examining its external outputs. It is not a technology; it is a mindset.
Logs, metrics, Application Performance Monitoring (APM) and uptime are the pillars of Observability. They are the external outputs that help us understand what is happening inside the system.
- Logs are a trail of events that happened at a given time.
- Metrics define the system parameters at periodic intervals.
- APM explains the end-to-end story of the user interaction with the product; it pulls together the logs, metrics and uptime data to paint the full picture of what happens with every user interaction.
- Uptime monitors the health of the system - whether the system is up or down.
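As a toy illustration of the first two pillars, the sketch below emits a structured log event and a point-in-time metric as JSON, the shape a backend such as Elasticsearch could index. The field names are illustrative assumptions, not a real schema.

```python
import json
import time

def log_event(level, message, **fields):
    """Emit a log line: a trail of what happened, and when."""
    return json.dumps({"ts": time.time(), "level": level,
                       "message": message, **fields})

def metric_sample(name, value, unit):
    """Emit a metric: a system parameter sampled at an interval."""
    return json.dumps({"ts": time.time(), "metric": name,
                       "value": value, "unit": unit})

print(log_event("info", "order placed", order_id="A-42"))
print(metric_sample("jvm.heap.used", 512, "MB"))
```

Emitting both as structured documents, rather than free text, is what lets an observability backend correlate them later.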
Guide Incremental Change with Observability
Product Roadmap:
Observability enables product owners to understand real user behaviour. It provides insights into how easy/hard it is to access the product; the most commonly used features; user demographics - location, device type etc. These insights enable product owners to drive their roadmap based on real data.
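The kind of roadmap insight described above can be reduced to simple aggregations over usage events. The events and field names below are hypothetical, standing in for what a real analytics pipeline would collect.

```python
from collections import Counter

# Illustrative usage events, as a real pipeline might collect them.
events = [
    {"feature": "search",   "device": "mobile"},
    {"feature": "checkout", "device": "desktop"},
    {"feature": "search",   "device": "mobile"},
    {"feature": "search",   "device": "tablet"},
]

# Most commonly used features, and the device mix of real users.
feature_usage = Counter(e["feature"] for e in events)
device_mix = Counter(e["device"] for e in events)

print(feature_usage.most_common(1))  # the single most-used feature
print(device_mix)
```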
The image above shows the User Experience view in Elastic Observability. The data is visualised in the context of Google Core Web Vitals — metrics that score three key areas of user experience: loading performance, visual stability, and interactivity. These Core Web Vitals are set to become the main performance measurement in Google ranking factors, which determine how discoverable a product is on Google.
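Google publishes "good" thresholds for the three Core Web Vitals: Largest Contentful Paint (LCP) within 2.5 s, First Input Delay (FID) within 100 ms, and Cumulative Layout Shift (CLS) within 0.1. A hypothetical helper that classifies a page's field measurements against those budgets might look like this:

```python
# Google's published "good" thresholds for the Core Web Vitals
# (at the time of writing): LCP <= 2.5 s, FID <= 100 ms, CLS <= 0.1.
THRESHOLDS = {"lcp_s": 2.5, "fid_ms": 100, "cls": 0.1}

def classify(measurements):
    """Return, per vital, whether the measurement is within 'good'."""
    return {name: measurements[name] <= limit
            for name, limit in THRESHOLDS.items()}

print(classify({"lcp_s": 1.9, "fid_ms": 140, "cls": 0.05}))
# loading and visual stability pass; interactivity exceeds its budget
```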
Non-functional enhancements:
Whilst it is enticing to keep shipping new features to users, it is critical to watch how the system is performing in the real world. Adding new features to a software application that carries a baggage of technical debt is like pulling a block from an unstable Jenga tower: you hope the blocks do not collapse and go for it with fingers crossed.
Observability provides visibility into the performance of the software application and infrastructure. It enables product teams to proactively address technical debt and performance bottlenecks before they affect end users.
Distributed tracing (a part of APM) provides a breakdown of the function calls, service calls, database queries and queue writes made as part of a transaction. It comes in handy when investigating why a certain transaction is slow.
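The sketch below is a toy illustration of what a tracing span records. Real agents (such as Elastic APM or OpenTelemetry instrumentation) capture this automatically; the span names and the transaction here are hypothetical.

```python
import time
from contextlib import contextmanager

spans = []  # in a real system, spans are shipped to the APM backend

@contextmanager
def span(name, parent=None):
    """Record the name, parent and duration of one unit of work."""
    start = time.perf_counter()
    try:
        yield name
    finally:
        spans.append({"name": name, "parent": parent,
                      "duration_ms": (time.perf_counter() - start) * 1000})

with span("GET /orders") as root:
    with span("db.query", parent=root):
        time.sleep(0.01)          # stand-in for a slow database call
    with span("queue.write", parent=root):
        pass

# The breakdown shows where the transaction spent its time:
for s in spans:
    print(s["name"], round(s["duration_ms"], 1), "ms")
```

In the output, the database span dominates the transaction's duration, which is exactly the kind of breakdown that answers "why is this transaction slow?".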
Guided forecasts:
Machine Learning helps extract insights, detect anomalies and forecast trends from historical Observability data. From the business perspective, this helps in setting realistic revenue forecasts. From the technical perspective, this helps in scaling resources based on predicted demand.
The yellow line in the chart represents the predicted data values. The shaded yellow area represents the bounds for the predicted values, which also gives an indication of the confidence of the predictions. Typically the farther into the future that you forecast, the lower the confidence level.
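A hand-rolled sketch of the same idea, under the assumption of a roughly linear trend: fit a line to historical values, project it forward, and widen the bounds with the forecast horizon. A production system (e.g. an Elastic ML job) uses far more sophisticated models; the history values here are made up.

```python
from statistics import mean, stdev

history = [100, 104, 109, 113, 118, 121, 127, 130]  # e.g. daily request volume

def linear_fit(ys):
    """Least-squares slope and intercept over evenly spaced points."""
    xs = range(len(ys))
    xm, ym = mean(xs), mean(ys)
    slope = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / \
            sum((x - xm) ** 2 for x in xs)
    return slope, ym - slope * xm

slope, intercept = linear_fit(history)
residual = stdev(y - (slope * x + intercept) for x, y in enumerate(history))

for step in range(1, 4):  # forecast three steps ahead
    x = len(history) - 1 + step
    predicted = slope * x + intercept
    # Bounds widen with the horizon: confidence drops further out.
    margin = 2 * residual * (1 + 0.1 * step)
    print(f"step +{step}: {predicted:.1f} ± {margin:.1f}")
```

The widening `margin` mirrors the shaded area in the chart: the farther ahead the forecast, the less confident the prediction.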
Summary
Observability is a great way of automating governance models, especially in large enterprises. Embedding Observability from day 1 enables teams to measure and keep track of the "-ities" of the software architecture from the beginning. It enables teams to proactively address bottlenecks and issues before they affect end users in production. And when an issue does reach production, it reduces the time taken to resolve it.
If you can’t measure it, you can’t improve it. – Peter Drucker