An Introduction to Observability
What is Observability?
Like much else in software development, the idea of observability is not new: it emerged alongside the first information systems. Observability is a critical part of the software development life cycle. It helps developers and operations teams monitor their applications and environments, identify issues before they impact customers, and improve the performance of their software products.
Observability is the art of observing and understanding a system in order to make better decisions. More precisely, it is generally understood as the ability to observe, understand, and act upon events that occur within software systems or their components.
Observability encompasses the monitoring of application metrics (usually via instrumentation), logs and exceptions, tracing data, and many other aspects of software applications. You can leverage observability to diagnose problems in real time or after they have occurred so that they don’t occur again.
The observation part is straightforward – there are tools that can collect data about what has happened inside our application and correlate those observations.
Key Benefits
❆ Gain insights into the infrastructure as a whole
❆ Promote faster releases
❆ Resolve issues easily and quickly
❆ Reduce costs
❆ Enhance developer productivity
Pillars of Observability
Metrics
Metrics provide quantitative data points about what’s happening within a system at any given point in time. Examples include CPU utilization or memory usage over time, or counts of the requests being served by an API gateway. Metrics are typically aggregated across multiple instances of the application (e.g., per cluster node). They can also include derived values such as averages or percentiles; for example: “the average CPU utilization across all nodes was 20% today.”
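To make the idea of derived values concrete, here is a minimal sketch in Python. The node names and sample values are made up for illustration, and the nearest-rank percentile is just one of several common percentile definitions; a real metrics backend may compute these differently.

```python
import math
import statistics

# Hypothetical CPU utilization samples (percent), one list per cluster node.
samples_by_node = {
    "node-a": [10.0, 20.0, 30.0],
    "node-b": [15.0, 25.0, 20.0],
}


def all_samples(samples_by_node):
    """Flatten the per-node samples into a single list."""
    return [s for samples in samples_by_node.values() for s in samples]


def average_utilization(samples_by_node):
    """Mean over every sample from every node."""
    return statistics.fmean(all_samples(samples_by_node))


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the samples are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


average = average_utilization(samples_by_node)      # 20.0
p95 = percentile(all_samples(samples_by_node), 95)  # 30.0
```

With these sample values, the average works out to 20%, which is the kind of statement quoted above, while the 95th percentile shows that some samples ran noticeably hotter than the average suggests.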
Logs
Logs are timestamped messages, often structured, that provide context about what’s happening within the system. They often include information such as request IDs, timestamps, and payloads for individual requests being served by an API gateway. As with metrics, logs can be aggregated across multiple instances of the application (e.g., per cluster node).
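As a rough illustration of a log message that carries a request ID and a timestamp, the sketch below uses Python's standard `logging` module with a small JSON formatter. The logger name `gateway`, the `request_id` field name, and the `req-123` value are assumptions chosen for this example, not part of any particular product.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # Set through the `extra` argument of the logging call, if given.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(entry)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("gateway")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Write to an in-memory stream so the example is self-contained.
stream = io.StringIO()
logger = build_logger(stream)
logger.info("request served", extra={"request_id": "req-123"})

# Because each line is valid JSON, it can be parsed and filtered by field.
entry = json.loads(stream.getvalue())
```

Emitting one JSON object per line is what makes logs easy to aggregate across instances: a collector can filter on `request_id` without parsing free-form text.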
Traces
Traces record the path of a single request as it moves through a system. Each trace is made up of timed events (often called spans) emitted by the software along the way. They’re typically emitted at a high rate (e.g., thousands per second) and include data such as the time at which each event occurred, what kind of event it was (e.g., HTTP request, database query), and any additional parameters that were passed along with it (e.g., query parameters for an HTTP request).
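The fields described above (a time, an event kind, and extra parameters) can be sketched as a small record type. The sketch below also assumes that each event carries a trace ID and a parent ID so that related events can be linked together; the `Span` class, its field names, and the sample values are all hypothetical and only loosely modeled on how tracing tools tend to represent this data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every event in the same request
    span_id: str
    parent_id: Optional[str]  # None for the root event of the trace
    name: str                 # kind of event, e.g. "HTTP request"
    start_ms: int
    end_ms: int
    attributes: dict = field(default_factory=dict)

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


# One hypothetical request: an HTTP call that triggers two child events.
spans = [
    Span("t1", "s1", None, "HTTP request", 0, 120, {"query": "id=42"}),
    Span("t1", "s2", "s1", "database query", 10, 90, {"table": "orders"}),
    Span("t1", "s3", "s1", "render response", 95, 115),
]


def children_of(spans, parent_id):
    """Return the events that were directly caused by the given event."""
    return [s for s in spans if s.parent_id == parent_id]


def slowest_child(spans, parent_id):
    """Find where most of the parent's time was spent."""
    return max(children_of(spans, parent_id), key=lambda s: s.duration_ms)


bottleneck = slowest_child(spans, "s1")  # the "database query" event, 80 ms
```

Linking events by parent is what lets a trace answer the question "where did the time go?" for one specific request, which neither metrics nor logs answer directly.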
Observability vs Monitoring
Monitoring and observability are related concepts that complement each other, and the two terms are often used interchangeably. However, there are subtle differences between the two.
The key difference is that monitoring is reactive (i.e., it responds after an event has occurred), while observability is proactive (i.e., it allows us to detect problems before they occur, or at least to know when they occur in the first place).
Monitoring refers to the process of collecting, storing, and analyzing data. Observability provides valuable insights into how an application behaves at runtime, including how it has been behaving in a production environment.
Monitoring is the act of tracking and measuring the performance of a system. This can be achieved by using tools that track application performance indicators such as response times, error rates, and concurrency issues. Observability refers to the capability of observing and understanding the state of a system. With it, we can detect problems before they occur, or determine when they are likely to occur.
Observability and monitoring solutions provide a comprehensive overview of the health of your IT infrastructure, allowing for better decision-making. While monitoring warns the team of a possible problem, observability assists the team in determining and resolving the underlying cause of the problem.
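The "warning" side of monitoring can be illustrated with a very small threshold check on one of the metrics mentioned above, the error rate. This is only a sketch: the 5% threshold, the function names, and the sample status codes are assumptions made for the example, and real monitoring tools usually evaluate rules like this over sliding time windows.

```python
def error_rate(status_codes):
    """Fraction of responses that carry a 5xx (server error) status code."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code < 600)
    return errors / len(status_codes)


def should_alert(status_codes, threshold=0.05):
    """Warn when the error rate rises above the chosen threshold."""
    return error_rate(status_codes) > threshold


# Hypothetical status codes from the ten most recent requests.
recent = [200, 200, 500, 200, 503, 200, 200, 200, 200, 200]

alert = should_alert(recent)  # True: 2 of 10 responses failed
```

A check like this tells the team that something is wrong. Finding out why is where logs and traces for the failing requests come in.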
Reader's Note
There are many observability tools, both paid and free. I recommend starting the journey with the excellent free software that forms the basis for many observability solutions, for example Prometheus, Grafana, Kiali, and Jaeger.
From SOFTWARE ENGINEERING DAILY'S ARTICLE: "An Introduction to Observability"
https://softwareengineeringdaily.com/2023/01/09/an-introduction-to-observability/