Is observability the future of monitoring?

The old maxim “you can’t manage what you can’t measure” has never been more true in IT than it is today, with the growing adoption of microservices, containers, and distributed and serverless architectures. Visibility into services and infrastructure has become indispensable to system performance. In response, the industry is increasingly shifting from conventional monitoring tools to observability.

Observability provides not only high-level overviews of system health but also highly granular insights into a system’s implicit failure modes. If you’re running a fully serverless application in a public cloud, not only do you not care about the infrastructure behind it, you can’t monitor it even if you want to. There’s no way to access the metrics of the network, servers, or containers that run your code. In this case, what you want to monitor is the performance of your own code.
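As a rough illustration of monitoring your own code when the platform’s metrics are out of reach, the sketch below wraps a function in a timing decorator and emits one structured record per call. The names `timed`, `handle_order`, and `RECORDS` are hypothetical and exist only for this example; a real serverless function would send the record to whatever logging or telemetry service the provider offers.

```python
import functools
import json
import time

RECORDS = []  # hypothetical in-memory sink; a real service would ship these elsewhere


def timed(handler):
    """Wrap a handler and record how long each call takes and how it ended."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        outcome = "error"  # assume failure until the handler returns normally
        try:
            result = handler(*args, **kwargs)
            outcome = "ok"
            return result
        finally:
            record = {
                "handler": handler.__name__,
                "outcome": outcome,
                "duration_ms": round((time.perf_counter() - start) * 1000.0, 3),
            }
            RECORDS.append(record)
            print(json.dumps(record))  # one structured line per invocation
    return wrapper


@timed
def handle_order(event):
    # Hypothetical business logic standing in for a serverless function.
    return {"status": 200, "items": len(event.get("items", []))}


response = handle_order({"items": ["book", "pen"]})
```

Because the measurement lives in the application code itself, it works the same way regardless of what the cloud provider exposes about the underlying servers or containers.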

Monitoring is set up in advance, which means teams need to know what to worry about before a problem occurs. Observability lets you find out what’s important by watching how the system actually behaves over time, explains Ben Evans, JVM engineer and architect at New Relic, in an article for InfoWorld.
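The contrast can be sketched in a few lines of code. In this simplified example, which assumes each request is recorded as an event with a handful of made-up fields (`route`, `region`, `version`), the first function is a check decided in advance, while the second answers a question nobody had to anticipate. The data and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical request events; the field names and values are invented.
EVENTS = [
    {"route": "/checkout", "region": "eu", "version": "1.4", "duration_ms": 120, "error": False},
    {"route": "/checkout", "region": "eu", "version": "1.5", "duration_ms": 950, "error": True},
    {"route": "/checkout", "region": "us", "version": "1.5", "duration_ms": 130, "error": False},
    {"route": "/search", "region": "eu", "version": "1.5", "duration_ms": 900, "error": True},
    {"route": "/search", "region": "us", "version": "1.4", "duration_ms": 80, "error": False},
    {"route": "/search", "region": "eu", "version": "1.4", "duration_ms": 95, "error": False},
]


def error_rate_alert(events, threshold=0.25):
    """Monitoring: a single check whose question was fixed in advance."""
    errors = sum(1 for e in events if e["error"])
    return errors / len(events) > threshold


def error_rate_by(events, *fields):
    """Observability: slice the same events by fields chosen after the fact."""
    totals = defaultdict(lambda: [0, 0])  # key -> [errors, requests]
    for e in events:
        key = tuple(e[f] for f in fields)
        totals[key][0] += int(e["error"])
        totals[key][1] += 1
    return {key: errs / count for key, (errs, count) in totals.items()}


alert = error_rate_alert(EVENTS)                        # something is wrong
breakdown = error_rate_by(EVENTS, "region", "version")  # where, exactly?
```

With this sample data the alert fires, and the breakdown shows every failing request falling under the `("eu", "1.5")` key, a combination no one had to predict when the events were recorded.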

For many people, observability will sound just like a convenient rebranding of application monitoring, and any skepticism around the industry buzzword is justified. However, as InfoWorld columnist David Linthicum says, there is a basic difference: Monitoring “is something you do (a verb); observability is an attribute of a system (a noun).”

But some also consider observability a superset of monitoring. It provides not only high-level overviews of system health but also highly granular insights into implicit system failure modes. Moreover, an observable system provides ample context about its inner workings, unlocking the ability to uncover deeper systemic problems.

Alongside what we think of as traditional monitoring, observability draws on metrics, logs, and traces (its three pillars) to answer arbitrary questions, at any point in time, about what’s happening inside a complex software system just by observing the operation of that system. Monitoring can tell you whether your system is working; observability helps you dig into why it is not.
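To make the three pillars concrete, here is a minimal sketch using only the Python standard library. It is an illustration under simplifying assumptions, not how any particular vendor or framework implements telemetry: the `METRICS`, `LOGS`, and `SPANS` stores and the `span`, `log`, and `handle_request` helpers are hypothetical, and real tracing systems also record parent-child relationships between spans.

```python
import time
import uuid
from collections import Counter
from contextlib import contextmanager

METRICS = Counter()  # metrics: cheap aggregated numbers
LOGS = []            # logs: discrete structured events
SPANS = []           # traces: timed steps that belong to one request


def log(trace_id, message, **fields):
    """Record a structured log entry tagged with the request's trace id."""
    LOGS.append({"trace_id": trace_id, "message": message, **fields})


@contextmanager
def span(trace_id, name):
    """Time a block of work and store it as a span when the block exits."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({
            "trace_id": trace_id,
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000.0,
        })


def handle_request(user_id):
    trace_id = uuid.uuid4().hex  # one id ties all three signals together
    METRICS["requests_total"] += 1
    with span(trace_id, "handle_request"):
        with span(trace_id, "load_profile"):
            log(trace_id, "profile loaded", user_id=user_id)
        with span(trace_id, "render"):
            log(trace_id, "response rendered", user_id=user_id)
    return trace_id


trace_id = handle_request(user_id=42)
```

The shared `trace_id` is the design choice that matters here: a metric can show that something changed, and the id lets you pull the logs and spans for one affected request to see why.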

So, buzzword or not, observability is already a fundamental aspect of IT.

In many ways, observability goes hand in hand with application performance monitoring (APM). You can’t have observability without APM. Making a system observable presupposes a robust APM strategy, and that strategy provides some of the crucial mechanisms by which the state of the system can be inferred.

Greater observability helps DevOps teams navigate the complexity that comes with increased fragmentation in distributed systems. To achieve it, organizations must implement processes that support thorough application performance monitoring, distributed tracing, and effective log management. This ensures that development and operations personnel have everything they need when system stability is threatened by incidents in an application’s resources or supporting infrastructure.

APM is almost always performed with the help of software such as Dynatrace, Paessler PRTG, SolarWinds Server & Application Monitor, Uptime, and Broadcom DX Application Performance Management, among others. Application performance monitoring involves tracking system metrics and producing visualizations designed to provide DevOps teams with important data about system performance. This provides vital context to identify the source of application problems as quickly and efficiently as possible.

Observability, on the other hand, is really more of an attribute than a process. A system is considered observable if its state can be easily determined without deploying anything further. In this light, APM represents part of the tools and processes needed to make a system observable. So although the two concepts differ by definition, it bears repeating: you cannot have observability without APM.

Observability, then, complements but does not replace monitoring. Nor should it be a goal in itself; it is a means to build and operate more reliable software for customers. “Being able to debug and diagnose production issues quickly not only contributes to an optimal end-user experience but also paves the way for humane and sustainable operability of a service,” says QCom’s analyst Cindy Sridharan.

The holy grail of observability is automating part of the fact-finding process to the point where problems are identified automatically and can be fixed before they affect users. Better still would be software that fixes failures itself, before developers even notice the problem on their dashboards.