AI is already starting to change Observability

Cristina De Luca

August 22, 2024

In the rapidly evolving landscape of digital networks, demand for advanced network observability solutions has never been greater. As organisations navigate the complexity brought on by diverse architectures, dynamic workloads, remote working models and growing security threats, network observability tools become ever more fundamental. Recent forecasts suggest that the observability market will grow from $2.4 billion in 2023 to $4.1 billion by 2028, with much of this growth attributed to the spread of advanced technologies such as AI, machine learning and real-time data analysis. Gartner predicts a $9 billion AI observability market.

The standard approach to infrastructure monitoring involves collecting data from different sources, such as servers, networks and applications, analysing it to gain insight into the performance and health of the infrastructure, and sending alerts when a problem occurs. However, this is no longer the best approach in today’s complex and fast-paced IT environments.
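For reference, the collect, analyse and alert loop described above can be sketched in a few lines. This is a minimal illustration rather than any vendor's implementation: the `Sample` type, the metric names and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    source: str   # where the reading came from, e.g. "server" or "network"
    metric: str   # hypothetical metric name
    value: float  # the measured value


# Hypothetical fixed limits; real deployments would tune these per system.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "latency_ms": 250.0,
    "packet_loss_percent": 1.0,
}


def check_thresholds(samples, thresholds=THRESHOLDS):
    """Return one alert message for every sample above its static limit."""
    alerts = []
    for s in samples:
        limit = thresholds.get(s.metric)
        if limit is not None and s.value > limit:
            alerts.append(f"ALERT {s.source}/{s.metric}: {s.value} > {limit}")
    return alerts


samples = [
    Sample("server", "cpu_percent", 95.0),
    Sample("network", "latency_ms", 120.0),
    Sample("application", "packet_loss_percent", 2.5),
]
alerts = check_thresholds(samples)
for line in alerts:
    print(line)
```

The limitation is visible in the sketch itself: every limit is fixed in advance, so the check knows nothing about what is normal for a given workload or how one alert relates to another.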

Modern, secure network architectures have created blind spots in network performance, making it difficult to observe and troubleshoot effectively. The proliferation of essential SaaS applications and remote working has made it difficult to monitor and troubleshoot end users’ digital experience. Computing resources now extend beyond data centres, and hypervirtualisation and container dynamics increase the complexity of network monitoring.

In addition, improving AI results requires comprehensive instrumentation, usually achieved by means of agents that collect observability data autonomously. However, managing an ever-increasing number of agents becomes cumbersome and expensive: manual updates, complex configurations and possible conflicts between agents affect system performance and the user experience.

This is where AI itself (usually in the form of AIOps) is being touted as the basis for the next generation of observability tools. With AI-based insights, organisations can take advantage of intelligent alerts that detect and diagnose problems in real time and suggest possible solutions, allowing IT teams to respond quickly and avoid further disruption.

In July 2023, ManageEngine announced that it had added OpenAI observability to its Site24x7 cloud-based observability platform. Also last year, Splunk announced new AI features in its ‘unified security and observability platform’, CRN reported, and in May 2023 New Relic launched Grok, a generative AI observability assistant. In May this year, Riverbed announced its open, AI-powered observability platform ‘aimed at filling in the blind spots that exist in complex IT environments that include public cloud and remote working environments, as well as Zero Trust and SD-WAN architectures’, according to CRN.

The GigaOm Radar examines 20 of the main network observability solutions on the market and compares their offerings feature by feature (including emerging AIOps features and integrations with LLMs).

[Figure: Observability GigaOm Radar 2024]

Why is this important?

AI-driven analysis of network observability data provides intelligence and enables proactive incident detection, automated remediation and continuous improvement. Examples of AI/ML applications include:

  • Rapid problem detection and precise root cause analysis by ingesting and correlating data from various sources.
  • Ingesting and processing large volumes of network data in real time to analyse, interpret and standardise data formats from various monitoring tools and sources, ensuring compatibility and consistency between integrated data sets.
  • Analysing ingested data, identifying correlations, patterns and anomalies that may indicate network performance problems or security threats.
  • Dynamic visualisation and presentation of actionable insights with the right context to facilitate proactive monitoring and analysis, allowing organisations to anticipate potential problems before they affect the end-user experience.
  • Greater precision, based on learning from historical patterns to adapt to new threats.
  • Transformation of static and strictly defined automation into intelligent automation that models human logic and decision-making.
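To make the idea of learning from historical patterns concrete, here is a minimal sketch of one simple statistical technique, a rolling z-score, that flags values deviating sharply from recent history. It is an assumption chosen for illustration, not the method used by any product mentioned here; the function name, window size and threshold are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=10, z_threshold=3.0):
    """Flag values that deviate strongly from a rolling baseline.

    Returns a list of (index, value) pairs. Both `window` and
    `z_threshold` are illustrative defaults, not recommendations.
    """
    history = deque(maxlen=window)
    anomalies = []
    for i, v in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0 and abs(v - mu) / sigma > z_threshold:
                anomalies.append((i, v))
                continue  # keep the outlier out of the baseline
        history.append(v)
    return anomalies


# Hypothetical latency readings in milliseconds, with one obvious spike.
latency_ms = [20, 21, 19, 22, 20, 21, 20, 19, 21, 20, 95, 21, 20]
anomalies = detect_anomalies(latency_ms)
print(anomalies)  # [(10, 95)]
```

Unlike a fixed limit, the baseline here is derived from the data, so the same code adapts to a link that normally runs at 20 ms and to one that normally runs at 200 ms. Production AIOps tools typically go further, for example by accounting for seasonality and correlating several metrics at once.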

All of this takes us from simply knowing ‘what’ is happening in our systems to understanding ‘why’. Observability already seeks to provide detailed data and context; combined with AI’s predictive analysis, it allows IT systems to predict and avoid problems before they occur. This is why more and more companies feel compelled to incorporate observability and AI into their IT operations, considering both the current and future needs of their IT infrastructure.

Together, observability and AIOps work to eliminate complexity and noise: they collect, normalise and reconcile different types of data, understand services and their relationships, and use AI to proactively uncover and solve problems. Most importantly, however, they take the company to the next generation of productivity: autonomous IT operations. Their broad feature set reduces the operational burden on managers and staff, freeing them up to focus on higher-value activities.
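One way to picture how understanding services and their relationships reduces noise is the sketch below. It assumes a hypothetical dependency map and a deliberately simple rule: an alerting service is treated as a probable root cause only if none of its direct dependencies is also alerting. Real tools use far richer topology and learned correlations.

```python
# Hypothetical dependency map: each service lists the services it depends on.
DEPENDS_ON = {
    "web": ["api"],
    "api": ["db", "cache"],
    "db": [],
    "cache": [],
}


def probable_root_causes(alerting, depends_on=DEPENDS_ON):
    """Reduce a set of alerting services to the likely root causes.

    Simplification: only direct dependencies are checked, so chains with
    a healthy service in the middle are not collapsed.
    """
    alerting = set(alerting)
    roots = []
    for service in sorted(alerting):
        deps = depends_on.get(service, [])
        if not any(dep in alerting for dep in deps):
            roots.append(service)
    return roots


# Three services alert at once, but two are downstream of the database.
roots = probable_root_causes(["web", "api", "db"])
print(roots)  # ['db']
```

Three alerts collapse into one item for the operator to investigate, which is the kind of reduction in operational burden the paragraph above describes.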

It’s important to start the journey towards more autonomous operations as soon as possible.