5 recommendations for successful IT monitoring

Cristina De Luca

July 09, 2021

Increasingly, IT infrastructure monitoring is seen as a business process. Its goal? To collect and analyze IT infrastructure data and leverage that data to improve business outcomes and drive value creation for the organization.

But those who need to monitor a large IT or network infrastructure face several challenges.

Here are some practical recommendations to help you get the most out of your monitoring and save time and money while providing better IT services for your business.

1 – Define measuring points, limits, and alerts

Before planning your monitoring architecture, you need to understand your environment. Most importantly, know how many measuring points you have.

For whatever you want to monitor, there will be multiple measuring points. If you want to monitor the devices themselves, you will need to track things like device temperature, fan speed, remaining storage, CPU load, or other metrics that may be relevant. For each measuring point, decide which limits count as normal and which alerts should fire when a value crosses them.

Obviously, the more measuring points you have, the more processing power and planning your monitoring concept requires.
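As a rough illustration of what limits on measuring points can look like, here is a minimal Python sketch. The metric names, the threshold values, and the `Limit`, `classify`, and `evaluate` helpers are all assumptions made up for this example; they do not come from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """Warning and error thresholds for one measuring point."""
    warning: float
    error: float
    higher_is_worse: bool = True  # False for metrics like free storage


# Hypothetical limits; real values depend on your hardware and services.
LIMITS = {
    "temperature_c": Limit(warning=70.0, error=85.0),
    "cpu_load_pct": Limit(warning=80.0, error=95.0),
    "free_storage_pct": Limit(warning=20.0, error=10.0, higher_is_worse=False),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning', or 'error' for a single reading."""
    limit = LIMITS[metric]
    if limit.higher_is_worse:
        if value >= limit.error:
            return "error"
        if value >= limit.warning:
            return "warning"
    else:
        if value <= limit.error:
            return "error"
        if value <= limit.warning:
            return "warning"
    return "ok"


def evaluate(readings: dict[str, float]) -> dict[str, str]:
    """Classify every reading collected from one device."""
    return {metric: classify(metric, value) for metric, value in readings.items()}
```

Calling `evaluate({"cpu_load_pct": 96.0, "free_storage_pct": 15.0})` would flag the CPU reading as an error and the storage reading as a warning. Each extra measuring point adds another entry to maintain, which is why counting them early matters.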

2 – Segment the network

On large networks, it is not feasible to have thousands (or even tens of thousands) of probes across the network all sending data back to a single central monitoring server. Instead, you will need to logically segment your infrastructure.
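One possible way to segment logically is by IP subnet, with each segment reporting to its own monitoring server. The sketch below assumes that approach; the segment names, the address ranges, and the `assign_segment` and `group_by_segment` helpers are hypothetical, and other criteria (location, function, business unit) may suit your environment better.

```python
import ipaddress
from collections import defaultdict

# Hypothetical segments; replace with the subnets of your own infrastructure.
SEGMENTS = {
    "datacenter": ipaddress.ip_network("10.0.0.0/16"),
    "branch-office": ipaddress.ip_network("10.1.0.0/16"),
}


def assign_segment(address: str) -> str:
    """Return the name of the segment that contains the given IP address."""
    ip = ipaddress.ip_address(address)
    for name, network in SEGMENTS.items():
        if ip in network:
            return name
    return "unassigned"


def group_by_segment(addresses: list[str]) -> dict[str, list[str]]:
    """Group device addresses so each segment can be monitored separately."""
    groups: dict[str, list[str]] = defaultdict(list)
    for address in addresses:
        groups[assign_segment(address)].append(address)
    return dict(groups)
```

Devices that land in `"unassigned"` are worth reviewing, since they fall outside every segment you planned for.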

3 – Create a centralized overview

Regardless of how you set up your monitoring, you probably have multiple monitoring servers collecting data from different parts of your infrastructure. Now you should put it all together so that it can help you manage all of your IT, all from one central point. The way to do this is to create dashboards with an overview of the infrastructure, so you can tell immediately if there are problems.

A dashboard is simply a way of visualizing information. It can be configured to provide operational data, give business insights or highlight anomalous events that may pose security threats.

Having a single view of IT infrastructure is vital as IT teams work to manage many moving and changing pieces. Moving to one tool helps them better assess and report on IT performance.
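To make the idea of a central overview concrete, the following sketch rolls up the statuses reported by several monitoring servers into one summary, showing the worst status per server and overall. The server names, the three status levels, and the `roll_up` and `overall` functions are assumptions for illustration only; a real dashboard would pull this data from your monitoring tool.

```python
# Assumed status levels, ordered from least to most severe.
SEVERITY = {"ok": 0, "warning": 1, "error": 2}


def roll_up(statuses_by_server: dict[str, list[str]]) -> dict[str, str]:
    """Reduce each server's list of statuses to its single worst status."""
    return {
        server: max(statuses, key=SEVERITY.__getitem__, default="ok")
        for server, statuses in statuses_by_server.items()
    }


def overall(overview: dict[str, str]) -> str:
    """Return the worst status across all monitoring servers."""
    return max(overview.values(), key=SEVERITY.__getitem__, default="ok")


if __name__ == "__main__":
    overview = roll_up({
        "monitor-eu": ["ok", "warning"],
        "monitor-us": ["ok", "error", "warning"],
        "monitor-asia": [],
    })
    for server, status in overview.items():
        print(f"{server}: {status}")
    print("overall:", overall(overview))
```

Showing only the worst status per server keeps the top-level view short; the details stay one click away on the server that reported the problem.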

4 – Define response teams and set up notifications

To manage a large IT infrastructure, the IT department is usually divided into skill areas, so you have separate teams for different functions. For example, one team might be responsible for the online storefront, another team for email services, and so on. Of course, these teams would also be responsible for monitoring their respective areas.

For your monitoring concept, define user groups according to the areas they focus on. Then route failure notifications for each area to the specific team that needs to act promptly whenever a problem is detected.
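A simple way to picture this is a routing table that maps each area to the team responsible for it. In the sketch below, the area names follow the examples above, while the team names, the fallback team, and the `route_alert` function are invented for illustration.

```python
# Hypothetical mapping of monitored areas to responsible teams.
ROUTES = {
    "storefront": "web-team",
    "email": "messaging-team",
}

# Assumed fallback so that no alert goes unowned.
DEFAULT_TEAM = "it-operations"


def route_alert(area: str, message: str) -> tuple[str, str]:
    """Return the team to notify and the notification text for an alert."""
    team = ROUTES.get(area, DEFAULT_TEAM)
    return team, f"[{area}] {message}"


if __name__ == "__main__":
    team, text = route_alert("email", "SMTP queue is not draining")
    print(f"notify {team}: {text}")
```

The fallback team is a design choice worth keeping: an alert from an area nobody mapped still reaches someone who can triage it.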

5 – If applicable, think beyond IT

With the emergence of new business cases and production processes, operational technology (OT) networks are becoming larger and increasingly complex. In fact, in many large companies across multiple industries, the number of IP addresses in OT is already larger than in IT. The need for a reliable OT network is growing rapidly, so it is crucial to monitor OT as well.

Obviously, an industrial network has different monitoring requirements from a regular IT network. OT teams want to know about the status of their process automation systems, such as DCS (Distributed Control Systems), hybrid systems, PLCs (Programmable Logic Controllers), MES systems such as historians, LIMS (Laboratory Information Management Systems), batch management systems, and other specific applications (e.g. based on OPC communication). They also need to monitor network redundancy and industrial protocols such as Profibus.

IT and OT were, until recently, very rarely interconnected infrastructures. But now, digitalization is driving convergence. Data – essential for the effective management of production processes – needs to be collected, analyzed, and utilized at all levels, from the shop floor to the plant itself. This means that devices that were once isolated – programmable logic controllers, for example – now need to connect to data collection systems.

The challenge of monitoring this new converged infrastructure is to bring multiple metrics into a single view: essentially, an overview of traditional IT elements, OT elements such as gateway devices, and metrics from other devices, including IoT devices.

IoT continues to transform business as evolving technology and interconnected devices generate real-time analytics. This gives organizations the chance to improve customer experience and streamline logistics demands.

According to Machina Research’s global market forecasts, by the end of 2024, almost 25 billion connections will be established, meaning more data will be generated from monitored devices, equipment, and the general environment in which they exist.

Conclusion

It’s a long-accepted truth, though not entirely comfortable, that most organizations don’t realize the importance of IT until it fails. So most workers will agree that monitoring infrastructure is a good idea. It’s like preventative medicine. Tracking and comparing uptime, bandwidth, CPU usage, capacity, and other metrics over a long period of time provides a baseline from which alerts can be set to inform IT when systems are going wrong. Ultimately, outages and crashes are very expensive, but when IT can identify them before they happen, major problems become small and easy to fix.
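As one hedged example of how a baseline can turn into an alert limit, the sketch below takes a metric's history and flags values that sit more than a few standard deviations above its mean. The mean-plus-`k`-standard-deviations rule, the default `k = 3`, and the function names are assumptions chosen for simplicity; they are only one of many ways to derive limits from historical data.

```python
import statistics


def baseline_limit(history: list[float], k: float = 3.0) -> float:
    """Derive an upper alert limit from past readings (mean + k * std dev)."""
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)
    return mean + k * spread


def is_anomalous(value: float, history: list[float], k: float = 3.0) -> bool:
    """Return True if a new reading exceeds the limit derived from history."""
    return value > baseline_limit(history, k)


if __name__ == "__main__":
    # Made-up CPU usage percentages, purely for demonstration.
    cpu_history = [40.0, 42.0, 38.0, 41.0, 39.0]
    print(is_anomalous(50.0, cpu_history))
    print(is_anomalous(43.0, cpu_history))
```

The longer and more representative the history, the more trustworthy the limit, which is why collecting metrics over a long period matters.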

A properly used vendor-independent monitoring tool can help prevent a number of common IT headaches: problems that are easy to avoid but hard to fix. With the help of this second pair of eyes, IT can head off such problems early, saving time and protecting productivity.

The complexities that come with managing a vast digital ecosystem may seem daunting, but they don’t have to be. In fact, once system administrators get a handle on the data they need, they can take control and proactively use the data to make informed decisions quickly.