IT Monitoring and Fault Management

Operations & Automation

Many small to mid-sized companies invest heavily in differentiated business services that allow them to compete against larger rivals. But in many cases, not as much consideration goes into planning how to monitor the IT infrastructure that supports those services.

To close this gap, IT organizations must adopt a new, proactive approach to managing complex and highly dynamic infrastructure. This approach requires the ability to gather, monitor and analyze data, to make both short-term and long-term predictions about the IT environment, and to act proactively on those predictions.
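A simple way to turn monitoring data into a short-term prediction is to fit a straight line to recent samples and extrapolate when a metric will cross a limit. The Python sketch below is illustrative only. It assumes the metric grows roughly linearly, and the disk-usage figures and the 90% limit are hypothetical. Real environments often need more robust or seasonality-aware models.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Trend:
    slope: float      # change in the metric per time unit
    intercept: float  # fitted metric value at t = 0


def fit_trend(times: Sequence[float], values: Sequence[float]) -> Trend:
    """Ordinary least-squares line through paired (time, value) samples."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("sample times must not all be equal")
    cov_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov_tv / var_t
    return Trend(slope=slope, intercept=mean_v - slope * mean_t)


def time_to_threshold(trend: Trend, now: float, threshold: float) -> Optional[float]:
    """Time left until the fitted line reaches `threshold`, or None if it never will."""
    current = trend.intercept + trend.slope * now
    if current >= threshold:
        return 0.0  # already at or past the limit
    if trend.slope <= 0:
        return None  # flat or improving: no crossing predicted
    return (threshold - current) / trend.slope


# Hypothetical disk-usage samples (percent), one per day.
days = [0, 1, 2, 3, 4]
usage = [60.0, 62.0, 64.0, 66.0, 68.0]
trend = fit_trend(days, usage)
print(time_to_threshold(trend, now=4, threshold=90.0))  # 11.0 (days)
```

Predicting "11 days until the disk is 90% full" is the kind of forward-looking signal that lets an operator act before users notice anything.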

Fault Management is the ability to locate faults, determine their cause and apply appropriate corrections. This entails detecting problems or irregularities anywhere in the infrastructure, pinpointing where they originate and troubleshooting the affected elements at their source. At a higher level, it also embraces the golden rule of prevention: a proactive approach that stops adverse conditions before they develop into actual failures in the system.
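Pinpointing where a fault originates often comes down to separating the root cause from its downstream symptoms. As one illustrative approach among many, the Python sketch below assumes a known map of which components depend on which. It flags a failing component as a probable root cause only when none of the components it relies on (directly or indirectly) are also failing. The component names are hypothetical.

```python
from typing import Dict, List, Set


def probable_root_causes(depends_on: Dict[str, List[str]], failing: Set[str]) -> Set[str]:
    """Return failing components none of whose transitive dependencies are failing.

    `depends_on` maps each component to the components it directly relies on.
    """
    def has_failing_dependency(node: str, seen: Set[str]) -> bool:
        for dep in depends_on.get(node, []):
            if dep in seen:
                continue  # already checked, or a dependency cycle
            seen.add(dep)
            if dep in failing or has_failing_dependency(dep, seen):
                return True
        return False

    return {node for node in failing if not has_failing_dependency(node, {node})}


# Hypothetical topology: the web app needs the app server, which needs the
# database, which in turn runs on a storage array.
topology = {
    "web-app": ["app-server"],
    "app-server": ["database"],
    "database": ["storage"],
    "storage": [],
}
alerts = {"web-app", "app-server", "database"}
print(probable_root_causes(topology, alerts))  # {'database'}
```

Three alerts collapse into a single probable cause, so troubleshooting can start at the database rather than at the web tier where users first see the symptom.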

Properly implemented, Fault Management can keep an infrastructure running at an optimum level, provide a measure of fault tolerance and minimize downtime. A set of functions or applications designed specifically for this purpose is called a fault-management platform.

The main functions of Fault Management include:

Definition of thresholds for potential failure conditions

Constant monitoring of system status and usage levels

Continuous scanning for threats, such as viruses and Trojans

General diagnostics

Remote control of system elements, including workstations and servers, from a single location

Alarms that notify administrators and users of impending and actual malfunctions

Tracing the locations of potential and actual malfunctions

Automatic correction of potential problem-causing conditions

Automatic resolution of actual malfunctions

Detailed logging of system status and actions taken
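Several of the functions listed above (thresholds, monitoring, alarms, automatic correction and logging) fit together in a simple polling check. The Python sketch below is illustrative only: the metric names, threshold values and the `clear_temp_files` remediation hook are hypothetical. A production fault-management platform would add scheduling, alarm deduplication, escalation and persistent storage.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("fault-monitor")


@dataclass
class Threshold:
    metric: str
    warning: float   # level treated as an impending malfunction
    critical: float  # level treated as an actual malfunction
    remediation: Optional[Callable[[], None]] = None  # hypothetical auto-fix hook


def evaluate(sample: Dict[str, float], rules: List[Threshold]) -> List[str]:
    """Compare one round of metric readings against thresholds and return alarms."""
    alarms = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            log.warning("no data for %s", rule.metric)  # missing data is itself a symptom
            continue
        if value >= rule.critical:
            level = "CRITICAL"
        elif value >= rule.warning:
            level = "WARNING"
        else:
            continue  # within normal range
        alarms.append(f"{level}: {rule.metric}={value}")
        log.log(logging.CRITICAL if level == "CRITICAL" else logging.WARNING,
                "%s threshold crossed for %s (value=%s)", level, rule.metric, value)
        # Remediate on any crossing, so potential problems are corrected early.
        if rule.remediation is not None:
            rule.remediation()
            log.info("ran remediation for %s", rule.metric)
    return alarms


def clear_temp_files() -> None:
    # Placeholder: a real remediation might rotate logs or purge caches.
    pass


rules = [
    Threshold("disk_used_pct", warning=80.0, critical=95.0, remediation=clear_temp_files),
    Threshold("cpu_pct", warning=85.0, critical=98.0),
]
print(evaluate({"disk_used_pct": 83.5, "cpu_pct": 40.0}, rules))  # ['WARNING: disk_used_pct=83.5']
```

Using two levels per metric mirrors the distinction between impending and actual malfunctions: the warning level gives automation a chance to correct the condition before the critical level is reached.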

Implementing an IT Monitoring and Fault Management solution offers the IT organization several benefits:

A proactive rather than reactive approach

Detection of faults before they cause business service incidents

Improved productivity by detecting and resolving non-critical faults

Improved reliability by identifying and proactively correcting small problems

Easier identification of error trends

Faster problem resolution and stronger issue prevention

Improved service levels through incident prevention

Better IT support for business objectives

Corporate productivity improvements

Automated responses to alarms

Fast time to value for IT infrastructure and application monitoring