Using adaptive monitoring in Nagios

Nagios provides a very powerful feature called adaptive monitoring that allows the modification of various check-related parameters on the fly. This is done by sending a command to the Nagios external command pipe.

The first thing that can be changed on the fly is the command to be executed by Nagios, along with the attributes that will be passed to it-an equivalent of the check_command directive in the object definition. In order to do that we can use the CHANGE_HOST_CHECK_COMMAND or CHANGE_SVC_CHECK_COMMAND command. These require the hostname, or the hostname and service description, and the check command as arguments.

This can be used to actually change how hosts or services are checked, or to only modify parameters that are passed to the check commands-for example, a check for ping latency can be modified based on whether a primary or a backup connection is used. An example to change a check command of a service, which changes the command and its specified parameters, is as follows:

[1206096000] CHANGE_SVC_CHECK_COMMAND;linux1;PING;check_ping!500.0,50%

A similar possibility is to change the custom variables that are used later in a check command. An example where the following command and service are used is:

  define command 
    command_name             check-ping 
    command_line             $USER1$/check_ping -H $HOSTADDRESS$ 
                             -p $_SERVICEPACKETS$

                             -w $_SERVICEWARNING$ 
                             -c $_SERVICECRITICAL$ 
  define service 
    host_name                linux2 
    service_description      PING 
    use                      ping 
    check_command            check-ping 
    _PACKETS                 5

    _WARNING                 100.0,40%

    _CRITICAL                300.0,60% 

This example is very similar to the one we saw earlier. The main benefit is that parameters can be set independently—for example, one event handler might modify the number of packets to send while another one can modify the warning and/or critical state limits.

The following is an example to modify the warning level for the ping service on a linux1 host:

[1206096000] CHANGE_CUSTOM_SVC_VAR;linux1;PING;_WARNING;500.0,50% 

It is also possible to modify event handlers on the fly. This can be used to enable or disable scripts that try to resolve a problem. To do this, you need to use the CHANGE_HOST_EVENT_HANDLER and CHANGE_SVC_EVENT_HANDLER commands.

In order to set an event handler command for the Apache2 service mentioned previously in this section, you need to send the following command:

[1206096000] CHANGE_SVC_EVENT_HANDLER;localhost;webserver; restart-apache2

Please note that setting an empty event handler disables any previous event handlers for this host or service. The same comment also applies for modifying the check command definition. In case you are modifying commands or event handlers, please make sure that the corresponding command definitions actually exist; otherwise, Nagios might reject your modifications.

Another feature that you can use to fine-tune the execution of checks is the ability to modify the time period during which a check should be performed. This is done with the CHANGE_HOST_CHECK_TIMEPERIOD and CHANGE_SVC_CHECK_TIMEPERIOD commands. Similar to the previous commands, these accept the host, or host and service names, and the new time period to be set. See the following example:

[1206096000] CHANGE_SVC_CHECK_TIMEPERIOD;localhost;webserver; workinghours

As is the case with command names, you need to make sure that the time period you are requesting to be set exists in the Nagios configuration. Otherwise, Nagios will ignore this command and leave the current check time period.

Nagios also allows modifying intervals between checks—both for the normal checks, and retrying during soft states. This is done through the CHANGE_NORMAL_HOST_CHECK_, CHANGE_RETRY_HOST_CHECK_INTERVAL, CHANGE_NORMAL_SVC_CHECK_INTERVAL,and CHANGE_RETRY_SVC_CHECK_INTERVAL commands. All of these commands require passing the host, or the host and service names, as well as the intervals that should be set.

A typical example of when intervals would be modified on the fly is when the priority of a host or service relies on other parameters in your network. An example might be a failover server—which will only be run if the primary server is down.

Making sure that the host and all of the services on it are working properly is very important before actually performing scheduled backups. During idle time, its priority might be much lower. Another issue might be that monitoring the failover server should be performed more often in case the primary server fails.

An example to modify the normal interval for a host to every 15 minutes is as follows:

[1206096000] CHANGE_NORMAL_HOST_CHECK_INTERVAL;backupserver;15 

There is also the possibility to modify how many checks need to be performed before a state is considered to be hard. The commands for this are CHANGE_MAX_HOST_CHECK_ATTEMPTS and CHANGE_MAX_SVC_CHECK_ATTEMPTS

The following is an example command to modify max retries for a host to 5:

[1206096000] CHANGE_MAX_HOST_CHECK_ATTEMPTS;linux1;5 

There are many more commands that allow the fine tuning of monitoring and check settings on the fly. It is recommended that you get acquainted with all of the external commands that your version of Nagios supports, as mentioned in the section introducing the external commands pipe.

Related Articles

How to add swap space on Ubuntu 21.04 Operating System

How to add swap space on Ubuntu 21.04 Operating System

The swap space is a unique space on the disk that is used by the system when Physical RAM is full. When a Linux machine runout the RAM it use swap space to move inactive pages from RAM. Swap space can be created into Linux system in two ways, one we can create a...

read more

Lorem ipsum dolor sit amet consectetur


Submit a Comment

Your email address will not be published. Required fields are marked *

six + ten =