Using templates for distributed monitoring in Nagios

Multiple inheritance can be used to manage the configuration of a distributed Nagios setup. This avoids reconfiguring all of the objects and maintaining two separate sets of configuration. Multiple inheritance lets us separate the parts that are common to both the master and slave Nagios instances from the information that is local to each instance. We’ll assume that each location has a single Nagios instance, which is a slave to the central (master) Nagios instance.

For each location, there will be local and remote templates. Each slave instance loads the local template for its own location and does not load the configuration for any other location. The master instance(s) load the remote template for each location that reports to them.

The actual hosts and services will inherit a template for a specific check, such as a CPU load check or the service template that monitors the HTTP server. They will also inherit the location’s template (local or remote) as the first item in the inheritance list. Because Nagios gives precedence to the templates listed first, this allows the location templates to override any configuration options set by the other templates.
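To make the precedence rule concrete, here is a small Python model of attribute lookup with multiple inheritance. It is a simplified sketch, not Nagios source code: it assumes the object’s own value wins, then the templates in its use list are searched in order, each one depth-first through its own use chain. The template names come from this article; the generic-service values and the shortened service-http check_command are hypothetical.

```python
from typing import Dict, Optional

Definition = Dict[str, str]


def lookup(attr: str, obj: Definition, templates: Dict[str, Definition]) -> Optional[str]:
    """Return the effective value of attr for obj, or None if it is never set.

    Simplified model: the object's own value wins; otherwise the templates in
    its "use" list are searched in order, each one depth-first through its
    own "use" chain, so templates listed earlier take precedence.
    """
    if attr in obj:
        return obj[attr]
    for parent in obj.get("use", "").split(","):
        parent = parent.strip()
        if parent:
            value = lookup(attr, templates[parent], templates)
            if value is not None:
                return value
    return None


# Templates shared by every instance (generic-service values are hypothetical).
SHARED = {
    "generic-service": {"active_checks_enabled": "1", "notification_options": "w,u,c,r"},
    "remote-service": {
        "active_checks_enabled": "0",
        "check_command": "check_dummy!3!No recent passive check result",
        "notification_options": "u",
    },
    # check_command arguments omitted for brevity.
    "service-http": {"use": "generic-service", "check_command": "check_http_port"},
}

# The branch1_local and branch1_remote flavours of the location template.
LOCAL = {**SHARED, "branch1-service": {"contact_groups": "branch1-admins",
                                       "obsess_over_service": "1"}}
REMOTE = {**SHARED, "branch1-service": {"use": "remote-service",
                                        "contact_groups": "branch1-admins"}}

service = {
    "use": "branch1-service,service-http",
    "host_name": "branch1:webserver",
    "service_description": "HTTP",
}

for label, templates in (("slave (local)", LOCAL), ("master (remote)", REMOTE)):
    print(label,
          lookup("active_checks_enabled", service, templates),
          lookup("check_command", service, templates))
```

Because branch1-service is listed first, its remote flavour pulls in remote-service and overrides even the check_command that service-http would otherwise supply.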

The local and remote templates define whether active checks are performed or passive check results are used. Each Nagios instance loads either the local or the remote definition of the location template.

For the examples mentioned in previous sections, the following directories would be loaded (through cfg_dir entries in nagios.cfg) on the Nagios instance in branch 1:

cfg_dir=global_configuration 
cfg_dir=branch1 
cfg_dir=branch1_local 

This causes Nagios to load the global configuration, which may include users, time periods, and generic host and service templates. It also loads the definitions of objects for branch1 and its local templates. The Nagios instances in the other branches load their own branch objects and local templates in the same way.

For master Nagios instances, the loaded configurations will be as follows:

cfg_dir=global_configuration 
cfg_dir=branch1 
cfg_dir=branch1_remote 
cfg_dir=branch2 
cfg_dir=branch2_remote 
cfg_dir=branch3 
cfg_dir=branch3_remote 
cfg_dir=branch4 
cfg_dir=branch4_remote 

This will load the global configuration objects, definitions of objects for all branches, and each branch’s remote templates.
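With many branches, the cfg_dir lists for the slaves and the master follow a fixed pattern, so they can be generated rather than written by hand. The Python helper below is a hypothetical sketch of that idea; it only reproduces the directory naming used in this article and does not touch Nagios itself.

```python
from typing import List, Optional


def cfg_dirs(branches: List[str], local_branch: Optional[str] = None) -> List[str]:
    """Return the cfg_dir lines for nagios.cfg.

    With local_branch set, build a slave configuration for that branch;
    otherwise build a master configuration covering every branch.
    """
    lines = ["cfg_dir=global_configuration"]
    if local_branch is not None:
        # Slave: its own objects plus the local (active-checking) templates.
        lines += [f"cfg_dir={local_branch}", f"cfg_dir={local_branch}_local"]
    else:
        # Master: every branch's objects plus the remote (passive) templates.
        for branch in branches:
            lines += [f"cfg_dir={branch}", f"cfg_dir={branch}_remote"]
    return lines


BRANCHES = ["branch1", "branch2", "branch3", "branch4"]
print("\n".join(cfg_dirs(BRANCHES, local_branch="branch1")))
print("\n".join(cfg_dirs(BRANCHES)))
```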

Creating the host and service objects

For the examples mentioned in previous sections, a typical host definition will be in the branch1 directory and will look as follows:

define host{ 
    use                         branch1-server 
    host_name                   branch1:webserver 
    hostgroups                  branch1-servers 
    address                     192.168.0.1 
    } 

The branch1-server template is defined in both the branch1_local and branch1_remote directories. The definition in the branch1_local directory is as follows:

define host{ 
    register                    0 
    use                         generic-server 
    name                        branch1-server 
    contact_groups              branch1-admins 
    obsess_over_host            1 
    } 

The definition for the remote location will be as follows:

define host{ 
    register                    0 
    use                         remote-server 
    name                        branch1-server 
    contact_groups              branch1-admins 
    } 

The generic-server template can be any typical host template. The remote-server template inherits from it, but disables active checks, accepts passive check results, and turns off notifications. An example definition of remote-server is as follows:

define host{ 
    register                    0 
    use                         generic-server 
    name                        remote-server 
    active_checks_enabled       0 
    passive_checks_enabled      1 
    notifications_enabled       0 
    } 

With this definition, the Nagios instance in the local branch actively checks whether the host is alive, and obsess_over_host causes the results to be passed on to the master Nagios instance. On the master, the remote template only accepts passive check results and sends no notifications, so a host that is down is reported only by the local Nagios instance.
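The division of work between the two instances can be checked with a few lines of Python. This is an illustrative model only: the generic-server defaults below are hypothetical (the article does not show that template), and because each flavour of branch1-server uses single inheritance, a plain dict merge where later keys win is enough to flatten it.

```python
from typing import Dict, List

# Hypothetical defaults for generic-server; the article does not show them.
GENERIC_SERVER = {
    "active_checks_enabled": 1,
    "passive_checks_enabled": 1,
    "notifications_enabled": 1,
    "obsess_over_host": 0,
}

# Flattened branch1-server as seen by the slave (branch1_local).
BRANCH1_SERVER_LOCAL = {**GENERIC_SERVER, "obsess_over_host": 1}

# Flattened branch1-server as seen by the master (branch1_remote -> remote-server).
BRANCH1_SERVER_REMOTE = {**GENERIC_SERVER, "active_checks_enabled": 0,
                         "passive_checks_enabled": 1, "notifications_enabled": 0}


def duties(settings: Dict[str, int]) -> List[str]:
    """List what an instance does for a host with these effective settings."""
    result = []
    if settings["active_checks_enabled"]:
        result.append("runs active checks")
    if settings["passive_checks_enabled"]:
        result.append("accepts passive results")
    if settings["notifications_enabled"]:
        result.append("sends notifications")
    if settings["obsess_over_host"]:
        result.append("forwards results to master")
    return result


print("slave: ", duties(BRANCH1_SERVER_LOCAL))
print("master:", duties(BRANCH1_SERVER_REMOTE))
```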

A typical service is defined as follows:

define service{ 
    use                         branch1-service,service-http 
    host_name                   branch1:webserver 
    service_description         HTTP 
    } 

The service-http template defines a check using check_http, along with any additional options for the check itself.

The local definition for branch1-service will be similar to the following code:

define service{ 
    register                    0 
    name                        branch1-service 
    contact_groups              branch1-admins 
    obsess_over_service         1 
    } 

For the remote services, it should be as follows:

define service{ 
    register                    0 
    name                        branch1-service 
    use                         remote-service 
    contact_groups              branch1-admins 
    } 

The local definition changes little about the service. It specifies the default contact group for all services and enables obsess_over_service, so status updates are sent to the master Nagios instance.
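How the obsessed results travel to the master depends on the OCSP/OCHP commands configured on the slave (commonly a wrapper around send_nsca). On the master, a passive result ends up as a Nagios external command such as PROCESS_SERVICE_CHECK_RESULT or PROCESS_HOST_CHECK_RESULT. The Python sketch below only formats those command lines to show their shape; the helper names are hypothetical and the plugin output text is made up.

```python
import time
from typing import Optional


def _timestamp(ts: Optional[int]) -> int:
    return int(time.time()) if ts is None else ts


def _check_output(output: str) -> str:
    # A newline would end the external command early, so refuse it here.
    if "\n" in output:
        raise ValueError("plugin output must be a single line")
    return output


def host_result_command(host_name: str, status_code: int, output: str,
                        timestamp: Optional[int] = None) -> str:
    """Format a passive host check result as a Nagios external command."""
    return (f"[{_timestamp(timestamp)}] PROCESS_HOST_CHECK_RESULT;"
            f"{host_name};{status_code};{_check_output(output)}")


def service_result_command(host_name: str, service_description: str,
                           return_code: int, output: str,
                           timestamp: Optional[int] = None) -> str:
    """Format a passive service check result as a Nagios external command."""
    return (f"[{_timestamp(timestamp)}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host_name};{service_description};{return_code};"
            f"{_check_output(output)}")


print(service_result_command("branch1:webserver", "HTTP", 0, "HTTP OK",
                             timestamp=1700000000))
```

Note that the host_name and service_description must match the master’s object definitions exactly, which is one reason for sharing the same branch1 directory between all instances.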

The remote directory uses the remote-service template, which disables active checks; an active check is run only if no passive check result has been received recently. For example, a remote-service definition can be as follows:

define service{ 
    register                    0 
    name                        remote-service 
    active_checks_enabled       0 
    check_freshness             1 
    freshness_threshold         43200 
    check_command               check_dummy!3!No recent passive check result 
    notification_options        u 
    event_handler_enabled       0 
    } 

This makes Nagios run an active check if no passive result has been received for 12 hours (43200 seconds). The active check simply reports an UNKNOWN status stating that no recent passive check result was received.
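The freshness logic amounts to a simple time comparison. The Python function below is a simplified model, not Nagios code: it ignores the small additional latency Nagios allows before treating a result as stale, and it returns the result that the check_dummy!3!... command would produce.

```python
from typing import Optional, Tuple

FRESHNESS_THRESHOLD = 43200  # seconds, as set in the remote-service template
UNKNOWN = 3


def freshness_check(last_result: int, now: int,
                    threshold: int = FRESHNESS_THRESHOLD) -> Optional[Tuple[int, str]]:
    """Return the forced check result if the last result is stale, else None."""
    if now - last_result > threshold:
        # Stale: behave like check_dummy!3!No recent passive check result.
        return UNKNOWN, "No recent passive check result"
    return None


print(FRESHNESS_THRESHOLD / 3600)  # hours
```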

Notifications for remote services are only enabled for the UNKNOWN state. The master Nagios instance therefore sends out notifications when it has not received any passive check results, but does not notify about the other states that the slave reports as passive check results; the slave sends those notifications itself.
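The effect of notification_options can be sketched as a filter on state changes. The Python below is a simplified illustration: the letters w, c, u, and r are the standard service notification options, but the function ignores flapping, downtime, the n option, notification intervals, and every other filter Nagios applies.

```python
from typing import Set

STATE_OPTION = {1: "w", 2: "c", 3: "u"}  # WARNING, CRITICAL, UNKNOWN


def parse_options(value: str) -> Set[str]:
    """Turn a notification_options value such as "w,u,c,r" into a set."""
    return {opt.strip() for opt in value.split(",") if opt.strip()}


def should_notify(new_state: int, old_state: int, options: Set[str]) -> bool:
    """Decide whether a service state change triggers a notification (simplified)."""
    if new_state == 0:
        # A return to OK is a recovery; only notify if "r" is enabled.
        return old_state != 0 and "r" in options
    return STATE_OPTION[new_state] in options


remote = parse_options("u")      # remote-service on the master
print(should_notify(3, 0, remote))  # stale result forced to UNKNOWN
print(should_notify(2, 0, remote))  # CRITICAL forwarded by the slave
```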

The check_dummy command simply invokes the check_dummy plugin, which returns the state passed as its first argument (3, meaning UNKNOWN, in this case) together with the message passed as its second argument. The check_dummy command definition is as follows:
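The plugin contract behind check_dummy is small: the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first output line carries the message. The Python function below approximates that behaviour for illustration; it is not the real plugin, and its handling of invalid states is an assumption. The quotes around "$ARG2$" in the command definition are what keep a multi-word message together as one argument.

```python
from typing import List, Optional, Tuple

STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def check_dummy(state: int, text: Optional[str] = None) -> Tuple[int, str]:
    """Return (exit code, output line) in the style of the check_dummy plugin."""
    if state not in STATE_NAMES:
        raise ValueError(f"unsupported state: {state}")
    output = STATE_NAMES[state]
    if text:
        output += f": {text}"
    return state, output


def main(argv: List[str]) -> int:
    # argv mirrors: check_dummy 3 "No recent passive check result"
    code, output = check_dummy(int(argv[0]), " ".join(argv[1:]) or None)
    print(output)
    return code
```

As a standalone script this would end with sys.exit(main(sys.argv[1:])) so that Nagios sees the state as the process exit code.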

define command{ 
    command_name       check_dummy 
    command_line       $USER1$/check_dummy $ARG1$ "$ARG2$"      
    } 

This way, the host and service definitions can be shared by all Nagios instances, while the templates for each location determine whether active checks are run.

The remote-server and remote-service templates are shared across all Nagios instances, which can be helpful in managing configurations that consist of many branches.

Customizing checks with custom variables

This approach has a downside: each service check has to be defined as a template. However, Nagios custom variables can be used to fine-tune a service check for each object. For example, the HTTP check could be defined as follows:

define command{ 
    command_name  check_http_port 
    command_line  $USER1$/check_http -H $ARG1$ -p $ARG2$ 
    } 
 
define service{ 
    use                         generic-service 
    name                        service-http 
    register                    0 
    check_command               check_http_port!$HOSTADDRESS$!$_SERVICEHTTPPORT$ 
    _HTTPPORT                   80 
    } 

This allows us to override the port used by the HTTP check by setting _HTTPPORT in the actual service, as follows:
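To see how the custom variable reaches the plugin, here is a Python model of the macro expansion. It is a simplified sketch, not Nagios’s implementation: a service custom variable _HTTPPORT becomes the $_SERVICEHTTPPORT$ macro, the !-separated check_command arguments become $ARG1$, $ARG2$, and so on, and the host argument is taken from the standard $HOSTADDRESS$ macro. The $USER1$ path is a hypothetical plugin directory, and unknown macros simply raise KeyError here.

```python
import re
from typing import Dict

MACRO = re.compile(r"\$([A-Za-z0-9_]+)\$")

COMMANDS = {"check_http_port": "$USER1$/check_http -H $ARG1$ -p $ARG2$"}


def expand(text: str, macros: Dict[str, str]) -> str:
    """Replace every $NAME$ in text; unknown macros raise KeyError (simplification)."""
    return MACRO.sub(lambda m: macros[m.group(1)], text)


def service_macros(service: Dict[str, str], host_address: str) -> Dict[str, str]:
    # USER1 is a hypothetical plugin path, normally set in resource.cfg.
    macros = {"USER1": "/usr/local/nagios/libexec", "HOSTADDRESS": host_address}
    for key, value in service.items():
        if key.startswith("_"):
            # Custom variable _HTTPPORT -> macro $_SERVICEHTTPPORT$
            macros["_SERVICE" + key[1:].upper()] = value
    return macros


def build_command_line(service: Dict[str, str], host_address: str) -> str:
    name, *args = service["check_command"].split("!")
    macros = service_macros(service, host_address)
    for i, arg in enumerate(args, start=1):
        macros[f"ARG{i}"] = expand(arg, macros)
    return expand(COMMANDS[name], macros)


# Template values, then the object overriding _HTTPPORT (object wins).
SERVICE_HTTP = {"check_command": "check_http_port!$HOSTADDRESS$!$_SERVICEHTTPPORT$",
                "_HTTPPORT": "80"}
http_8080 = {**SERVICE_HTTP, "_HTTPPORT": "8080"}

print(build_command_line(SERVICE_HTTP, "192.168.0.1"))
print(build_command_line(http_8080, "192.168.0.1"))
```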

define service{ 
    use                         branch1-service,service-http 
    host_name                   branch1:webserver 
    service_description         HTTP on port 8080 
    _HTTPPORT                   8080 
    } 
