Introducing the query handler in Nagios

The query handler provides two-way communication between Nagios and both its internal processes and external applications. It is designed to be extensible, and future versions of Nagios may provide more functionality through the query handler.

The query handler communicates using Unix domain sockets (refer to http://en.wikipedia.org/wiki/Unix_domain_socket for more details). These are meant for communication between processes on the same machine. Unix domain sockets use filesystem paths as addresses. The location (address) of the Nagios query handler is similar to that of the Nagios external command pipe: it is called nagios.qh and by default resides in the same directory as the external command pipe. For example, /var/nagios/rw/nagios.qh is the path to the query handler's Unix domain socket for an installation performed according to the steps given in Chapter 2, Installing Nagios 4. Filesystem permissions determine whether a process can connect to the socket, so it is possible to limit access to the query handler to specific operating system users or groups.

Unix domain sockets are very similar to named pipes (such as the Nagios external command pipe). However, named pipes cannot be used for two-way communication with more than one client. Another difference is that a socket cannot be opened as a file, so commands cannot be sent to it using shell commands such as echo, which is possible for named pipes such as the Nagios external command pipe.

Nagios provides its functionality through the query handler using services. There are several built-in services and the ones that are public are described throughout this chapter. Future versions of Nagios (or third-party software) may provide additional services. Each command sent to Nagios is prefixed with the service name, so each service may use any names for its sub-commands.

Nagios uses the query handler internally to distribute jobs to worker processes. Child processes connect to the query handler and receive tasks that should be performed. This is one of the reasons the query handler was originally created—to be able to control the worker processes. The worker processes use the wproc service, which is an internal service and should only be used by Nagios processes.

Nagios also provides services that can be used by external applications. The first and most basic one is echo, which simply responds with the data that was sent to it. It is mainly a useful tool for learning to communicate with Nagios.

The core service allows querying information about Nagios processes and the queue of scheduled jobs. The nerd service allows subscribing to events and can be used to receive real-time updates about Nagios host and/or service status changes.

Communicating with the query handler

Let's start understanding the query handler by communicating with it from the shell. Several commands can connect to Unix domain sockets, such as netcat (refer to http://netcat.sourceforge.net/ for more details) and socat (refer to http://www.dest-unreach.org/socat/ for more details). Both can be used to send commands to the Nagios query handler. To install the tools on Ubuntu, run the following command:

root@ubuntu:# apt-get install socat netcat 

For Red Hat Enterprise Linux, CentOS, and Fedora Core you can run the following command:

# yum install socat nc 

For Red Hat Enterprise Linux / CentOS 7 and later, both packages are available by default. For earlier versions, the socat package is available as part of EPEL (refer to https://fedoraproject.org/wiki/EPEL for more detail) and is not available unless EPEL is installed.

This will install both of the tools, which will be used later to check communication with the query handler.

The communication protocol for the query handler is simple. There is no initial message, so after connecting we can simply send commands to the query handler.

All commands that are sent to the query handler are prefixed with the name of the service and are sent in the following way:

@service command\0 

Here, @service is the name of the service prefixed with the @ character, command is the command (and parameters) to send, and \0 is the character with ASCII code 0, which indicates the end of the command. Nagios may also send information, either responses to commands or notifications. The format of the response depends on the service that implements it.
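The framing described above can be sketched as a small helper. This is only an illustration and not part of Nagios: the function name buildQuery and its input checks are assumptions made for this example.

```javascript
// Hypothetical helper (not part of Nagios): frames a query handler command
// as "@service command" followed by a NUL character.
function buildQuery(service, command) {
  // Assumption for this sketch: a service name is a single word
  if (!/^\S+$/.test(service) || service.indexOf('\0') !== -1) {
    throw new Error('Invalid service name: ' + service);
  }
  // A NUL inside the command would end it early, so reject it
  if (command.indexOf('\0') !== -1) {
    throw new Error('Command must not contain a NUL character');
  }
  // '\0' marks the end of the command
  return '@' + service + ' ' + command + '\0';
}

// JSON.stringify makes the trailing NUL visible when printed
console.log(JSON.stringify(buildQuery('echo', 'Query handler is working!')));
```

The returned string can be written to the socket as is, for example by piping it to socat or passing it to a socket's write method.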

Many commands return an answer or start sending notifications after the command is invoked. However, some commands, such as those that modify settings, return only a status code. The code is modeled after HTTP status codes (refer to http://en.wikipedia.org/wiki/List_of_HTTP_status_codes), where codes in the 200 range indicate success and codes in the 400 range indicate an error.

Nagios provides the @echo service that can be used to test connectivity to the query handler. It will return the same message that was sent to it. To test connectivity, we can simply run the following command:

root@ubuntu:# echo -e '@echo Query handler is working!\0' | \ 
    socat - UNIX-CONNECT:/var/nagios/rw/nagios.qh 

The first line generates a command to send to the @echo service. The -e option passed to the echo command enables interpretation of backslash escapes, which changes \0 to the character with ASCII code 0.

Next, the output from the echo command is piped to the socat command, which sends it to the query handler and prints the result to standard output. The socat command takes two arguments: the channels to relay data between. The - indicates standard input/output and UNIX-CONNECT:/var/nagios/rw/nagios.qh indicates the Unix domain socket of the Nagios query handler.

If the command succeeds, its output should be Query handler is working!, the same message that was sent.

If the current user does not have access to connect to the socket, the output will indicate an error as follows:

socat E connect(3, AF=1 "/var/nagios/rw/nagios.qh", 26): Permission denied 

For netcat, the command is similar:

root@ubuntu:#  echo -e '@echo Query handler is working!\0' | \
nc -U /var/nagios/rw/nagios.qh

The first line of the command is identical to the previous example. The -U option for the netcat command causes it to connect to the Unix domain socket with the address specified from the command line.

It is also possible to communicate with the query handler from code, as will be shown in the next section.

A single connection to Nagios can be used to send multiple commands and/or receive multiple types of information. However, as the formats of the responses may vary, it is best to use each connection for a single service only; for example, use one connection for managing the Nagios load and another for getting notifications about host and/or service check results.

Using the query handler programmatically

Now that we know how to communicate with the Nagios query handler, we can do so programmatically. Almost all languages provide a mechanism to communicate using Unix domain sockets.

For example, to send a test message using JavaScript, we can use the node.js built-in net module to communicate with the query handler:

var net = require('net');

var msg = 'Query handler is working properly!';

// Connect to the query handler's Unix domain socket and send the test message
var client = net.connect({
  path: '/var/nagios/rw/nagios.qh'
}, function () {
  client.write('@echo ' + msg + '\0');
});

// Compare the reply with the message that was sent
client.on('data', function (data) {
  if (data.toString() === msg) {
    console.log('Return message matches sent message');
    client.end();
    process.exit(0);
  } else {
    console.log('Return message does not match');
    client.end();
    process.exit(1);
  }
});

// Report connection problems such as a missing socket or denied permission
client.on('error', function (err) {
  console.log(err);
});

The preceding code sends a test message to the @echo query handler service and retrieves the result.

For other programming languages, the support for Unix domain sockets may be built-in or require additional modules or packages, but as the technology is quite common, commonly used languages should provide support for it.

Using the core service

The Nagios query handler provides the @core service, which can be used to get information about Nagios processes and to change some of their settings.

For all commands handled by the @core service, the result is text ending with the \0 character. To read a response, all that is needed is to read until we receive \0, which indicates the end of the response.
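Since a response may arrive in several pieces, reading until \0 usually means buffering. The following is a minimal sketch of that idea; createResponseReader is a hypothetical name introduced for this example, and the sample data is taken from the squeuestats output format.

```javascript
// Hypothetical helper: collects chunks and calls onMessage once for every
// complete response, that is, every run of text terminated by '\0'.
function createResponseReader(onMessage) {
  var buffer = '';
  return function feed(chunk) {
    buffer += chunk.toString();
    var end;
    // Emit every complete response currently held in the buffer
    while ((end = buffer.indexOf('\0')) !== -1) {
      onMessage(buffer.slice(0, end));
      buffer = buffer.slice(end + 1);
    }
  };
}

// Example: one response split across two chunks
var received = [];
var feed = createResponseReader(function (message) {
  received.push(message);
});
feed('SQUEUE_ENTRIES=29;HOST');
feed('_CHECK=4\0');
console.log(received);
```

With a Node.js socket, the returned function could be attached as the data listener, for example client.on('data', feed).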

The @core service allows querying information about the queue of scheduled jobs, such as the next active checks or the background operations to be performed. The command name is squeuestats. The following is the full command to send:

@core squeuestats\0 

The result is a string containing multiple statistics in the form name=value, separated by semicolons:

name1=value1;name2=value2;....
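A response in this format can be turned into an object with a few lines of code. The sketch below is illustrative only; parseKeyValues is a hypothetical helper name, and values are kept as strings rather than converted to numbers.

```javascript
// Hypothetical helper: turns "name1=value1;name2=value2" into an object.
function parseKeyValues(response) {
  var result = {};
  // Remove the terminating NUL, then process each name=value pair
  response.replace(/\0/g, '').split(';').forEach(function (pair) {
    var pos = pair.indexOf('=');
    if (pos > 0) {
      result[pair.slice(0, pos).trim()] = pair.slice(pos + 1).trim();
    }
  });
  return result;
}

var stats = parseKeyValues('HOST_CHECK=4;SERVICE_CHECK=18;SQUEUE_ENTRIES=29\0');
console.log(stats.SERVICE_CHECK);
```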

For example, to print all information we can simply prepare core.squeuestats.js with the following contents:

var net = require('net');

var client = net.connect({
  path: '/var/nagios/rw/nagios.qh'
}, function () {
  client.write('@core squeuestats\0');
});

client.on('data', function (data) {
  // Drop the terminating \0, split the name=value pairs, and sort them by name
  data.toString().replace(/\0/g, '').split(';').sort().forEach(function (line) {
    console.log(line);
  });
  client.end();
});

client.on('error', function (err) {
  console.log(err);
});

The code connects to the Nagios socket, sends the @core squeuestats command, and reads the response. The result is then split on semicolons, sorted, and printed as text:

root@ubuntu:# nodejs core.squeuestats.js 
CHECK_PROGRAM_UPDATE=1
CHECK_REAPER=1
COMMAND_CHECK=0
EXPIRE_COMMENT=0
EXPIRE_DOWNTIME=0
HFRESHNESS_CHECK=0
HOST_CHECK=4
LOG_ROTATION=1
ORPHAN_CHECK=1
PROGRAM_RESTART=0
PROGRAM_SHUTDOWN=0
RESCHEDULE_CHECKS=0
RETENTION_SAVE=1
SCHEDULED_DOWNTIME=0
SERVICE_CHECK=18
SFRESHNESS_CHECK=1
SLEEP=0
SQUEUE_ENTRIES=29
STATUS_SAVE=1
USER_FUNCTION=0

Another command the @core service provides is loadctl, which can be used to get values for all available load control settings or change one of their values. The syntax for the command is as follows:

@core loadctl 
@core loadctl setting=value 
@core loadctl setting1=value1;setting2=value2;...

Let's send the first form, which takes no settings:

var net = require('net');

var client = net.connect({
  path: '/var/nagios/rw/nagios.qh'
}, function () {
  client.write('@core loadctl\0');
});

client.on('data', function (data) {
  // Drop the terminating \0 and print one setting=value pair per line
  data.toString().replace(/\0/g, '').split(';').forEach(function (line) {
    console.log(line);
  });
  client.end();
});

It returns a list of all load control settings in the form setting=value, separated by semicolons, as shown here:

jobs_max=3896 
jobs_min=20 
jobs_running=0 
jobs_limit=9999 
load=0.00 
backoff_limit=2.50 
backoff_change=4855 
rampup_limit=0.80 
rampup_change=1213 
nproc_limit=47150 
nofile_limit=4096 
options=0 
changes=0  

If the loadctl command has any settings specified, they are changed and the command returns whether it succeeded or failed.

For example, we can change the jobs_max setting by executing the following:

client.write('@core loadctl jobs_max=9999\0');  

The Nagios query handler returns the 200: OK message on success. A response with the 400 code indicates that the setting was not found or was not modified.
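Building the loadctl command and interpreting its reply can be sketched as follows. Both helper names are hypothetical, the "code: text" reply layout is assumed from the single 200: OK example above, and treating only 2xx codes as success is an assumption borrowed from HTTP.

```javascript
// Hypothetical helpers, not part of Nagios.

// Builds "@core loadctl name=value;..." from a plain object of settings.
// Called without settings, it builds the bare "@core loadctl" query.
function buildLoadctl(settings) {
  var pairs = Object.keys(settings || {}).map(function (name) {
    return name + '=' + settings[name];
  });
  return '@core loadctl' + (pairs.length ? ' ' + pairs.join(';') : '') + '\0';
}

// Parses a status reply such as "200: OK". Returns null when the reply
// does not look like a status line (assumed layout: "<code>: <text>").
function parseStatus(response) {
  var match = /^(\d{3}):\s*(.*)$/.exec(response.replace(/\0/g, '').trim());
  if (!match) {
    return null;
  }
  var code = Number(match[1]);
  // Assumption: 2xx means success, as with HTTP status codes
  return { code: code, message: match[2], ok: code >= 200 && code < 300 };
}

console.log(JSON.stringify(buildLoadctl({ jobs_max: 9999 })));
console.log(parseStatus('200: OK\0'));
```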

Note

The load control settings are Nagios internal settings, and it is not recommended that you modify them unless needed. The preceding example simply illustrates how this can be done.

Introducing the Nagios event radio dispatcher

The query handler also includes a NERD service, which allows subscribing to service or host check results. The service name is @nerd and it accepts the following commands:

@nerd list\0 
@nerd subscribe <channel>\0 
@nerd unsubscribe <channel>\0 

The list command returns a list of channels separated by newlines, where the first word of each line is the channel name and the rest of the line is the channel description. The subscribe and unsubscribe commands can be used to start and stop getting notifications for a specified channel.

For example, to list all available channels we can simply do the following from the shell:

# echo -e '@nerd list\0' | \
socat - UNIX-CONNECT:/var/nagios/rw/nagios.qh

The output should be as follows:

hostchecks      Host check results 
servicechecks   Service check results 
opathchecks     Host and service checks in gource's log format 
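Output in this shape can be parsed by taking the first word of each line as the channel name. The following is a small sketch under that assumption; parseChannelList is a hypothetical helper name, and the sample input is the output shown above.

```javascript
// Hypothetical helper: parses the reply of "@nerd list" into
// { name, description } objects, one per non-empty line.
function parseChannelList(response) {
  return response.replace(/\0/g, '').split('\n')
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line.length > 0; })
    .map(function (line) {
      // First word is the channel name, the remainder is the description
      var match = /^(\S+)\s*(.*)$/.exec(line);
      return { name: match[1], description: match[2] };
    });
}

var channels = parseChannelList(
  'hostchecks      Host check results\n' +
  'servicechecks   Service check results\n' +
  'opathchecks     Host and service checks in gource\'s log format\n'
);
channels.forEach(function (channel) {
  console.log(channel.name + ': ' + channel.description);
});
```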

The opathchecks channel for notifications can be used together with the Gource visualization tool to show the animated host and service check updates. This functionality is described later in this chapter.

The hostchecks and servicechecks channels can be used to receive updates regarding host and/or service status changes. The format for the respective channels is as follows:

<hostname> from <old_code> -> <new_code>: <description> 
<hostname>;<servicename> from <old_code> -> <new_code>: <description> 

Where <old_code> and <new_code> correspond to the exit codes for check results.

For host checks, the codes map is as follows:

Exit code   Description
0           UP
1           DOWN
2           UNREACHABLE

For service checks, the values are as follows:

Exit code   Description
0           OK
1           WARNING
2           CRITICAL
3           UNKNOWN

Once a socket is subscribed to a channel, updates regarding hosts and/or services are sent separated by a newline character. Reading status updates for hosts or services can be done by simply subscribing to one or more channels and reading from the socket line by line.

For example, the following code subscribes to both host and service updates and prints out the results accordingly:

var net = require('net');

var client = net.connect({
  path: '/var/nagios/rw/nagios.qh'
}, function () {
  // Subscribe to both channels on the same connection
  client.write('@nerd subscribe hostchecks\0');
  client.write('@nerd subscribe servicechecks\0');
});

// Status names indexed by exit code
var statuses = {
  host: ['UP', 'DOWN', 'UNREACHABLE'],
  service: ['OK', 'WARNING', 'CRITICAL', 'UNKNOWN']
};

var serviceRegExp = /(.*?);(.*?) from ([0-9]+) -> ([0-9]+): (.*)$/;
var hostRegExp = /(.*?) from ([0-9]+) -> ([0-9]+): (.*)$/;

client.on('data', function (data) {
  // A single chunk may carry several newline-separated updates
  data.toString().split('\n').forEach(function (line) {
    var msg = line.trim();
    var tokens, status;
    // Try the service format first, as the host pattern would also match it
    if ((tokens = serviceRegExp.exec(msg))) {
      status = Math.max(0, Math.min(Number(tokens[4]), 3));
      console.log('Service', tokens[2], 'on', tokens[1], 'is', statuses.service[status], ':', tokens[5]);
    } else if ((tokens = hostRegExp.exec(msg))) {
      status = Math.max(0, Math.min(Number(tokens[3]), 2));
      console.log('Host', tokens[1], 'is', statuses.host[status], ':', tokens[4]);
    }
  });
});

client.on('error', function (err) {
  console.log(err);
});

The code uses regular expressions to parse each line. It first tries to parse the line as a service status update and then checks whether it matches the host status expression.

Note that the code is meant mainly for demonstration and is far from a complete example. A real application that uses NERD to receive notifications should handle the socket being closed and reconnect to Nagios, for example after Nagios is restarted.
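One way to approach reconnection is to resubscribe whenever the socket closes, waiting a little longer after each failed attempt. The sketch below is an assumption-laden illustration rather than a complete client: the names reconnectDelay and subscribeForever, the one-second base delay, and the 30-second cap are all choices made for this example.

```javascript
var net = require('net');

// Hypothetical helper: delay before the next reconnect attempt, doubling
// from baseMs on every failed attempt and capped at maxMs.
function reconnectDelay(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Sketch only: subscribes to the given NERD channels and reconnects whenever
// the socket closes. It is not called here, since it needs a running Nagios.
function subscribeForever(socketPath, channels, onLine) {
  var attempt = 0;
  function connect() {
    var pending = '';
    var client = net.connect({ path: socketPath }, function () {
      attempt = 0; // connected, so reset the backoff
      channels.forEach(function (channel) {
        client.write('@nerd subscribe ' + channel + '\0');
      });
    });
    client.on('data', function (data) {
      pending += data.toString();
      var lines = pending.split('\n');
      pending = lines.pop(); // keep the incomplete last line for later
      lines.forEach(function (line) {
        if (line.trim()) { onLine(line.trim()); }
      });
    });
    // Without an 'error' listener, a failed connection would throw
    client.on('error', function (err) {
      console.error('Query handler connection error:', err.message);
    });
    // 'close' also fires after an error, so reconnecting is scheduled only here
    client.on('close', function () {
      setTimeout(connect, reconnectDelay(attempt, 1000, 30000));
      attempt += 1;
    });
  }
  connect();
}
```

It could be started with a call such as subscribeForever('/var/nagios/rw/nagios.qh', ['hostchecks', 'servicechecks'], console.log), with the callback replaced by the parsing logic shown earlier.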
