Using SNMP from Nagios

Now that we are able to query information from Windows and Unix boxes, it would be good to know how to integrate SNMP checks with Nagios. The Nagios plugins package comes with a plugin called check_snmp for checking SNMP parameters and validating their value. The plugin uses the snmpget and snmpgetnext commands from Net-SNMP, and it does not work without these commands.

The following is the syntax of the command:

check_snmp -H <ip_address> -o <OID> [-w warn_range] [-c crit_range] 
           [-C community] [-s string] [-r regex] [-R regexi] 
           [-l label] [-u units] [-d delimiter]  [-D output-delimiter] 
           [-t timeout] [-e retries] [-p port-number] 
           [-m miblist] [-P snmp version] [-L seclevel] [-U secname] 
           [-a authproto] [-A authpasswd] [-X privpasswd] 

The following table describes the commonly used options accepted by the plugin. The -H and -o options are required:

Option                  Description
-H, --hostname          Host name or IP address of the machine to connect to; this option must be specified
-o, --oid               OID to get from the remote machine; can be specified either as dot-separated numbers or as a name; multiple elements can be specified and need to be separated with commas or spaces
-w                      Specifies the min:max range of values outside of which a warning state should be returned; for integer results only
-c                      Specifies the min:max range of values outside of which a critical state should be returned; for integer results only
-P, --protocol          Specifies the SNMP protocol version; accepted values are 1, 2c, or 3
-C, --community         Specifies the community string to be used; for SNMPv1 and SNMPv2c this defaults to public
-s, --string            Returns a critical state unless the result is an exact match of the value specified in this parameter
-r, --regex             Returns a critical state if the result does not match the specified regular expression; is case sensitive
-R, --eregi             Returns a critical state if the result does not match the specified regular expression; is case insensitive
-t, --timeout           Specifies the period in seconds after which it is assumed that no response has been received and the operation times out
-e, --retries           Specifies the number of retries that should be performed if no answer is received
-n, --next              Uses the getnext request instead of get to retrieve the next attribute after the specified one
-d, --delimiter         Specifies the delimiter which should be used to match values in the output from the Net-SNMP commands; defaults to an equals sign: =
-D, --output-delimiter  Specifies the character used to separate output if multiple OIDs are provided

The plugin behaves differently depending on which flags are passed. In all cases, the plugin will return critical if the SNMP agent could not be contacted, or if the specified OID does not exist. If none of the flags -s, -r/-R, -w, and -c are specified, the plugin will return OK as long as the OID is retrieved. Specifying -s will cause the check to fail if the value returned by the SNMP get request is different from the value supplied to this option. It is worth noting that this option uses an exact match, not a substring.

An example would be to make sure that the exact location is specified in an SNMP agent. This can be checked by the following command:

root@ubuntu:~# /opt/nagios/plugins/check_snmp -H 10.0.0.1 -P 2c \ 
    -o SNMPv2-MIB::sysLocation.0 -s "Miami Branch" 
SNMP OK - Miami Branch | SNMPv2-MIB::sysLocation.0=Miami Branch 

Matching a part of the text can be done with the -r or -R option. The first is a case-sensitive match; the second ignores case when matching the returned value. For example, to make sure that the contact information field contains an e-mail address, the following command can be used:

root@ubuntu:~# /opt/nagios/plugins/check_snmp -H 10.0.0.1 -P 2c \ 
    -o SNMPv2-MIB::sysContact.0 -r "@" 
SNMP OK - root@company.com | SNMPv2-MIB::sysContact.0=root@company.com 
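The difference between the three matching modes can be tried out without an SNMP agent. The following is a minimal sketch that imitates the comparisons locally with standard shell tools. The function names match_exact, match_regex, and match_regex_nocase are hypothetical helpers, not part of check_snmp, and the regular expression engine used by the plugin may differ in details from grep -E.

```shell
#!/bin/sh
# Hypothetical helpers imitating the comparisons behind -s, -r and -R.
# They work on a value you already have; no SNMP request is made.

# -s: the value must be identical to the expected string
match_exact() {
    [ "$1" = "$2" ]
}

# -r: the value must match the regular expression (case sensitive)
match_regex() {
    printf '%s\n' "$1" | grep -Eq -- "$2"
}

# -R: same as -r, but ignoring case
match_regex_nocase() {
    printf '%s\n' "$1" | grep -Eiq -- "$2"
}

value="Miami Branch"
match_exact "$value" "Miami" || echo "exact: 'Miami' is only a substring, no match"
match_regex "$value" "miami" || echo "regex: 'miami' differs in case, no match"
match_regex_nocase "$value" "miami" && echo "regex, ignoring case: match"
```

All three messages are printed, which mirrors the behavior described above: -s needs the whole value, -r accepts a partial match but is case sensitive, and -R accepts a partial match in any case.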

For integer results, it is also possible to check the value against a range. The -w and -c options each specify a range of acceptable values; if the result is outside the specified range, a WARNING or CRITICAL state is returned. Separate ranges can be specified for the warning and critical checks.

Typical usage can be to monitor system load or the number of processes running on a specific host.

The following is an example of how to check that the number of system processes is no more than 20:

root@ubuntu:~# /opt/nagios/plugins/check_snmp -H 10.0.0.1 -P 2c \ 
    -o HOST-RESOURCES-MIB::hrSystemProcesses.0 -w 0:20 -c 0:30 
SNMP CRITICAL - *33* | HOST-RESOURCES-MIB::hrSystemProcesses.0=33 

The check will return CRITICAL status if the number of processes is greater than 30. A WARNING status will be returned if the number of processes is greater than 20. If the number is 20 or less, an OK status will be returned.
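The way the two ranges combine can be sketched in a few lines of shell. This is only an illustration of the min:max form used in the example, assuming inclusive end points; the helper names in_range and range_state are hypothetical, and the plugin itself supports more range notations than this sketch handles.

```shell
#!/bin/sh
# Hypothetical helpers: classify an integer against min:max warning and
# critical ranges, the simple range form used in the example above.

# in_range VALUE MIN:MAX -> succeeds when MIN <= VALUE <= MAX
in_range() {
    min=${2%%:*}
    max=${2##*:}
    [ "$1" -ge "$min" ] && [ "$1" -le "$max" ]
}

# range_state VALUE WARN_RANGE CRIT_RANGE -> prints OK, WARNING or CRITICAL
range_state() {
    if ! in_range "$1" "$3"; then
        echo CRITICAL      # outside the critical range
    elif ! in_range "$1" "$2"; then
        echo WARNING       # inside critical range, outside warning range
    else
        echo OK
    fi
}

for n in 20 21 30 31 33; do
    echo "$n processes: $(range_state "$n" 0:20 0:30)"
done
```

With the ranges 0:20 and 0:30, the loop reports OK for 20, WARNING for 21 and 30, and CRITICAL for 31 and 33. The critical range is tested first so that a value outside both ranges is reported as CRITICAL rather than WARNING.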

In all cases, it is advised that you first use the snmpwalk command and check which objects can be retrieved from a specific agent.

Nagios also comes with SNMP plugins written in Perl that allow the checking of network interfaces and their statuses. These plugins require the installation of the Perl Net::SNMP package. For Ubuntu Linux, the package name is libnet-snmp-perl.

The syntax of the plugins is as follows:

check_ifstatus -H hostname [-v version] [-C community] 
check_ifoperstatus -H hostname [-v version] [-C community] 
                   [-k index] [-d name] 

The following table describes the options accepted by the plugins. The -H option is required:

Option                  Description
-H, --hostname          The host name or the IP address of the machine to connect to; this option must be specified
-v, --snmp_version      Specifies the SNMP protocol version to be used; acceptable values are 1 and 2c
-C, --community         Specifies the SNMP community string to be used
-k, --key               Specifies the index of the network interface to be checked (ifIndex field)
-d, --descr             Specifies the regular expression to match the interface description (ifDescr field) against

The check_ifstatus plugin simply checks whether every interface is either up or administratively down. If at least one interface is down, even if all other interfaces are up, a critical status is reported.

The check_ifoperstatus plugin allows you to check the status of a specific network interface. It is possible to specify either the index of the interface or an expression to match the device name against. An example to check the eth1 interface is as follows:

root@ubuntu:~# /opt/nagios/plugins/check_ifoperstatus -H 10.0.0.1 \
    -d eth1 
OK: Interface eth1 (index 3) is up. 

As the output also shows the index that eth1 is associated with, we can now use the -k option to check the interface status:

root@ubuntu:~# /opt/nagios/plugins/check_ifoperstatus -H 10.0.0.1 -k 3 
OK: Interface eth1 (index 3) is up. 

The main difference is that by using the -d flag, you make sure that your configuration is not affected if the indexes of the network interfaces shift. On the other hand, using the -k flag is faster. If you are sure that your interfaces will not change, it’s better to use -k; otherwise -d should be used.
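A lookup by description has to search the interface descriptions for the matching index before the status can be read, which is the extra step that a fixed index avoids. The sketch below performs that search locally on text shaped like snmpwalk output for IF-MIB::ifDescr. The sample data is an assumption made up for illustration, and ifindex_for is a hypothetical helper, not part of the plugin.

```shell
#!/bin/sh
# Hypothetical helper: find the ifIndex for an interface name in text
# shaped like snmpwalk output for IF-MIB::ifDescr.

# Sample data (an assumption for illustration, not captured from a device).
sample_ifdescr='IF-MIB::ifDescr.1 = STRING: lo
IF-MIB::ifDescr.2 = STRING: eth0
IF-MIB::ifDescr.3 = STRING: eth1'

# ifindex_for NAME -> reads the table on stdin, prints the index whose
# description equals NAME (names containing spaces are not handled)
ifindex_for() {
    awk -v name="$1" '
        $NF == name {
            n = split($1, parts, "[.]")   # IF-MIB::ifDescr.3 -> last part is 3
            print parts[n]
        }'
}

printf '%s\n' "$sample_ifdescr" | ifindex_for eth1
```

If the device later renumbers its interfaces, a search like this still finds eth1, whereas a stored index of 3 would silently point at a different interface.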

The next step is to configure the Nagios commands and services for the SNMP usage. We will define a command and a corresponding service. We will also show how custom variables can be used to standardize command definitions.

The following is a generic command used to query SNMP:

  define command 
  { 
    command_name check_snmp 
    command_line $USER1$/check_snmp -P 1 -H $HOSTADDRESS$ -o $ARG1$ $ARG2$ 
  } 

Using the Nagios 3 functionality, we can also define the _SNMPVERSION and _SNMPCOMMUNITY parameters in the host object for all of the devices that are SNMP-aware, and use them in the following command:

  define host 
  { 
    use                             generic-host 
    host_name                       linuxbox01 
    address                         10.0.2.1 
    _SNMPVERSION                    2c 
    _SNMPCOMMUNITY                  public 
  } 

  define command 
  { 
    command_name check_snmp 
    command_line $USER1$/check_snmp -H $HOSTADDRESS$ -o $ARG1$ -P $_HOSTSNMPVERSION$ -C $_HOSTSNMPCOMMUNITY$ $ARG2$ 
  } 

Next, we should define one or more services that will communicate over SNMP.

Let’s check for a number of processes and add some constraints that we want to be monitored:

  define service 
  { 
    use                  generic-service 
    hostgroup_name       snmp-aware 
    service_description  Processes 
    check_command        check_snmp!HOST-RESOURCES-MIB::hrSystemProcesses.0!-w 0:250 -c 0:500 
  } 

Please note that the check_command statement above needs to be specified on a single line. The above check will monitor the number of processes running on a system.
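To see what Nagios ends up running, it helps to expand the macros by hand. The following sketch only mimics the substitution: the text after each exclamation mark in check_command becomes $ARG1$ and $ARG2$, and the custom host variables fill in the version and community. The function expand_check_snmp is a hypothetical helper written for this illustration; Nagios performs the real expansion internally.

```shell
#!/bin/sh
# Hypothetical illustration of how the "!"-separated check_command value
# fills $ARG1$ and $ARG2$ in the command_line. This function only mimics
# the result of the expansion done by Nagios.

# expand_check_snmp CHECK_COMMAND ADDRESS VERSION COMMUNITY
expand_check_snmp() {
    rest=${1#*!}        # drop the command name before the first "!"
    arg1=${rest%%!*}    # text up to the next "!"  -> $ARG1$
    arg2=${rest#*!}     # everything after it      -> $ARG2$
    echo "check_snmp -H $2 -o $arg1 -P $3 -C $4 $arg2"
}

# Values taken from the linuxbox01 host and Processes service definitions
expand_check_snmp \
    'check_snmp!HOST-RESOURCES-MIB::hrSystemProcesses.0!-w 0:250 -c 0:500' \
    10.0.2.1 2c public
```

The printed line is the plugin invocation for linuxbox01, apart from the $USER1$ path prefix. Running the equivalent command by hand is a quick way to debug a service that behaves unexpectedly.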

It’s worth mentioning that for Microsoft Windows systems the number of processes that should trigger a warning and critical state should be much lower than shown in the above example.

Receiving traps

SNMP traps work in the opposite direction to get and set requests: the agent sends a message, as a UDP packet, to the SNMP manager when a problem occurs. For example, a link down or system crash message can be sent out to the manager so that administrators are alerted instantly. Notifications differ across versions of the SNMP protocol. In SNMPv1, they are called traps, and are messages that do not require any confirmation by the manager. SNMPv2 also introduces informs, which require the manager to acknowledge that it has received the inform message.

In order to receive traps or informs, the SNMP software needs to accept incoming packets on UDP port 162, which is the standard port for sending and receiving SNMP trap/inform packets. In some SNMP management software, trap notifications are handled within separate applications, while in others, they are integrated into an entire SNMP manager backend.

In Net-SNMP, the trap daemon ships with the other SNMP daemons but is a separate binary called snmptrapd, which is not started by default. To change this, we will need to modify the /etc/default/snmpd file and change the TRAPDRUN variable to yes, as shown here:

TRAPDRUN=yes 

Changing this option requires restarting the SNMP agent by invoking the service snmpd restart command.

On Ubuntu Linux, the trap listening daemon keeps its configuration file in /etc/snmp/snmptrapd.conf. For other systems, it may be in a different location.

The daemon can log specified SNMP traps/informs. It can be configured to run predefined applications or to forward all or specific packets to other managers.

A sample configuration that logs all incoming traps, but only if they are sent with the private community over SNMPv1 or SNMPv2c, would look like this:

authCommunity log,execute,net private 

This option enables the logging of traps from the private community originating from any address. It also allows the execution of handler scripts and the forwarding of traps to other hosts, although these require additional configuration directives.

Each change in the snmptrapd.conf file requires a restart of the snmpd service.

Usually, traps will be received from a device such as a network router or another computer from which we want to receive traps. We will need two machines with Net-SNMP installed—one for sending the trap and another that will process it. We can use any machine for sending the traps. However, the one processing it should be the one where Nagios is installed, so we can pass it on later. For the purpose of this section, we will use another computer and define a test MIB definition.

We need to create an MIB file called NAGIOS-TRAP-TEST-MIB.txt that will define the types of traps and their OIDs. On Ubuntu, the file should be put in /usr/share/snmp/mibs; for other platforms, it should be in the same location as the SNMPv2-SMI.txt file.

The contents of the file should be as follows:

NAGIOS-TRAP-TEST-MIB DEFINITIONS ::= BEGIN 
       IMPORTS enterprises FROM SNMPv2-SMI; 
 
 nagiostests OBJECT IDENTIFIER ::= { enterprises 0 } 
 nagiostraps OBJECT IDENTIFIER ::= { nagiostests 1 } 
 nagiosnotifs OBJECT IDENTIFIER ::= { nagiostests 2 } 
 
 nagiosTrap TRAP-TYPE 
       ENTERPRISE nagiostraps 
       VARIABLES { sysLocation } 
       DESCRIPTION "SNMPv1 notification" 
       ::= 1 
 
 nagiosNotif NOTIFICATION-TYPE 
       OBJECTS { sysLocation } 
       STATUS current 
       DESCRIPTION "SNMPv2c notification" 
       ::= { nagiosnotifs 2 } 
 END 

This contains definitions for both the SNMPv1 trap called nagiosTrap and the SNMPv2c notification called nagiosNotif. The file should be copied to all of the machines that will either send or receive these trap/inform packets. In this example, we are using a sub-tree of the enterprises branch defined in SNMPv2-SMI, but this should not be used in any production environment as this is a reserved part of the MIB tree.
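The numeric OIDs implied by these definitions can be worked out by appending each sub-identifier to its parent, starting from enterprises, which is 1.3.6.1.4.1. The short sketch below does only that arithmetic; the variable names simply reuse the names from the MIB file.

```shell
#!/bin/sh
# Numeric OIDs implied by the test MIB, built by appending each
# sub-identifier to the OID of its parent node.

enterprises=1.3.6.1.4.1
nagiostests=${enterprises}.0      # { enterprises 0 }
nagiostraps=${nagiostests}.1      # { nagiostests 1 }
nagiosnotifs=${nagiostests}.2     # { nagiostests 2 }
nagiosNotif=${nagiosnotifs}.2     # { nagiosnotifs 2 }

echo "nagiostraps  = $nagiostraps"
echo "nagiosnotifs = $nagiosnotifs"
echo "nagiosNotif  = $nagiosNotif"
```

On a machine where the MIB file has been installed, running snmptranslate -On NAGIOS-TRAP-TEST-MIB::nagiosNotif should report the same numeric OID, assuming Net-SNMP is able to find and load the file. This is a convenient check before sending any traps.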

In order to send such a trap as an SNMPv1 packet, we need to invoke the following command on the machine that will send the traps, replacing the IP address with the actual address of the machine that is running the snmptrapd process. The generic trap type 6 marks the trap as enterprise-specific, and the specific trap number 1 is the value assigned to nagiosTrap in the MIB file.

root@ubuntu2:~# snmptrap -v 1 -c private 192.168.2.51 \ 
    NAGIOS-TRAP-TEST-MIB::nagiostraps "" 6 1 "" \ 
    SNMPv2-MIB::sysLocation.0 s "Server Room" 

Sending an SNMPv2c notification will look like this:

root@ubuntu2:~# snmptrap -v 2c -c private 192.168.2.51 "" \ 
    NAGIOS-TRAP-TEST-MIB::nagiosNotif \ 
    SNMPv2-MIB::sysLocation.0 s "Server Room" 

Please note that, in both cases, there is no confirmation that the packet was received. In order to determine this, we need to check the system logs—usually the /var/log/syslog or /var/log/messages files. The following command should return log entries related to traps:

root@ubuntu:~# grep TRAP /var/log/syslog /var/log/messages 

Now that we know how to send traps, we should make sure that they are handled properly. The first thing that needs to be done is to add scripts as event handlers for the traps that we previously defined. We need to add these handlers on the machine that has the Nagios daemon running.

To do this, add the following lines to snmptrapd.conf, and restart the snmpd service:

traphandle NAGIOS-TRAP-TEST-MIB::nagiostraps /opt/nagios/bin/passMessage 
traphandle NAGIOS-TRAP-TEST-MIB::nagiosnotifs /opt/nagios/bin/passMessage 

We now need to create the actual /opt/nagios/bin/passMessage script that will forward information about the traps to Nagios:

#!/bin/sh

# Nagios external command pipe
CMD=/var/nagios/rw/nagios.cmd

# snmptrapd passes the sender's host name and address as the first two lines
read ORIGHOSTNAME
read ORIGIP
# parse IP address
IPADDR=`echo "$ORIGIP" | sed 's,^...: \[,,;s,\]:.*$,,'`
HOST=""
SVC=""

# map IP address of the trap to the host and service for which
# the check result should be submitted
case $IPADDR in
  192.168.2.52)
    HOST=ubuntu2
    SVC=TrapTest
    ;;
esac

# ignore traps from senders that are not mapped above
if [ "x$HOST" = "x" ] ; then
  exit 1
fi

# send check result to Nagios through the external command pipe
CLK=`date +%s`
echo "[$CLK] PROCESS_SERVICE_CHECK_RESULT;$HOST;$SVC;2;Trap received" > $CMD

exit 0
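The handler can be tested without sending a real trap by feeding it the kind of input that snmptrapd provides: the host name on the first line, the transport address on the second, and one variable binding per line after that. The sketch below runs the same parsing on a sample input and prints the resulting external command instead of writing it to the Nagios pipe. The sample input is an assumption for illustration, since the exact address format depends on the Net-SNMP version, and parse_trap_ip and passive_result are hypothetical helper names.

```shell
#!/bin/sh
# Sketch of what a trap handler reads on standard input and how the
# result line is built. Nothing is written to the Nagios command pipe.

# parse_trap_ip: read the two header lines, print the sender's IP address
parse_trap_ip() {
    read -r trap_host    # first line: host name (not used here)
    read -r trap_addr    # second line: transport address
    printf '%s\n' "$trap_addr" | sed 's,^...: \[,,;s,\]:.*$,,'
}

# passive_result TIMESTAMP HOST SERVICE STATE TEXT
passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Sample handler input (an assumption for illustration)
sample_input='ubuntu2
UDP: [192.168.2.52]:32768->[192.168.2.51]:162
SNMPv2-MIB::sysLocation.0 Server Room'

ip=$(printf '%s\n' "$sample_input" | parse_trap_ip)
echo "sender: $ip"
passive_result "$(date +%s)" ubuntu2 TrapTest 2 "Trap received"
```

Once the printed line looks right, the same input can be piped into the real script, provided the command pipe exists and is writable by the user that snmptrapd runs as.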

When used for a volatile service, this offers a convenient way to track SNMP traps and notifications in Nagios. A volatile service is similar to a normal Nagios service, except that every time a check (either active or passive) returns a non-OK result while the service is already in a hard non-OK state (WARNING, CRITICAL, or UNKNOWN), contacts are immediately notified and the state is logged.

A service is configured to be volatile by enabling the is_volatile directive. It is also common to set max_check_attempts for the volatile service to 1—so that each non-OK check result will cause it to be in a hard state. For example:

  define service 
  { 
    hostgroup_name         snmp-trap-receivers 
    service_description    TrapTest 
    is_volatile            1 
    max_check_attempts     1 
    active_checks_enabled  0 
    passive_checks_enabled 1 
  } 

The definition also disables active checks and ensures that passive checks are enabled for the service.

Using Nagios to track SNMP traps also allows you to merge it with powerful event handling mechanisms inside Nagios. This can cause Nagios to perform other checks, or try to recover from the error, when a trap is received.

Using additional plugins

Nagios Exchange hosts a large number of third-party plugins under the Check Plugins, Software, SNMP category. These allow the monitoring of the system load, processes, and storage space over SNMP, as well as many other types of checks. You can also find checks that are dedicated to specific hardware, such as Cisco or Nortel routers. There are also plugins for monitoring bandwidth usage.

There are also dedicated SNMP-based check plugins that allow the monitoring of many aspects of Microsoft Windows, without installing dedicated Nagios agents on these machines. This includes checks for IIS web server, checking whether WINS and DHCP processes are running, and so on.

The Manubulon site (http://nagios.manubulon.com/) also offers a very wide variety of SNMP plugins. These offer checks for specific processes that are running, monitoring the system load, CPU usage and network interfaces, and options specific to routers.

Another interesting use of SNMP is to monitor network bandwidth usage. In this case, Nagios can be integrated with the Multi Router Traffic Grapher (MRTG) package (see http://www.mrtg.org/). This is a utility that creates graphs of bandwidth usage on various network interfaces, and it also uses SNMP to gather information on traffic. Nagios offers a check_mrtg plugin (see http://nagios-plugins.org/doc/man/check_mrtg.html) that can be used to retrieve bandwidth usage information from the MRTG log files.

Most companies that need bandwidth monitoring already use MRTG, as it is the most popular solution for this task. That is why it is a good idea to integrate Nagios if you already have MRTG set up. Otherwise, it is better to use a dedicated bandwidth monitoring system.
