Monitoring Windows machines using Nagios

NSClient++ (NSCP) is an open source project that is based on and extends the NSClient concept. The original concept was to create an agent for Windows that, once installed, allows the querying of system information. NSClient has created a de facto standard protocol that offers the ability to query variables with parameters. NSClient++ uses the same protocol, but also offers the ability to perform more complex checks using the NRPE protocol. NSClient++ has to be installed on all Windows machines that will be monitored.

Installing NSClient++

NSClient++ can be downloaded from http://www.nsclient.org/, and it provides clients for both 32-bit and 64-bit Windows versions. At the time of writing, the latest stable version of NSClient++ is 0.4.3. It is recommended that you download the version matching your Windows architecture. After downloading it, simply run the installer on each machine that you want to monitor.

The first question shown by the installer concerns the installation type.

If unsure, it is recommended that you perform a complete installation, but advanced users may wish to choose which features to install. The installer will also ask for the location to install itself. Unless you need to install NSClient++ in a specific location, it is best to use the default path of C:\Program Files\NSClient++.

After installation, the installer will show basic options for its configuration.

The Allowed hosts should be set to the IP address and/or IP address ranges of machines that will be performing the checks, such as 192.168.0.0/24, if that is the IP address range of your local network.

It is recommended that you enable the following options:

  • Enable common check plugins: This option will enable common plugins for performing checks and set up the default configuration

  • Enable nsclient server: This option will allow using the check_nt Nagios plugin to perform basic tests

  • Enable NRPE server: This option will allow sending checks to the NSClient++ using the check_nrpe plugin

It is also recommended to set up the NRPE server to run in Insecure legacy mode as we are going to be using the check_nrpe plugin, which does not currently support the newer security features of NSClient++.

The password shown will be used to communicate with NSClient++. It will also be written to the configuration file, so there is no need to write it down at this point.

After a successful installation, NSClient++ registers itself as a Windows service and starts automatically.

Configuring NSClient++

NSClient++ is a very powerful agent and has a modular design. It consists of many modules that can be enabled or disabled.

There are multiple modules providing network protocol servers, such as NSClientServer, which exposes the protocol supported by the check_nt Nagios plugin, and NRPEServer, which allows making checks over the NRPE protocol. At installation time, the second and third checkboxes enabled these modules.

The NSClientServer module (used by the check_nt plugin) and the NRPEServer module (used by the check_nrpe plugin) both listen for connections from the Nagios server and run the checks that are sent via the protocol. They use different protocols: check_nt connects to TCP port 12489 and uses a dedicated protocol, while check_nrpe connects to TCP port 5666 and uses the more generic NRPE protocol, which is described in more detail in Chapter 10, Monitoring Remote Hosts.

There are also many modules that provide commands, which usually perform certain checks on the system. The commands can be called over NRPE or other protocols. The modules include many checks specific to Windows as well as helper checks, such as check_negate, which helps validate that a condition is not met. Many of the modules were enabled by selecting the first checkbox at the end of the installation.

NSClient++ has exhaustive documentation on all of the modules available from its reference manual available at http://docs.nsclient.org/reference/index.html#windows—modules. The list includes both network protocol modules as well as those providing various types of checks.

All this is configured by editing the configuration file called nsclient.ini, which is present in the installation directory. The file uses standard INI syntax: sections are enclosed in square brackets, values are written in the name = value form, and comments begin with a semicolon, as follows:

[/modules] 
; NSClientServer - A server that listens for incoming check_nt connection and processes incoming requests. 
NSClientServer = enabled 
 
; NRPEServer - A server that listens for incoming NRPE connection and processes incoming requests. 
NRPEServer = enabled 
 
; CheckWMI - Check status via WMI 
CheckWMI = enabled 
; CheckSystem - Various system related checks, such as CPU load, process state, service state memory usage and PDH counters. 
CheckSystem = enabled 
 
; CheckExternalScripts - Execute external scripts 
CheckExternalScripts = enabled 
 
; CheckEventLog - Check for errors and warnings in the event log. 
CheckEventLog = enabled 
 
; CheckLogFile - File for checking log files and various other forms of updating text files 
CheckLogFile = enabled 
 
; CheckDisk - CheckDisk can check various file and disk related things. 
CheckDisk = enabled 

The /modules section defines modules to load into NSClient++.

Global configuration is put in the /settings/default section:

[/settings/default] 
password = QsCrvGnz13# 
allowed hosts = 192.168.0.0/24 
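The syntax rules described above (sections in square brackets, name = value pairs, and comments starting with a semicolon) can be illustrated with a small Node.js sketch. The parseIni function is a hypothetical helper written for this example only; it is not part of NSClient++ and it ignores any syntax beyond those three rules.

```javascript
// Hypothetical helper: parse text that follows the three rules described
// above. This is an illustration, not how NSClient++ reads its configuration.
function parseIni(text) {
  const sections = {};
  let current = null;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith(";")) continue; // blank line or comment
    const header = line.match(/^\[(.+)\]$/);
    if (header) {
      // A new section such as [/modules] starts here.
      current = header[1];
      sections[current] = sections[current] || {};
      continue;
    }
    const eq = line.indexOf("=");
    if (eq === -1 || current === null) continue; // not a name = value pair
    sections[current][line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return sections;
}

const sample = [
  "[/modules]",
  "; NRPEServer - A server that listens for incoming NRPE connections.",
  "NRPEServer = enabled",
  "",
  "[/settings/default]",
  "allowed hosts = 192.168.0.0/24",
].join("\n");

const config = parseIni(sample);
console.log(config["/modules"].NRPEServer); // prints "enabled"
console.log(config["/settings/default"]["allowed hosts"]); // prints "192.168.0.0/24"
```

Note that option names may contain spaces, as in allowed hosts, so the sketch splits each line only at the first equals sign.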

Modules may either use the default configuration options or define their options in their own section; for example, the NRPEServer module configuration is placed inside the /settings/NRPEServer section.

Each module has all of its options mentioned in the reference manual for NSClient++ modules.

Changes in the NSClient++ configuration are mainly related to advanced features, such as enabling a specific module or configuring external scripts, which are described in more detail later in this chapter.

If you need to make any changes, they will be applied only when NSClient++ is restarted. To do this, go to the Services administrative panel that can be found in the start menu. Then locate the NSClient++ service and choose the Restart option.

Monitoring Windows using check_nt

NSClient++ offers a uniform mechanism to query the system information. Basic system information can be retrieved using the check_nt command from a standard Nagios plugins package.

The syntax and options of the command are as follows:

check_nt -H <host> [-p <port>] [-s <password>] [-w <level>]
         [-c <level>] -v <variable> [-l <arguments>]

Option             Description
-H, --hostname     This option must be specified to denote the hostname or IP address of the machine to connect to.
-p, --port         This specifies the TCP port number to connect to. The plugin defaults to port 1248; for NSClient++, it should be set to 12489.
-s, --secret       This specifies the password to use for authentication. This is optional and is needed only if a password is set up on the Windows agent.
-v, --variable     This is the variable to query. The possible variables are described further in this section.
-l, --arguments    This is for the arguments to be passed to the variable and is optional.
-w, --warning      This specifies the return values above which a warning state should be returned.
-c, --critical     This specifies the return values above which a critical state should be returned.

When using the check_nt plugin with NSClient++, we need to specify the port 12489, as NSClient++ uses a different port by default. We also need to specify the password set at installation time (or manually configured in the configuration file). So the flags used for all checks will be -H 192.168.0.210 -p 12489 -s 'QsCrvGnz13#' assuming the IP address for the Windows machine is 192.168.0.210 and the password set at installation time is QsCrvGnz13#.

The variables specified with the -v option are predefined. Most checks return both the string representation and an integer value. If an integer value is present, then the -w and -c flags can be used to specify the values that will indicate a problem.

The first variable is CPULOAD which allows the querying of processor usage over a specified period of time. The parameters are one or more series of the <time>, <warning>, and <critical> levels, where time is denoted in minutes and the warning/critical values specify, in percentage, the CPU usage that can trigger a problem, as seen in the following example:

# check_nt -H 192.168.0.210 -p 12489 -s 'QsCrvGnz13#' \
  -v CPULOAD -l 1,80,90
CPU Load 2% (1 min average) |   '1 min avg Load'=2%;80;90;0;100
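The output line above consists of a human-readable message, a pipe character, and performance data. The following Node.js sketch splits such a line, assuming the usual Nagios plugin performance data layout of label=value[unit];warn;crit;min;max. The parsePluginOutput function is a hypothetical helper for illustration; Nagios does this parsing itself, so it is not needed for monitoring.

```javascript
// Hypothetical helper: split a plugin output line into its message and its
// performance data items (label=value[unit];warn;crit;min;max).
function parsePluginOutput(line) {
  const pipe = line.indexOf("|");
  const text = (pipe === -1 ? line : line.slice(0, pipe)).trim();
  const perfText = pipe === -1 ? "" : line.slice(pipe + 1);

  // A label is either 'quoted with spaces' or a bare word. The value is a
  // number with an optional unit, followed by up to four ;-separated fields.
  const item = /(?:'([^']*)'|([^\s='|]+))=(-?[\d.]+)([^;\s]*)((?:;[^;\s]*){0,4})/g;
  const perf = [];
  let m;
  while ((m = item.exec(perfText)) !== null) {
    const [warn, crit, min, max] = m[5].split(";").slice(1);
    perf.push({
      label: m[1] !== undefined ? m[1] : m[2],
      value: Number(m[3]),
      unit: m[4],
      warn, crit, min, max, // kept as strings; undefined when absent
    });
  }
  return { text, perf };
}

const cpu = parsePluginOutput(
  "CPU Load 2% (1 min average) |   '1 min avg Load'=2%;80;90;0;100"
);
console.log(cpu.text);          // prints "CPU Load 2% (1 min average)"
console.log(cpu.perf[0].label); // prints "1 min avg Load"
console.log(cpu.perf[0].value); // prints 2
```

The warning and critical fields in the performance data (80 and 90 here) echo the thresholds passed with -l 1,80,90.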

In order to set up a check in Nagios, it is recommended that you use a custom variable in the host to specify the check_nt password, as follows:

define host{ 
    host_name                       windows210 
    hostgroups                      windowsservers 
    alias                           Windows 2 10 
    address                         192.168.0.210 
    check_command                   check-host-alive 
    (...) 
    _CHECKNTPASSWORD                QsCrvGnz13# 
    } 

Then the check command would be as follows:

define command{ 
    command_name        check_nt 
    command_line        $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s "$_HOSTCHECKNTPASSWORD$" -v $ARG1$ $ARG2$
    } 

And finally, a service would be defined as:

define service{ 
    host_name           windows210 
    service_description CPU Load 
    check_command       check_nt!CPULOAD!-l 1,80,90 
    } 

The USEDDISKSPACE variable can be used to monitor space usage. The argument should be a partition letter. The -w and -c options are used to specify the percentage of used disk space that can trigger a problem, as shown in the following example:

# check_nt -H 192.168.0.210 -p 12489 -s 'QsCrvGnz13#' \
-v USEDDISKSPACE -l C -w 80 -c 90
C:\ - total: 24.41 Gb - used: 17.96 Gb (74%) - free 6.45 Gb (26%)    | 'C:\ Used Space'=17.96Gb;0.00;0.00;0.00;24.41

The Nagios service definition for this check is very similar to the preceding one; only the arguments change to use a different variable and conditions:

define service{ 
    host_name           windows210 
    service_description Disk usage - C: 
    check_command       check_nt!USEDDISKSPACE!-l C -w 80 -c 90 
    } 

System services can also be monitored using the SERVICESTATE variable. The arguments must specify one or more internal service names, separated by commas. Internal service names can be checked in the Services management console, as shown in the following example:

# check_nt -H 192.168.0.210 -p 12489 -s 'QsCrvGnz13#' \
-v SERVICESTATE -l nscp,Schedule
OK: All 2 service(s) are ok.

This checks that the NSClient++ service (whose internal name is nscp) and the Windows scheduler (whose internal name is Schedule) are working.

Similar to monitoring services, it is also possible to monitor processes running on a Windows machine. The PROCSTATE variable can be used to achieve this. The variable accepts a list of executable names separated by commas, as shown in the following example:

# check_nt -H 192.168.0.210 -p 12489 -s 'QsCrvGnz13#' \
-v PROCSTATE -l winword.exe
OK: All processes are ok.

The MEMUSE variable can be used to check memory usage. This does not require any additional arguments. The -w and -c arguments are used to specify the warning and critical limits, as seen in the following example:

# check_nt -H 192.168.0.210 -p 12489 -s 'QsCrvGnz13#' \
-v MEMUSE -w 80 -c 90
Memory usage: total:5503.54 MB - used: 1317.99 MB (24%) - free: 4185.54 MB (76%) |'Memory usage'=1317.99MB;4402.83;4953.18;0.00;5503.54

Another thing that can be checked is the age of a file using the FILEAGE variable. This variable allows the verification of whether a specified file has been modified within a specified time period. The -w and -c arguments are used to specify the warning and critical limits, respectively. Their values indicate the number of minutes within which a file should have been modified—a value of 240 means that a warning or critical state should be returned if a file has not been modified within the last four hours, as shown in the following example:

# check_nt -H 192.168.0.210 -p 12489 -s 'QsCrvGnz13#' \
-v FILEAGE -l "C:/Program Files/NSClient++/nsclient.log" -w 120 -c 240
nsclient.log 2016-may-08 07:51:25

It is also possible to check the version of the agent. This makes the maintenance of upgrades and new versions much easier. The CLIENTVERSION variable allows the retrieval of version information, as follows:

# check_nt -H 192.168.0.210 -p 12489 -s 'QsCrvGnz13#' \
-v CLIENTVERSION
NSClient++ 0.4.3.143 2015-04-29

Performing checks using NRPE protocol

NSClient++ allows performing a much wider set of checks using the NRPE protocol. Which checks are available depends on the modules that are enabled.

To perform queries using the NRPE protocol, all that is needed is the check_nrpe plugin from the NRPE package. Installing the plugin is documented in more detail in Chapter 10, Monitoring Remote Hosts. The NRPE plugin has to be compiled with the --enable-command-args flag so that it allows sending arguments to the checks. Monitoring machines with NSClient++ installed over NRPE does not require specifying a password.

Many of the check_nt tests have their equivalent options for NRPE. For example, the following is an example of querying CPU load by sending a check_cpu command to NSClient++:

# check_nrpe -H 192.168.0.210 \
 -c check_cpu -a "warn=load>80" "crit=load>90"
OK: CPU load is ok.|'total 5m'=2%;80;90 'total 1m'=5%;80;90 'total 5s'=6%;80;90

warn and crit define criteria for when the check result should be considered a warning or critical result, respectively. In this case, it means that the load has to be above 80% of all CPUs to consider it warning and above 90% of all CPUs to consider it a critical state.

The check and its matching criteria are documented in more detail at http://docs.nsclient.org/reference/windows/CheckSystem.html#check-cpu.

In order to define the commands in Nagios, we’ll need to define the generic check_nrpe command if it was not defined already:

define command{ 
    command_name        check_nrpe 
    command_line        $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ $ARG2$
    } 

The service would be as follows:

define service{ 
    host_name           windows210 
    service_description CPU Load 
    check_command       check_nrpe!check_cpu!-a "warn=load>80" "crit=load>90"
    } 

Similarly, process and service checks can also be issued over NRPE by using check_process and check_service, as shown here:

# check_nrpe -H 192.168.0.210 \
-c check_service -a service=fax crit="state='running'"
OK: All 1 service(s) are ok.|'fax'=1;0;4

In this case, we are checking that the service fax is not running.

The preceding commands are provided by the CheckSystem module and are documented in more detail at http://docs.nsclient.org/reference/windows/CheckSystem.html. The module also provides many other checks that can be used over NRPE.

It is also worth noting that the NRPE protocol is more practical in most cases, as it allows more customization, more complex checks, and the retrieval of data such as WMI information. While check_nt based monitoring has historically been popular, using NRPE for any new monitoring is a better idea.

Querying WMI data from Nagios

It is also possible to use check_nrpe to query the Windows Management Instrumentation (WMI) (http://en.wikipedia.org/wiki/Windows_Management_Instrumentation). It is a mechanism that allows applications to access the system management information using various programming languages.

WMI offers an extensive set of information that can be retrieved. It describes the hardware and operating system as well as the currently-installed applications and the running applications. WMI also uses a query language very similar to the Structured Query Language (SQL) (http://en.wikipedia.org/wiki/SQL) that makes the retrieval of specific information very easy.

The WMI Query Language (WQL) syntax is described in more detail in the MSDN documentation available at https://msdn.microsoft.com/en-us/library/aa394606.aspx.

WMI allows accessing classes that provide rows of data, similar to databases, such as the Win32_Process class, which provides information about the currently running processes on Windows. Each row represents a single process and it contains multiple columns that provide specific information, such as ProcessId being the process identifier and Caption being the process name.

All available WMI classes are documented by Microsoft in MSDN at https://msdn.microsoft.com/en-us/library/aa394388.aspx.

The check_wmi command in NSClient++ can be used to perform a check using WMI.

Observe the following example:

# check_nrpe -H 192.168.0.210 \
-c check_wmi -a \
"query=Select ProcessId FROM Win32_Process WHERE Caption='requiredapp.exe'" \
"crit=count<1"
|'count'=0;0;0

The preceding example uses WMI to check whether the requiredapp.exe process is running. It runs a query to retrieve rows where the name of the process is requiredapp.exe, which is quoted as required by the WQL syntax (which is also similar to the SQL syntax). The last argument will cause the check to return the CRITICAL status if the number of rows is smaller than 1.

In order to use the query in Nagios, all that is needed is to set up a service in a manner similar to other NRPE examples.

define service{ 
    host_name           windows210 
    service_description requiredapp is running 
    check_command       check_nrpe!check_wmi!-a "query=Select ProcessId FROM Win32_Process WHERE Caption='requiredapp.exe'" "crit=count<1"
    } 

Note that the entire value for check_command has to be put in a single line.

WMI can be used for a large variety of tests as its querying syntax is quite powerful. For example, the following check can be used to ensure that there are no directories shared on the local network on Windows:

# check_nrpe -H 192.168.0.210 \
-c check_wmi -a \
"query=Select Status FROM Win32_Share WHERE Type=0 AND NOT Name LIKE '%$' AND NOT Name='Users'" \
"crit=count>0"
|'count'=0;0;0

This query lists all shared directories exported by users. The Type=0 condition limits the results so that only directory sharing is considered. The query also filters out the built-in shared directories: those ending with a $ sign and the Users share.

The Win32_Share class and fields such as Type and Name are described in more detail in the MSDN documentation available at https://msdn.microsoft.com/en-us/library/aa394435.aspx.

The WMI querying module is described in more detail in the reference manual available at https://docs.nsclient.org/reference/windows/CheckWMI.html.

Implementing external scripts

Another feature available in NSClient++ is the ability to write custom scripts that will perform checks. This is provided by the CheckExternalScripts module and allows writing specific checks without implementing a dedicated module.

The script can be implemented in any language, as the module provides a way to define how scripts with specific extensions should be run. The checks work similarly to Nagios checks on Unix systems; the script has to print the message to be provided to Nagios to standard output and exit with one of the following exit codes:

Exit code   Status     Description
0           OK         Working correctly
1           WARNING    Working, but needs attention (for example, low resources)
2           CRITICAL   Not working correctly or requires attention
3           UNKNOWN    Plugin was unable to determine the status for the host or service
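As an illustration of this contract, the following Node.js sketch prints a message to standard output and reports one of the exit codes from the table above. The evaluate function, the thresholds of 80 and 90, and the placeholder reading of 42 are assumptions made for this example; a real script would obtain the value from the system.

```javascript
// Exit codes from the table above.
const STATES = { OK: 0, WARNING: 1, CRITICAL: 2, UNKNOWN: 3 };

// Hypothetical helper: map a measured percentage to a Nagios state.
// Higher values are treated as worse; a missing reading yields UNKNOWN.
function evaluate(value, warn, crit) {
  if (typeof value !== "number" || Number.isNaN(value)) {
    return {
      state: "UNKNOWN",
      code: STATES.UNKNOWN,
      message: "UNKNOWN: no measurement available",
    };
  }
  let state = "OK";
  if (value >= crit) state = "CRITICAL";
  else if (value >= warn) state = "WARNING";
  return { state, code: STATES[state], message: `${state}: usage is ${value}%` };
}

// 42 is a placeholder reading used only to keep the sketch self-contained.
const result = evaluate(42, 80, 90);
console.log(result.message);    // the message goes to standard output
process.exitCode = result.code; // the exit code carries the status (0 here)
```

The sketch sets process.exitCode rather than calling process.exit() so that Node.js can finish writing the message before the process ends; either approach produces the exit code that is reported back as the check status.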

The NSClient++ installation comes with a few sample scripts, but they are not enabled by default. Each script has to be explicitly defined in the configuration file.

To add a sample check_no_rdp.bat script that ensures that the remote desktop is not enabled, simply add the following to the nsclient.ini configuration file:

[/settings/external scripts/scripts] 
check_no_rdp_script = scripts\check_no_rdp.bat 

check_no_rdp_script is the name of the command that will be made available, so this is the name to use when querying over NRPE. The name can be the same as the script name, or a different name can be used.

We also need to restart the NSClient++ service, which was described earlier in this chapter.

After that we can perform a check using it:

# check_nrpe -H 192.168.0.210 -c check_no_rdp_script 
RDP not listening! 

The /settings/external scripts/scripts section allows specifying raw commands to run. It is also possible to specify scripts that will be wrapped into valid commands based on their extension by defining them in the /settings/external scripts/wrapped scripts section:

[/settings/external scripts/wrapped scripts] 
check_updates = check_updates.vbs 

This defines a check_updates command that maps to the check_updates.vbs script. We can now test it by running the following:

# check_nrpe -H 192.168.0.210 -c check_updates
OK: There is no critical updates <br />Number of software or driver updates not installed: 1|

The extension determines how the command is run and the /settings/external scripts/wrappings configuration section specifies how to run a command with a specific extension. For example, by default it defines the following ways to run scripts:

[/settings/external scripts/wrappings] 
 
; POWERSHELL WRAPPING - 
ps1 = cmd /c echo scripts\\%SCRIPT% %ARGS%; exit($lastexitcode) | powershell.exe -command - 
 
; BATCH FILE WRAPPING - 
bat = scripts\\%SCRIPT% %ARGS% 
 
; VISUAL BASIC WRAPPING - 
vbs = cscript.exe //T:30 //NoLogo scripts\\lib\\wrapper.vbs %SCRIPT% %ARGS% 

This means the actual command in this case will be cscript.exe //T:30 //NoLogo scripts\\lib\\wrapper.vbs check_updates.vbs. wrapper.vbs provides common VBS functions for writing Nagios checks that are used by various scripts such as check_updates.vbs.
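The substitution described above can be sketched in Node.js as follows. The expandWrapping function is a hypothetical helper that only mimics the %SCRIPT% and %ARGS% replacement for illustration; it is an assumption about the behavior and not NSClient++ code.

```javascript
// Hypothetical helper: pick the wrapping template by file extension and fill
// in the %SCRIPT% and %ARGS% placeholders.
function expandWrapping(wrappings, script, args = "") {
  const ext = script.slice(script.lastIndexOf(".") + 1).toLowerCase();
  const template = wrappings[ext];
  if (template === undefined) {
    throw new Error(`no wrapping defined for .${ext}`);
  }
  // Replacement functions are used so that "$" in arguments is kept literally.
  return template
    .replace("%SCRIPT%", () => script)
    .replace("%ARGS%", () => args)
    .trim();
}

// Two of the default wrappings shown above; String.raw keeps the
// backslashes exactly as they are written in the configuration file.
const wrappings = {
  bat: String.raw`scripts\\%SCRIPT% %ARGS%`,
  vbs: String.raw`cscript.exe //T:30 //NoLogo scripts\\lib\\wrapper.vbs %SCRIPT% %ARGS%`,
};

console.log(expandWrapping(wrappings, "check_updates.vbs"));
// prints: cscript.exe //T:30 //NoLogo scripts\\lib\\wrapper.vbs check_updates.vbs
```

A script whose extension has no entry in the wrappings makes the sketch throw an error, which mirrors the need to define a wrapping before a new script type can be used.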

The default definitions provide ways to run bat, PowerShell, and VBS scripts. It is also possible to register any language. For Node.js, it would be as follows:

[/settings/external scripts/wrappings] 
js = C:\\Program Files\\nodejs\\node.exe scripts\\%SCRIPT% %ARGS% 

This assumes Node.js is installed in C:\Program Files\nodejs, which is the default installation location. Assuming that it is installed, we can now create a sample JavaScript file inside the NSClient++ directory as scripts\check_nodejs.js:

console.log("Hello from Node.js version " + process.versions.node); 
process.exit(0); 

We can now also define the script itself:

[/settings/external scripts/wrapped scripts] 
check_nodejs = check_nodejs.js 

After restarting NSClient++, we can now test that our script is working properly:

# check_nrpe -H 192.168.0.210 -c check_nodejs
Hello from Node.js version 4.4.4

In order to run the query from Nagios all that is needed is a very simple service configuration such as the following:

define service{ 
    host_name           windows210 
    service_description Node.js sample service 
    check_command       check_nrpe!check_nodejs 
    } 
