Monitoring database systems using Nagios

Databases allow the storage of information that is used often by entire departments or whole companies. Because most systems usually depend on one or more databases, a failure in these databases can cause all of the underlying systems to go down as well. Imagine a business critical database failure that went unnoticed over a weekend, making both the company’s website as well as e-mail unavailable. That would be a disaster! A series of scheduled reports that was supposed to be sent out would fail to be generated because of this.

This is why, making sure that databases are working correctly and have enough resources to operate might be essential for many companies. Many enterprise-class databases also have table space capacity management, which should also be monitored—even though a valid user may be able to log in, this does not necessarily mean that a database is up and running correctly.

Checking MySQL

One of the most commonly used database types is MySQL. It is very often used to provide a basic database for PHP-based web applications. It is also commonly used as a database system for client-server applications. Nagios offers two plugins to verify if MySQL is working properly. One of the plugins allows checking of connectivity to the database and checking master-slave replication status. The other one allows the measurement of the time taken to execute a SQL query. The syntax of both commands and the definition of their options are as follows:

check_mysql [-d database] [-H host] [-P port] [-s socket] 
            [-u user] [-p password] [-S] [-l] [-a cert] [-k key] 
            [-C ca-cert] [-D ca-dir] [-L ciphers] [-f optfile] 
            [-g group] 
 
check_mysql_query -q SQL_query [-w <warn>] [-c <crit>] 
                  [-d database] [-H host] [-P port] [-s socket] 
                  [-u user] [-p password] [-f optfile] [-g group]

Option

Description

-s, --socket

Unix socket to use for connection, used if -H option was not specified; does not need to be customized in most cases

-P, --port

The port to use for connections (defaults to 3306)

-d, --database

The database to which an attempt to connect is to be made

-u, --username

Username to log in with

-p, --password

Password to log in with

-S, --check-slave

(check_mysql only) Verify that the slave thread is running; this is used for monitoring replicated databases

-w, --warning

Specifies the warning threshold; dependent on the plugin used

-c, --critical

Specifies the critical threshold; dependent on the plugin used

-q, --query

(check_mysql_query only) Query to perform

For the check_mysql_query command, the -w and -c options specify the limits for the execution time of the specified SQL query. This allows us to make sure that database performance is within acceptable limits.

The definitions of the check commands for both a simple test and running an SQL query within a specified time are as follows:

  define command 
  { 
    command_name  check_mysql 
    command_line  $USER1$/check_mysql -H $HOSTADDRESS$ -u 
       $ARG1$ -p $ARG2$ -d $ARG3$ -S -w 10 -c 30 
  } 
  define command 
  { 
    command_name  check_mysql_query 
    command_line  $USER1$/check_mysql_query -H $HOSTADDRESS$ -u
       $ARG1$ -p $ARG2$ -d $ARG3$ -q $ARG4$ -w $ARG5$ -c $ARG6$ 
  } 

Both examples need username, password, and dbname as arguments. The second example also requires a SQL query and the warning and critical time limits.

If the -S option is specified, the plugin will also check whether the replication of MySQL databases is working correctly. This check should be run on MySQL slave servers to make sure that the replication with the master server is in place. Monitoring the number of seconds by which the slave server is behind the master server can be done using the -w and -c flags. In this case, if the slave server is more than the specified number of seconds behind the master server in the replication process, a warning or critical status is issued. More information about checking the replication status can be found under the MySQL documentation for the SHOWSLAVESTATUS command (refer to http://dev.mysql.com/doc/refman/5.7/en/show-slave-status.html).

Checking PostgreSQL

PostgreSQL is another open source database that is commonly used in hosting companies. It is also used very often for client-server applications. The Nagios plugins package offers a command to check if the PostgreSQL database is working correctly. Its syntax is quite similar to the MySQL command:

check_pgsql [-H <host>] [-P <port>] [-c <critical time>] 
            [-w <warning time>] [-t <timeout>] [-d <database>] 
            [-l <logname>] [-p <password>] [-q <query>] 
            [-C <critical query range>] [-W <warning query range>] 

The following table describes the options that this plugin accepts:

Option

Description

-P, --port

The port to use for connections (defaults to 5432)

-d, --database

The database to attempt to connect to

-l, --logname

The username to log in with

-p, --password

The password to log in with

A sample check command that expects username, password, and database name as arguments is as follows:

  define command 
  { 
    command_name  check_pgsql 
    command_line  $USER1$/check_pgsql -H $HOSTADDRESS$ -l $ARG1$  
       -p $ARG2$ -d $ARG3$ 
  } 

Checking Oracle

Oracle is a popular enterprise-level database server. It is mainly used by medium- and large-sized companies for business critical applications. Therefore, a failure or even a lack of disk space for a single database might cause huge problems for a company. Fortunately, a plugin exists to verify various aspects of the Oracle database. It even offers the ability to monitor tablespace storage and cache usage. The syntax is quite different from most Nagios plugins as the first argument specifies the mode in which the check should be carried out, and the remaining parameters are dependent on the first one. The syntax is as follows:

check_oracle --tns <SID> 
             --db <SID> 
             --oranames <Hostname> 
             --login <SID> 
             --cache <SID> <USER> <PASS> <CRITICAL> <WARNING> 
             --tablespace < SID> <USER> <PASS>                           <TABLESPACE> <CRITICAL> <WARNING>

For all checks, Oracle System Identifier (SID) can be specified in the form of <ip> or <ip>/<database>. Because the plugin automatically adds the username and password to the identifier, an SID in the form of <username>[/<password>]@<ip>[/<database>] should not be specified, and in many cases, will not work.

The --tns option checks if a database is listening for a connection based on the tnsping command. This can be used as a basic check of both local and remote databases.

Verifying that a local database is running can be done using the --db option, in which case, a check is performed by running the Oracle process for a specified database.

Verifying a remote Oracle Names server can be done using the --oranames mode.

In order to verify if a database is working properly, a --login option can be used—this tries to log in using an invalid username and verifies if the ORA-01017 error is received, in which case, the database is behaving correctly.

Verifying cache usage can be done using the --cache option, in which case, the cache hit ratio is checked. If it is lower than the specified warning or critical limits, the respective status is returned. This allows the monitoring of bottlenecks within the database caching mechanism.

Similarly, for tablespace checking, a --tablespace option is provided, in which a check is carried out against the available storage for the specified tablespace. If it is lower than the specified limits, a warning or critical status is returned (as appropriate).

This plugin requires various Oracle commands to be in the binary path (the PATH environment variable). Therefore, it is necessary to have either the entire Oracle installation or the Oracle client installation done on the machine that will perform the checks for the Oracle database. Sample definitions to check the login into the Oracle database and the database cache are as follows:

  define command 
  { 
    command_name  check_oracle_login 
    command_line  $USER1$/check_oracle --login $HOSTADDRESS$ 
  } 
  define command   { 
    command_name  check_oracle_tablespace 
    command_line  $USER1$/check_oracle --cache  
       $HOSTADDRESS$/$ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$ 
  } 

The second example requires the passing of the database name, username, password, and critical/warning limits for the cache hit ratio. The critical value should be lower than the warning value.

Checking other databases

Even though Nagios supports verification of some common databases, there are a lot of commonly used databases for which the standard nagios-plugins package does not provide a plugin. For these databases, the first thing worth checking is the Nagios Exchange (refer to http://exchange.nagios.org/), as this has a category for database check plugins with commands for checking various types of databases (such as DB2, Ingres, Firebird, MS SQL, and Sybase).

In some cases, it might be sufficient to use the check_tcp plugin to verify whether a database server is up and running. In other cases, it might be possible to use a dynamic language (such as Python, Perl, or Tcl) to write a small script that connects to your database and performs basic tests. Refer to Chapter 13, Programming Nagios, for more information on writing Nagios check plugins.

Related Articles

How to add swap space on Ubuntu 21.04 Operating System

How to add swap space on Ubuntu 21.04 Operating System

The swap space is a unique space on the disk that is used by the system when Physical RAM is full. When a Linux machine runout the RAM it use swap space to move inactive pages from RAM. Swap space can be created into Linux system in two ways, one we can create a...

read more

Lorem ipsum dolor sit amet consectetur

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *

five + six =