Home /  Monit /  Postgre SQL

Postgres & Monit

Author: Dave Paper

There are at least two ways to have Monit start a postgres server.

One way is to use the postgres init script that comes with some postgres RPMs:

 check process postgresql with pidfile /opt/postgres/data/postmaster.pid
    group database
    start program = "/etc/init.d/postgresql start"
    stop  program = "/etc/init.d/postgresql stop"
 if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then restart
 if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then alert
 if failed host localhost port 5432 protocol pgsql then restart
 if failed host localhost port 5432 protocol pgsql then alert
 if 5 restarts within 5 cycles then timeout

Another way is to make calls directly to pg_ctl as the postgres (or other non-root user):

 check process postgresql with pidfile /opt/postgre//data/postmaster.pid
    group database
    start program = "/opt/postgres/bin/pg_ctl start"
       as uid postgres and gid postgres
    stop  program = "/opt/postgres/bin/pg_ctl stop"
       as uid postgres and gid postgres
 if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then restart
 if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then alert
 if failed host localhost port 5432 protocol pgsql then restart
 if failed host localhost port 5432 protocol pgsql then alert
 if 5 restarts within 5 cycles then timeout

In both examples, we are doing belt and suspenders monitoring, checking both the unixsocket and TCP ports for functionality.

Monitoring Messages Generated

Even though Monit starts Postgres using either the postgres UID/GID or calls the postgres init script (which ultimately uses 'su - postgres -c "command"' to start the DB), the tests that Monit run against the processes are run as root.

This ends up generating messages in postgres logs that look similar to this:

ESTrootFATAL:  database "root" does not exist

With the Monit job definitions above, 4 error messages get generated each time Monit runs tests against the postgres servers. This behavior is confirmed on Monit 4.10.1 and Postgres 8.3.4. On the server this behavior was observed on, the tests were exactly 22 seconds apart.

Postgres, barring a specific user argument and database argument presented on the command line, defaults to connect as the UID making the request to a database named the same as the UID. In the tests that Monit runs, that means connecting to a postgres server as root will require the postgres server to have both a root user and a root database. A default postgres install has neither... at least mine don't.

Idealy, the pgsql protocol test would allow a username and a database name to be specified in the monit job. Since that isn't the case, workarounds exist.

There are two ways to solve this problem and clean up the logs.

  1. Change Postgres.
   1. Create DB user `root'.
   2. Create a database 'root' owned by root. It doesn't need to contain any data.
   3. Add these descriptions to pg_hba.conf;

      host   root  root  127.0.0.1/32  trust          <= for test via TCP port
      local  root  root  ident         sameuser       <= for test via UNIX socket

This solution is proposed by the author of the Monit pgsql test. Original post on the subject can be found here.

  1. Change the tests.

This test doesn't provide the same level of monitoring as using the pgsql protocol, but would sufice to tell if the postgres server is accepting TCP connections.

if failed host localhost port 5432 type TCP then restart
if failed host localhost port 5432 type TCP then alert