Postgres & Monit
Author: Dave Paper
There are at least two ways to have Monit start a postgres server.
One way is to use the postgres init script that comes with some postgres RPMs:
check process postgresql with pidfile /opt/postgres/data/postmaster.pid group database start program = "/etc/init.d/postgresql start" stop program = "/etc/init.d/postgresql stop" if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then restart if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then alert if failed host localhost port 5432 protocol pgsql then restart if failed host localhost port 5432 protocol pgsql then alert if 5 restarts within 5 cycles then timeout
Another way is to make calls directly to pg_ctl as the postgres (or other non-root user):
check process postgresql with pidfile /opt/postgre//data/postmaster.pid group database start program = "/opt/postgres/bin/pg_ctl start" as uid postgres and gid postgres stop program = "/opt/postgres/bin/pg_ctl stop" as uid postgres and gid postgres if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then restart if failed unixsocket /tmp/.s.PGSQL.5432 protocol pgsql then alert if failed host localhost port 5432 protocol pgsql then restart if failed host localhost port 5432 protocol pgsql then alert if 5 restarts within 5 cycles then timeout
In both examples, we are doing belt and suspenders monitoring, checking both the unixsocket and TCP ports for functionality.
Monitoring Messages Generated
Even though Monit starts Postgres using either the postgres UID/GID or calls the postgres init script (which ultimately uses 'su - postgres -c "command"' to start the DB), the tests that Monit run against the processes are run as root.
This ends up generating messages in postgres logs that look similar to this:
ESTrootFATAL: database "root" does not exist
With the Monit job definitions above, 4 error messages get generated each time Monit runs tests against the postgres servers. This behavior is confirmed on Monit 4.10.1 and Postgres 8.3.4. On the server this behavior was observed on, the tests were exactly 22 seconds apart.
Postgres, barring a specific user argument and database argument presented on the command line, defaults to connect as the UID making the request to a database named the same as the UID. In the tests that Monit runs, that means connecting to a postgres server as root will require the postgres server to have both a root user and a root database. A default postgres install has neither... at least mine don't.
Idealy, the pgsql protocol test would allow a username and a database name to be specified in the monit job. Since that isn't the case, workarounds exist.
There are two ways to solve this problem and clean up the logs.
- Change Postgres.
1. Create DB user `root'. 2. Create a database 'root' owned by root. It doesn't need to contain any data. 3. Add these descriptions to pg_hba.conf; host root root 127.0.0.1/32 trust <= for test via TCP port local root root ident sameuser <= for test via UNIX socket
This solution is proposed by the author of the Monit pgsql test. Original post on the subject can be found here.
- Change the tests.
This test doesn't provide the same level of monitoring as using the pgsql protocol, but would sufice to tell if the postgres server is accepting TCP connections.
if failed host localhost port 5432 type TCP then restart if failed host localhost port 5432 type TCP then alert