
Websphere Application Server

Author: Lutz Mader

Introduction

Here is an example of how to configure a more or less complex IBM Websphere environment.

By default Monit restarts the JVMs when they fail, but if a JVM is stopped on demand, Monit does not restart it. This is necessary because JVMs can be stopped or started via the Deployment Manager. The Deployment Manager is the main administration process that manages all other application servers in an IBM Websphere environment. Monit and the application servers run in the same user context. Monit can run in a root context, but the application servers should not; in that case the configuration and the wrapper scripts must be modified to handle this.

The application and HTTP servers are started multiple times on the same system; the Node Agent is started only once. Sometimes the application and the HTTP servers run on different systems.

Configuration basis

Some modifications to the ".monitrc" configuration file are useful, but the details depend on the system environment.

set daemon  60              # check services at 60 seconds intervals
    with start delay 240

set logfile /home/wasuser/logs/monit.log

set limits {
    programOutput:     1024 B,    # check program's output truncate limit
    fileContentBuffer: 1024 B,    # limit for file content test
}

set httpd port 2812 and
    use address localhost  # only accept connection from localhost
    allow localhost        # allow localhost to connect to the server and
    allow admin:monit      # require user 'admin' with password 'monit'

include /home/wasuser/monit/config/*.cfg

The application monitor runs in the same user context as the application server. A monitoring interval of 60 seconds is fast enough, but this depends on the system environment.

Use a unique port if Monit is started multiple times on a system. If you use M/Monit, drop the "use address" statement and modify the "allow" statement.

Application Server

The application JVMs are monitored via the pid file. To handle on-demand stop and start requests, Monit captures some messages from the Websphere "systemout-1.log" log file and disables or enables the monitoring accordingly. Log messages are used because other procedures may be in use, and because operators can use the Deployment Manager or the standard commands as well.

check file Serv_appl1_Out with path "/opt/IBM/was/apps/appl1/logs/systemout-1.log"
  if not exist then exec "/usr/bin/touch /opt/IBM/was/apps/appl1/logs/systemout-1.log"
#  if match "^.*SRVE....E: .*java.lang.OutOfMemoryError.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh restart Serv_appl1"
  if match "^.*SRVE0232E: Internal Server Error.*java.lang.OutOfMemoryError.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh restart Serv_appl1"
#  if match "^.*ADMN0015I: The administration service is initialized.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh monitor Serv_appl1"
  if match "^.*ADMN1020I: An attempt is made to stop the.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh unmonitor Serv_appl1"
  if match "^.*WSVR0001I: Server .* open for e-business.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh monitor Serv_appl1"
#  if match "^.*WSVR0024I: Server .* stopped.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh unmonitor Serv_appl1"
  if match "^.*NMSV0011E: Unable to start .* using port.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh monitor Serv_appl1"
  if match "^.*ORBX0390E: Cannot create listener thread.* Address already in use.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh monitor Serv_appl1"
# Some alert only messages.
  if match "^.*SRVE0232E: Internal Server Error.*java.lang.NoClassDefFoundError.*" then alert
  if match "^.*SRVE0293E: .* java.lang.OutOfMemoryError.*" then alert
  if match "^.*SRVE8109W: Uncaught exception thrown by filter.*java.lang.OutOfMemoryError.*" then alert
  if match "^.*WSVR0009E: Error occurred during startup.*" then alert
  if match "^.*WSVR0605W: Thread .* has been active for .* milliseconds and may be hung.*" then alert
  if match "^.*WSVR0606W: Thread .* was previously reported to be hung but has completed.*" then alert
#  if match "^.*TCPC0001I: TCP Channel .* is listening on host .* port .*" then alert
  if match "^.*TCPC0003E: TCP Channel .* initialization failed.*" then alert
#  if match "^.*CHFW0019I: The Transport Channel Service has started chain .*" then alert
  if match "^.*CHFW0033E: The Transport Channel Service failed to start transport chain .* after .* attempts to start it.*" then alert
#  if match "^.*CHFW0034W: The Transport Channel Service detected transport chain .* failed.*" then alert
  group Websphere

check process Serv_appl1 with pidfile "/opt/IBM/was/logs/appl1/appl1.pid"
  start program "/usr/local/etc/monit/scripts/wasserv.sh start" with timeout 120 seconds
  stop program "/usr/local/etc/monit/scripts/wasserv.sh stop" with timeout 120 seconds
  restart program "/usr/local/etc/monit/scripts/wasserv.sh restart" with timeout 300 seconds
  if not exist for 5 cycles then start
  if 5 restarts within 50 cycles then unmonitor
  group Websphere

Feel free to add an additional port monitoring statement.

  if failed host applhost.local port 9081 then alert
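
The message rules in the Serv_appl1_Out check above map a few log messages to wrapper actions. The mapping can be sketched as a small shell function, which may help when testing sample log lines before changing the Monit configuration. This is only an illustration: the function name "action_for_line" is hypothetical and not part of the original setup, and only three of the rules are shown.

```shell
#!/bin/sh
# action_for_line LINE: print the wrapper action that the Monit rules
# above would trigger for a single log line (hypothetical helper).
action_for_line() {
  case "$1" in
    # On-demand stop announced: stop monitoring the JVM.
    *"ADMN1020I: An attempt is made to stop the"*)
      echo unmonitor ;;
    # Server is up again: resume monitoring.
    *"WSVR0001I: Server "*" open for e-business"*)
      echo monitor ;;
    # Out of memory reported by the web container: restart the JVM.
    *"SRVE0232E: Internal Server Error"*"java.lang.OutOfMemoryError"*)
      echo restart ;;
    # Anything else is ignored in this sketch.
    *)
      echo none ;;
  esac
}
```

For example, calling action_for_line with a line that contains "WSVR0001I: Server appl1 open for e-business" prints "monitor".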

Unfortunately, Monit does not count individual messages. It only counts the number of cycles in which a message occurs, so there is no way to act on how often a message appears.

  if match "^.*ORBX0390E: Cannot create listener thread.* Address already in use.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh monitor Serv_appl1"
  if match "^.*ORBX0390E: Cannot create listener thread.* Address already in use.*" for 3 times within 10 cycles then exec "/usr/local/etc/monit/scripts/wasserv.sh unmonitor Serv_appl1"
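
If the number of occurrences matters, one possible workaround is a small script that counts the matching lines itself. The following is a minimal sketch under that assumption; the function names "count_matches" and "exceeds_limit" are hypothetical and not part of the original setup. Note that it counts over the whole log file, not per cycle, so the result depends on how the log is rotated.

```shell
#!/bin/sh
# count_matches PATTERN FILE: print the number of lines in FILE that
# match PATTERN. A missing or unreadable file counts as 0.
count_matches() {
  if [ -r "$2" ]; then
    # grep -c prints 0 and returns 1 when nothing matches,
    # so the exit status is ignored here.
    grep -c -- "$1" "$2" || true
  else
    echo 0
  fi
}

# exceeds_limit PATTERN FILE LIMIT: succeed if more than LIMIT lines match.
exceeds_limit() {
  [ "$(count_matches "$1" "$2")" -gt "$3" ]
}
```

A script built on these functions could exit with a non-zero status when the limit is exceeded, and a Monit "check program" entry could then test the exit status.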

The "wasserv.sh" script is a simple wrapper around the default commands provided by IBM. It uses Monit environment variables to find the right environment for the JVM. This is useful because several JVMs (appl1, appl2, appl3, etc.) are started on the same system.

PRG="$0"
if [ -n "$MONIT_SERVICE" ]; then
  SERV=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'`
  SRVR=`echo "$MONIT_SERVICE" | cut -f 2 -d '_'`
  PROC=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'`
else
  SERV=''
  SRVR=`echo "$2" | cut -f 2 -d '_'`
  PROC="$2"
fi
WASDIR="was"
cd /opt/IBM/${WASDIR}/bin;
case "$1" in
  'start')
    ./startServer.sh $SRVR -timeout 60 ;;
  'stop')
    ./stopServer.sh $SRVR -timeout 30 ;;
  'restart')
    if [ "$SERV" = "$PROC" ]; then
      $PRG stop
      sleep 30
      $PRG start
    else
      [ -n "$PROC" ] && /usr/local/bin/monit restart $PROC
    fi  ;;
  'monitor')
    [ -n "$PROC" ] && /usr/local/bin/monit monitor $PROC ;;
  'unmonitor')
    [ -n "$PROC" ] && /usr/local/bin/monit unmonitor $PROC ;;
  *) ;;
esac
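
The wrapper relies on the naming convention of the Monit services: the first two fields of the service name, separated by "_", give the process service name, and the second field gives the server name. The following sketch shows only this splitting step, using the same "cut" calls as the script; the function name "service_parts" is hypothetical.

```shell
#!/bin/sh
# service_parts NAME: print the process service name and the server
# name derived from a Monit service name such as Serv_appl1_Out.
service_parts() {
  # Fields 1-2 form the name of the "check process" service.
  proc=$(echo "$1" | cut -f 1-2 -d '_')
  # Field 2 is the server name passed to the IBM commands.
  srvr=$(echo "$1" | cut -f 2 -d '_')
  echo "$proc $srvr"
}

service_parts "Serv_appl1_Out"   # prints "Serv_appl1 appl1"
```

This is why a match in the file check Serv_appl1_Out ends up acting on the process check Serv_appl1 and the server appl1.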

Http Server

The IBM Http Server (an IBM version of the Apache Http Server) is monitored via the pid file and the port it uses.

check process Ihs_appl1 with pidfile "/opt/IBM/was/apps/appl1/logs/httpd.1.pid"
  start program "/usr/local/etc/monit/scripts/wasihs.sh start" with timeout 120 seconds
  stop program "/usr/local/etc/monit/scripts/wasihs.sh stop" with timeout 120 seconds
  if failed host applhost.local port 8901 then alert
  if not exist for 5 cycles then start
  if 5 restarts within 50 cycles then unmonitor
  group Websphere

The "wasihs.sh" script is a simple wrapper around the default commands provided by IBM. It uses Monit environment variables to find the right environment for the server.

PRG="$0"
if [ -n "$MONIT_SERVICE" ]; then
  SERV=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'`
  SRVR=`echo "$MONIT_SERVICE" | cut -f 2 -d '_'`
  PROC=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'`
else
  SERV=''
  SRVR=`echo "$2" | cut -f 2 -d '_'`
  PROC="$2"
fi
WASDIR="was"
IHSDIR="ihs"
IHSFILE="/opt/IBM/${WASDIR}/apps/${SRVR}/conf/httpd.conf"
cd /opt/IBM/${IHSDIR}/bin;
case "$1" in
  'start')
    ./apachectl -k $1 -f $IHSFILE ;;
  'stop')
    ./apachectl -k $1 -f $IHSFILE ;;
  'restart')
    if [ "$SERV" = "$PROC" ]; then
      $PRG stop
      sleep 30
      $PRG start
    else
      [ -n "$PROC" ] && /usr/local/bin/monit restart $PROC
    fi  ;;
  'monitor')
    [ -n "$PROC" ] && /usr/local/bin/monit monitor $PROC ;;
  'unmonitor')
    [ -n "$PROC" ] && /usr/local/bin/monit unmonitor $PROC ;;
  *) ;;
esac

Deployment Manager

The Deployment Manager (or Network Deployment Manager) JVM is monitored via the pid file. The Deployment Manager is an administrative process that provides a centralised management view and control for all elements in an IBM Websphere environment.

check file Dmgr_1_Out with path "/opt/IBM/was/logs/dmgr/SystemOut.log"
  if not exist then exec "/usr/bin/touch /opt/IBM/was/logs/dmgr/SystemOut.log"
  if match "^.*SRVE....E: .*java.lang.OutOfMemoryError.*" then exec "/usr/local/etc/monit/scripts/wasdmgr.sh restart Dmgr_1"
  if match "^.*WSVR0009E: Error occurred during startup.*" then alert
  group Websphere

check process Dmgr_1 with pidfile "/opt/IBM/was/logs/dmgr/dmgr.pid"
  start program "/usr/local/etc/monit/scripts/wasdmgr.sh start" with timeout 120 seconds
  stop program "/usr/local/etc/monit/scripts/wasdmgr.sh stop" with timeout 120 seconds
# SOAP_CONNECTOR_ADDRESS Port
  if failed host dmgrhost.local port 8703 then alert
# WC_adminhost_secure Port
  if failed host dmgrhost.local port 8701 then alert
  if not exist for 5 cycles then start
  if 3 restarts within 50 cycles then unmonitor
  group Websphere

Additional port monitoring covers the "SOAP_CONNECTOR_ADDRESS" and the "WC_adminhost_secure" ports. The "wasdmgr.sh" script is a simple wrapper around the default commands provided by IBM.

PRG="$0"
if [ -n "$MONIT_SERVICE" ]; then
  SERV=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'`
  PROC=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'`
else
  SERV=''
  PROC="$2"
fi
WASDIR="was"
cd /opt/IBM/${WASDIR}/bin;
case "$1" in
  'start')
    ./startManager.sh -timeout 60 ;;
  'stop')
    ./stopManager.sh -timeout 30 ;;
  'restart')
    if [ "$SERV" = "$PROC" ]; then
      $PRG stop
      sleep 30
      $PRG start
    else
      [ -n "$PROC" ] && /usr/local/bin/monit restart $PROC
    fi  ;;
  'monitor')
    [ -n "$PROC" ] && /usr/local/bin/monit monitor $PROC ;;
  'unmonitor')
    [ -n "$PROC" ] && /usr/local/bin/monit unmonitor $PROC ;;
  *) ;;
esac

Node Agent

The Node Agent JVM is monitored via the pid file. A Node Agent manages all managed processes on a node in an IBM WebSphere environment by communicating with a Network Deployment Manager to coordinate and synchronize the configuration. The Node Agent performs management operations on behalf of the Network Deployment Manager.

check file Node_1_Out with path "/opt/IBM/was/logs/nodeagent/SystemOut.log"
  if not exist then exec "/usr/bin/touch /opt/IBM/was/logs/nodeagent/SystemOut.log"
  if match "^.*SRVE....E: .*java.lang.OutOfMemoryError.*" then exec "/usr/local/etc/monit/scripts/wasnode.sh restart Node_1"
  if match "^.*ADMN1020I: An attempt is made to stop the.*" then exec "/usr/local/etc/monit/scripts/wasnode.sh unmonitor Node_1"
  if match "^.*ADMN1021I: An attempt is made to stop the.*" then exec "/usr/local/etc/monit/scripts/wasnode.sh unmonitor Node_1"
  if match "^.*WSVR0001I: Server .* open for e-business.*" then exec "/usr/local/etc/monit/scripts/wasnode.sh monitor Node_1"
  if match "^.*WSVR0009E: Error occurred during startup.*" then alert
  if match "^.*WSWS8511E: The configuration for the .* application module cannot load correctly.*" then alert
  if match "^.*SSLC0008E: Unable to initialize SSL connection.*" then exec "/usr/local/etc/monit/scripts/wasnode.sh restart Node_1"
  group Websphere

check process Node_1 with pidfile "/opt/IBM/was/logs/nodeagent/nodeagent.pid"
  start program "/usr/local/etc/monit/scripts/wasnode.sh start" with timeout 120 seconds
  stop program "/usr/local/etc/monit/scripts/wasnode.sh stop" with timeout 120 seconds
# SOAP_CONNECTOR_ADDRESS Port
  if failed host applhost.local port 8878 with timeout 10 seconds for 20 cycles then restart
  if not exist for 5 cycles then start
  if 3 restarts within 50 cycles then unmonitor
  group Websphere

Additional port monitoring covers the "SOAP_CONNECTOR_ADDRESS" port. The "wasnode.sh" script is a simple wrapper around the default commands provided by IBM.

PRG="$0"
if [ -n "$MONIT_SERVICE" ]; then
  SERV=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'`
  PROC=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'`
else
  SERV=''
  PROC="$2"
fi
WASDIR="was"
cd /opt/IBM/${WASDIR}/bin;
case "$1" in
  'start')
    ./startNode.sh -timeout 60 ;;
  'stop')
    ./stopNode.sh -timeout 30 ;;
  'restart')
    if [ "$SERV" = "$PROC" ]; then
      $PRG stop
      sleep 30
      $PRG start
    else
      [ -n "$PROC" ] && /usr/local/bin/monit restart $PROC
    fi  ;;
  'monitor')
    [ -n "$PROC" ] && /usr/local/bin/monit monitor $PROC ;;
  'unmonitor')
    [ -n "$PROC" ] && /usr/local/bin/monit unmonitor $PROC ;;
  *) ;;
esac

Take notice

The scripts are simple wrappers that call the standard commands to start and stop the servers, JVMs or applications. Whenever the scripts are called with "monitor", "unmonitor" or "restart", the appropriate Monit commands are used.

PROC=$MONIT_SERVICE
/usr/local/bin/monit $1 $PROC

The scripts are called with "unmonitor" to disable, or "monitor" to enable, the monitoring of servers or JVMs that are stopped or started externally. "stop" and "start" are not used, because this would interfere with the externally running stop or start process or script. Unfortunately, Monit treats unavailable applications as failed services; a "stopped" status is not available. However, Monit handles "stop" and "unmonitor" similarly, and both result in a "not monitored" service, which is why "unmonitor" is used. To enable monitoring again, "monitor" is used.

Notification

To send notifications, have a look at the samples described at https://mmonit.com/wiki/Notification/Notification

To send notifications via Monit itself, add an additional statement to the "check process" entry above.

  if not exist then exec "/usr/local/etc/monit/scripts/zexec.sh"
     else if succeeded then exec "/usr/local/etc/monit/scripts/zexec.sh"

And change some alert statements to call a script instead, for example from the first statement below to the second.

  if match "^.*TCPC0003E: TCP Channel .* initialization failed.*" then alert

  if match "^.*TCPC0003E: TCP Channel .* initialization failed.*" then exec "/usr/local/etc/monit/scripts/zexec.sh"
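
The "zexec.sh" script itself is not shown here. As a rough sketch of what such a script might do, the function below builds a one-line message from environment variables that Monit sets for programs it executes (MONIT_HOST, MONIT_SERVICE, MONIT_DESCRIPTION and MONIT_EVENT). The function name "format_notification" and the message layout are assumptions for illustration only.

```shell
#!/bin/sh
# format_notification: print a one-line notification text built from
# the environment variables Monit provides to executed programs.
# Fallback texts are used when a variable is not set.
format_notification() {
  printf '%s %s: %s (%s)\n' \
    "${MONIT_HOST:-unknown-host}" \
    "${MONIT_SERVICE:-unknown-service}" \
    "${MONIT_DESCRIPTION:-no description}" \
    "${MONIT_EVENT:-no event}"
}
```

A real script would pass this text on to whatever notification channel is in use, for example a mail command or the system log.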

Sending the notifications via M/Monit is recommended and more useful.

Disclaimer

Use of this software is at your own risk.

Under no circumstances can anyone be held liable for damage to hardware or software, lost data, or any other damage arising directly or indirectly from the use of this software.

If you do not agree with these conditions, you may not use or distribute this software.