WebSphere Application Server
Author: Lutz Mader
Introduction
Here is an example of how to configure Monit for a more or less complex IBM WebSphere environment.
By default Monit restarts the JVMs, but it should not do so when the JVMs are stopped on demand. This is necessary because JVMs can be stopped or started via the Deployment Manager, the main administration process that manages all other application servers in an IBM WebSphere environment. Monit and the application servers run in the same user context. Monit can run in a root context, but the application servers should not; in that case the configuration and the wrapper scripts must be modified to handle this.
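If Monit does run as root, one possible adjustment is to run the start and stop programs under the WebSphere user with Monit's "as uid ... and gid ..." option. A minimal sketch; the user and group names are placeholders:

start program "/usr/local/etc/monit/scripts/wasserv.sh start"
    as uid wasuser and gid wasgroup with timeout 120 seconds
stop program "/usr/local/etc/monit/scripts/wasserv.sh stop"
    as uid wasuser and gid wasgroup with timeout 120 seconds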
The application server and the HTTP server may be started multiple times on the same system, while the Node Agent is started only once. Sometimes the application server and the HTTP server run on different systems.
Configuration basis
Some modifications to the ".monitrc" configuration file seem useful, but this depends on the system environment.
set daemon 60                        # check services at 60 seconds intervals
    with start delay 240
set logfile /home/wasuser/logs/monit.log
set limits {
    programOutput:     1024 B,       # check program's output truncate limit
    fileContentBuffer: 1024 B,       # limit for file content test
}
set httpd port 2812 and
    use address localhost            # only accept connection from localhost
    allow localhost                  # allow localhost to connect to the server and
    allow admin:monit                # require user 'admin' with password 'monit'
include /home/wasuser/monit/config/*.cfg
The application monitor (Monit) runs in the same user context as the application server. A check interval of 60 seconds is fast enough, but this also depends on the system environment.
Use a unique port if Monit is started multiple times on a system; drop the "use address" statement and adjust the "allow" statement if you use M/Monit, as sketched below.
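A minimal sketch of such a modified httpd section; the port number and the M/Monit host name are placeholders:

set httpd port 2813                  # a unique port per Monit instance
    allow mmonithost.local           # allow the M/Monit host to connect
    allow admin:monit                # require user 'admin' with password 'monit'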
Application Server
The application JVMs are monitored via their pid files. To handle on-demand stop and start requests, some messages are captured from the WebSphere "systemout-1.log" log file to disable or enable the monitoring. Log messages are used because, on the one hand, other stop and start procedures are in use and, on the other hand, the operators may use the Deployment Manager or the standard commands as well.
check file Serv_appl1_Out with path "/opt/IBM/was/apps/appl1/logs/systemout-1.log"
    if not exist then exec "/usr/bin/touch /opt/IBM/was/apps/appl1/logs/systemout-1.log"
    #
    if match "^.*SRVE....E: .*java.lang.OutOfMemoryError.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh restart Serv_appl1"
    if match "^.*SRVE0232E: Internal Server Error.*java.lang.OutOfMemoryError.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh restart Serv_appl1"
    #
    if match "^.*ADMN0015I: The administration service is initialized.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh monitor Serv_appl1"
    if match "^.*ADMN1020I: An attempt is made to stop the.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh unmonitor Serv_appl1"
    if match "^.*WSVR0001I: Server .* open for e-business.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh monitor Serv_appl1"
    #
    if match "^.*WSVR0024I: Server .* stopped.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh unmonitor Serv_appl1"
    if match "^.*NMSV0011E: Unable to start .* using port.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh monitor Serv_appl1"
    if match "^.*ORBX0390E: Cannot create listener thread.* Address already in use.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh monitor Serv_appl1"
    # Some alert only messages.
    if match "^.*SRVE0232E: Internal Server Error.*java.lang.NoClassDefFoundError.*" then alert
    if match "^.*SRVE0293E: .* java.lang.OutOfMemoryError.*" then alert
    if match "^.*SRVE8109W: Uncaught exception thrown by filter.*java.lang.OutOfMemoryError.*" then alert
    if match "^.*WSVR0009E: Error occurred during startup.*" then alert
    if match "^.*WSVR0605W: Thread .* has been active for .* milliseconds and may be hung.*" then alert
    if match "^.*WSVR0606W: Thread .* was previously reported to be hung but has completed.*" then alert
    #
    if match "^.*TCPC0001I: TCP Channel .* is listening on host .* port .*" then alert
    if match "^.*TCPC0003E: TCP Channel .* initialization failed.*" then alert
    #
    if match "^.*CHFW0019I: The Transport Channel Service has started chain .*" then alert
    if match "^.*CHFW0033E: The Transport Channel Service failed to start transport chain .* after .* attempts to start it.*" then alert
    #
    if match "^.*CHFW0034W: The Transport Channel Service detected transport chain .* failed.*" then alert
    group Websphere

check process Serv_appl1 with pidfile "/opt/IBM/was/logs/appl1/appl1.pid"
    start program "/usr/local/etc/monit/scripts/wasserv.sh start" with timeout 120 seconds
    stop program "/usr/local/etc/monit/scripts/wasserv.sh stop" with timeout 120 seconds
    restart program "/usr/local/etc/monit/scripts/wasserv.sh restart" with timeout 300 seconds
    if not exist for 5 cycles then start
    if 5 restarts within 50 cycles then unmonitor
    group Websphere
Feel free to add an additional port monitoring statement.
if failed host applhost.local port 9081 then alert
Unfortunately, Monit does not support counting message occurrences across cycles well, so there is no way to react based on how often a message occurs. Only the number of cycles in which the message occurs is counted.
if match "^.*ORBX0390E: Cannot create listener thread.* Address already in use.*" then exec "/usr/local/etc/monit/scripts/wasserv.sh monitor Serv_appl1" if match "^.*ORBX0390E: Cannot create listener thread.* Address already in use.*" for 3 times within 10 cycles then exec "/usr/local/etc/monit/scripts/wasserv.sh unmonitor Serv_appl1"
The used "wasserv.sh" script is a simple wrapper script to call the default commands used by IBM. Some Monit environment variables are used to find the right environment used by the JVM. This is useful because some JVMs (appl1, appl2, appl3, etc.) are started on the same system.
PRG="$0" if [ -n "$MONIT_SERVICE" ]; then SERV=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'` SRVR=`echo "$MONIT_SERVICE" | cut -f 2 -d '_'` PROC=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'` then SERV='' SRVR=`echo "$2" | cut -f 2 -d '_'` PROC="$2" fi WASDIR="was" cd /opt/IBM/${WASDIR}/bin; case "$1" in 'start') ./startServer.sh $SRVR -timeout 60 ;; 'stop') ./stopServer.sh $SRVR -timeout 30 ;; 'restart') if [ "$SERV" = "$PROC" ]; then $PRG stop sleep 30 $PRG start else [ -n "$PROC" ] && /usr/local/bin/monit restart $PROC fi ;; 'monitor') [ -n "$PROC" ] && /usr/local/bin/monit monitor $PROC ;; 'unmonitor') [ -n "$PROC" ] && /usr/local/bin/monit unmonitor $PROC ;; *) ;; esac
Http Server
The IBM HTTP Server (IBM's version of the Apache HTTP Server) is monitored via its pid file and its port.
check process Ihs_appl1 with pidfile "/opt/IBM/was/apps/appl1/logs/httpd.1.pid"
    start program "/usr/local/etc/monit/scripts/wasihs.sh start" with timeout 120 seconds
    stop program "/usr/local/etc/monit/scripts/wasihs.sh stop" with timeout 120 seconds
    if failed host applhost.local port 8901 then alert
    if not exist for 5 cycles then start
    if 5 restarts within 50 cycles then unmonitor
    group Websphere
The used "wasihs.sh" script is a simple wrapper script to call the default commands used by IBM. Some Monit environment variables are used to find the right environment used by the server.
PRG="$0" if [ -n "$MONIT_SERVICE" ]; then SERV=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'` SRVR=`echo "$MONIT_SERVICE" | cut -f 2 -d '_'` PROC=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'` then SERV='' SRVR=`echo "$2" | cut -f 2 -d '_'` PROC="$2" fi WASDIR="was" IHSDIR="ihs" IHSFILE="/opt/IBM/${WASDIR}/apps/${SRVR}/conf/httpd.conf" cd /opt/IBM/${IHSDIR}/bin; case "$1" in 'start') ./apachectl -k $1 -f $IHSFILE ;; 'stop') ./apachectl -k $1 -f $IHSFILE ;; 'restart') if [ "$SERV" = "$PROC" ]; then $PRG stop sleep 30 $PRG start else [ -n "$PROC" ] && /usr/local/bin/monit restart $PROC fi ;; 'monitor') [ -n "$PROC" ] && /usr/local/bin/monit monitor $PROC ;; 'unmonitor') [ -n "$PROC" ] && /usr/local/bin/monit unmonitor $PROC ;; *) ;; esac
Deployment Manager
The Deployment Manager (or Network Deployment Manager) JVM is monitored via its pid file. The Deployment Manager is an administrative process that provides a centralised management view and control of all elements in an IBM WebSphere environment, including the managed nodes.
check file Dmgr_1_Out with path "/opt/IBM/was/logs/dmgr/SystemOut.log"
    if not exist then exec "/usr/bin/touch /opt/IBM/was/logs/dmgr/SystemOut.log"
    if match "^.*SRVE....E: .*java.lang.OutOfMemoryError.*" then exec "/usr/local/etc/monit/scripts/wasdmgr.sh restart Dmgr_1"
    if match "^.*WSVR0009E: Error occurred during startup.*" then alert
    group Websphere

check process Dmgr_1 with pidfile "/opt/IBM/was/logs/dmgr/dmgr.pid"
    start program "/usr/local/etc/monit/scripts/wasdmgr.sh start" with timeout 120 seconds
    stop program "/usr/local/etc/monit/scripts/wasdmgr.sh stop" with timeout 120 seconds
    # SOAP_CONNECTOR_ADDRESS Port
    if failed host dmgrhost.local port 8703 then alert
    # WC_adminhost_secure Port
    if failed host dmgrhost.local port 8701 then alert
    if not exist for 5 cycles then start
    if 3 restarts within 50 cycles then unmonitor
    group Websphere
Additional port monitoring is used for the "SOAP_CONNECTOR_ADDRESS" and "WC_adminhost_secure" ports. The "wasdmgr.sh" script is a simple wrapper that calls the standard IBM commands.
PRG="$0" if [ -n "$MONIT_SERVICE" ]; then SERV=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'` PROC=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'` then SERV='' PROC="$2" fi WASDIR="was" cd /opt/IBM/${WASDIR}/bin; case "$1" in 'start') ./startManager.sh -timeout 60 ;; 'stop') ./stopManager.sh -timeout 30 ;; 'restart') if [ "$SERV" = "$PROC" ]; then $PRG stop sleep 30 $PRG start else [ -n "$PROC" ] && /usr/local/bin/monit restart $PROC fi ;; 'monitor') [ -n "$PROC" ] && /usr/local/bin/monit monitor $PROC ;; 'unmonitor') [ -n "$PROC" ] && /usr/local/bin/monit unmonitor $PROC ;; *) ;; esac
Node Agent
The Node Agent JVM is monitored via its pid file. A Node Agent manages all managed processes on a node in an IBM WebSphere environment by communicating with the Network Deployment Manager to coordinate and synchronize the configuration. The Node Agent performs management operations on behalf of the Network Deployment Manager.
check file Node_1_Out with path "/opt/IBM/was/logs/nodeagent/SystemOut.log"
    if not exist then exec "/usr/bin/touch /opt/IBM/was/logs/nodeagent/SystemOut.log"
    if match "^.*SRVE....E: .*java.lang.OutOfMemoryError.*" then exec "/usr/local/etc/monit/scripts/wasnode.sh restart Node_1"
    if match "^.*ADMN1020I: An attempt is made to stop the.*" then exec "/usr/local/etc/monit/scripts/wasnode.sh unmonitor Node_1"
    if match "^.*ADMN1021I: An attempt is made to stop the.*" then exec "/usr/local/etc/monit/scripts/wasnode.sh unmonitor Node_1"
    if match "^.*WSVR0001I: Server .* open for e-business.*" then exec "/usr/local/etc/monit/scripts/wasnode.sh monitor Node_1"
    if match "^.*WSVR0009E: Error occurred during startup.*" then alert
    if match "^.*WSWS8511E: The configuration for the .* application module cannot load correctly.*" then alert
    if match "^.*SSLC0008E: Unable to initialize SSL connection.*" then exec "/usr/local/etc/monit/scripts/wasnode.sh restart Node_1"
    group Websphere

check process Node_1 with pidfile "/opt/IBM/was/logs/nodeagent/nodeagent.pid"
    start program "/usr/local/etc/monit/scripts/wasnode.sh start" with timeout 120 seconds
    stop program "/usr/local/etc/monit/scripts/wasnode.sh stop" with timeout 120 seconds
    # SOAP_CONNECTOR_ADDRESS Port
    if failed host applhost.local port 8878 with timeout 10 seconds for 20 cycles then restart
    if not exist for 5 cycles then start
    if 3 restarts within 50 cycles then unmonitor
    group Websphere
Additional port monitoring is used for the "SOAP_CONNECTOR_ADDRESS" port. The "wasnode.sh" script is a simple wrapper that calls the standard IBM commands.
PRG="$0" if [ -n "$MONIT_SERVICE" ]; then SERV=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'` PROC=`echo "$MONIT_SERVICE" | cut -f 1-2 -d '_'` then SERV='' PROC="$2" fi WASDIR="was" cd /opt/IBM/${WASDIR}/bin; case "$1" in 'start') ./startNode.sh -timeout 60 ;; 'stop') ./stopNode.sh -timeout 30 ;; 'restart') if [ "$SERV" = "$PROC" ]; then $PRG stop sleep 30 $PRG start else [ -n "$PROC" ] && /usr/local/bin/monit restart $PROC fi ;; 'monitor') [ -n "$PROC" ] && /usr/local/bin/monit monitor $PROC ;; 'unmonitor') [ -n "$PROC" ] && /usr/local/bin/monit unmonitor $PROC ;; *) ;; esac
Take notice
The scripts are simple wrappers that call the standard commands to start and stop the servers, JVMs and applications. Whenever the scripts are called with "monitor", "unmonitor" or "restart", the appropriate Monit commands are used.
PROC=$MONIT_SERVICE
/usr/local/bin/monit $1 $PROC
The scripts are called with "unmonitor" to disable and with "monitor" to enable the monitoring of servers or JVMs that are stopped or started externally. "stop" and "start" are not used, as they would confuse the externally running stop or start process or script. Unfortunately, Monit treats unavailable applications as failed services; a stopped status is not available. However, Monit handles "stop" and "unmonitor" in a similar way and both result in a "not monitored" service, which is the reason "unmonitor" is used. To enable monitoring again, "monitor" is used, as sketched below.
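A minimal sketch of an externally driven stop and start sequence, using the wrapper and the standard IBM commands from the examples above:

/usr/local/etc/monit/scripts/wasserv.sh unmonitor Serv_appl1   # disable monitoring first
/opt/IBM/was/bin/stopServer.sh appl1                           # stop the JVM outside of Monit
# ... maintenance work ...
/opt/IBM/was/bin/startServer.sh appl1                          # start the JVM again
/usr/local/etc/monit/scripts/wasserv.sh monitor Serv_appl1     # re-enable monitoring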
Notification
To send notifications, have a look at the samples described at https://mmonit.com/wiki/Notification/Notification.
To send notifications via Monit itself, add an additional statement to the service check process entries above.
if not exist then exec "/usr/local/etc/monit/scripts/zexec.sh"
    else if succeeded then exec "/usr/local/etc/monit/scripts/zexec.sh"
And change some alert statements to execute a script instead.
if match "^.*TCPC0003E: TCP Channel .* initialization failed.*" then alert if match "^.*TCPC0003E: TCP Channel .* initialization failed.*" then exec "/usr/local/etc/monit/scripts/zexec.sh"
Sending the notifications via M/Monit is recommended and more useful.
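To forward events to M/Monit, a minimal sketch for the ".monitrc" file; the event queue directory and the M/Monit host, port and credentials are placeholders:

set eventqueue basedir /home/wasuser/monit/events slots 1000
set mmonit http://monit:monit@mmonithost.local:8080/collector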
Disclaimer
Use of the software is at your own risk.
Under no circumstances can anybody be held liable for damage to hardware or software, lost data, or any other damage arising directly or indirectly from the use of the software.
If you do not agree with these conditions, you may not use or distribute this software.