In a world of 99.999% uptime, keeping a service running is a big deal. How do you compete? That is where monitoring and automated server management come into play.
It is a good idea to use both local and remote monitoring solutions, with the remote service acting as a fail-safe that sends out a notification when a website or service is unreachable or has poor latency.
With remote monitoring, there are many options that will scale to different needs. For example, I use 24×7 by Zoho for basic port monitoring. This service will send a notification if an app is no longer reachable from the internet. There are many monitoring services out there, so it would be worthwhile to search around and compare.
The next step is something that runs locally, has more granular monitoring, and will take action to resolve a problem when it is detected. Monit will do just this. It is a daemon that runs on the server and monitors resources and processes. It can restart programs and send notifications under specific conditions, such as when memory or CPU consumption exceeds a given threshold or when disk space runs low. It can also detect a continually failing application by tracking its PID.
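As a rough illustration of the resource thresholds described above, here is a minimal sketch in Monit's control-file syntax. The service name myapp, its pidfile and init script paths, and the threshold values are all placeholders chosen for the example, not recommendations, and the alert action assumes a recipient has been configured.

```
## Hypothetical service "myapp"; name, pidfile and init script paths are placeholders
check process myapp with pidfile /var/run/myapp.pid
  start program = "/etc/init.d/myapp start"
  stop program = "/etc/init.d/myapp stop"
  # Alert if the process uses more than 80% CPU for 5 consecutive cycles
  if cpu > 80% for 5 cycles then alert
  # Restart if the process plus its children use more than 500 MB for 5 cycles
  if totalmem > 500 MB for 5 cycles then restart

## Alert when the root filesystem is more than 90% full
check filesystem rootfs with path /
  if space usage > 90% then alert
```

Requiring a condition to hold for several cycles before acting helps avoid reacting to a short spike.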
Here are some Monit configuration examples for Apache, MySQL, and SOLR. Comments have been added to describe what each rule does. Each example uses an alert directive, which requires a recipient to be configured. This is done by setting the following in the config file:
set alert firstname.lastname@example.org
## Custom Apache2 setup
check process apache2 with pidfile /var/run/apache2.pid
  group www
  start program = "/etc/init.d/apache2 start"
  stop program = "/etc/init.d/apache2 stop"
  # Send alert if Apache isn't listening to specified port
  if failed host localhost port 80 then alert
  # Restart daemon if children processes > 250
  if children > 250 then restart
  # Alert if load avg stays high with given criteria
  if loadavg(5min) greater than 80 for 8 cycles then alert
  # Stop trying to restart daemon if restarts aren't working
  if 3 restarts within 5 cycles then timeout
## Custom MySQLD setup
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
  group root
  start program = "/etc/init.d/mysql start"
  stop program = "/etc/init.d/mysql stop"
  # Send alert if MYSQLD isn't listening to specified port
  if failed host localhost port 3306 then alert
## SOLR Check
check process solr with pidfile /var/run/solr.pid
  group root
  start program = "/etc/init.d/solr start"
  stop program = "/etc/init.d/solr stop"
  # Send alert if SOLR isn't listening to specified port
  if failed host localhost port 8983 then alert
  # Restart daemon if SOLR isn't listening to specified port
  if failed host localhost port 8983 then restart
  # Stop trying to restart if restarts aren't working
  if 5 restart within 5 cycles then timeout
In each of the Apache, MySQL, and SOLR examples, Monit at least checks that the app is listening on its designated port. If it is not, an alert is sent, and in the SOLR example a restart of the service is also attempted. With Apache, if it is running too many children, the service is restarted to fix this. (Note: Apache has its own setting, MaxRequestWorkers, called MaxClients before version 2.4, that caps the number of child processes or threads and should help avoid triggering the children check.) If the 5-minute load average stays high for too long, an alert is sent, and if repeated restarts do not help, the timeout action stops Monit from trying again.
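The "cycles" in these rules are Monit's poll intervals, so what a rule means in wall-clock time depends on how often the daemon wakes up. A small sketch, assuming a 60-second interval (the value is only an example):

```
# Poll every 60 seconds: "for 8 cycles" is then roughly 8 minutes,
# and "3 restarts within 5 cycles" is 3 restarts in about 5 minutes
set daemon 60
```

A shorter interval detects failures sooner but also shortens the time window covered by every "for N cycles" rule, so the two are worth tuning together.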
Keeping a daemon running and gathering information about it before something goes wrong is crucial to maintaining a quality application or service. Monitoring tools like 24×7 and Monit make this easier and are a must in any IT tool belt.