Jobs Just Keep on Running… Running… Into the Future
Setting up a system health check to make sure jobs are running in Control-M.
BMC’s Control-M is a powerful workload automation product that makes it easy for companies of all sizes to get control of their enterprise-wide scheduling needs. However, as with any mission-critical product, you need to know when things aren’t quite working as designed. In this short tutorial I’ll show you how to implement a simple system health check through Control-M and your favorite log monitoring tool.
The scenario is very simple: every 10 minutes a marker file is updated (“touched” for you Unix folks) to show that Control-M/Server is still alive and processing jobs. If the file is not updated within two polling cycles, we send out a notification through our enterprise-wide alerting system. You could just as well use email, Twitter, whatever you desire!
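Stripped of tooling, the health rule is just “is the marker file older than two polling cycles?” Here is a minimal Python sketch of that comparison. It assumes the 10-minute cycle used in this tutorial, and `is_stale` is a hypothetical helper, not part of Control-M or any monitoring product.

```python
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(minutes=10)  # assumed: matches the 10-minute job cycle
MISSED_CYCLES = 2                      # alert after two missed cycles

def is_stale(last_modified: datetime, now: datetime,
             interval: timedelta = POLL_INTERVAL,
             cycles: int = MISSED_CYCLES) -> bool:
    """Return True if the marker file has gone untouched for more than
    `cycles` polling intervals (hypothetical helper)."""
    return now - last_modified > interval * cycles

# Touched 25 minutes ago: stale. Touched 15 minutes ago: still healthy.
now = datetime(2024, 1, 1, 12, 0)
print(is_stale(now - timedelta(minutes=25), now))  # True
print(is_stale(now - timedelta(minutes=15), now))  # False
```

Using a strict “greater than” means a file exactly 20 minutes old is still considered healthy. That gives the job a little slack if it runs slightly late.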
So let’s get started! First, let’s create the Control-M job:
We’re going to create a very simple embedded script job that runs the DOS command “echo monitoring file mark > C:\control_monitor_file.txt”. This tells Windows to write “monitoring file mark” to control_monitor_file.txt in the root of the C:\ drive. Because “>” redirects output to the file rather than appending to it, each run replaces whatever is in the file and updates its date/time stamp.
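To try the monitoring side on a machine without Control-M, you can use a portable stand-in for the DOS one-liner. It is not what the Control-M job itself runs. The sketch below shows the same overwrite-and-touch behavior in Python. `write_marker` is a hypothetical helper, and it uses a temporary directory instead of C:\ so it is safe to run anywhere.

```python
import os
import tempfile

MARKER_TEXT = "monitoring file mark\n"

def write_marker(path: str) -> float:
    """Overwrite the marker file, as `echo ... > file` does, and return its mtime."""
    # Mode "w" truncates the file first, so each run replaces the previous
    # content instead of appending to it (like ">" rather than ">>").
    with open(path, "w") as f:
        f.write(MARKER_TEXT)
    return os.path.getmtime(path)

with tempfile.TemporaryDirectory() as d:
    marker = os.path.join(d, "control_monitor_file.txt")
    first = write_marker(marker)
    second = write_marker(marker)
    with open(marker) as f:
        print(repr(f.read()))  # 'monitoring file mark\n' (one copy, not two)
    print(second >= first)     # True: the timestamp never moves backwards
```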
Next, we’ll set this up as a cyclic job that runs every 10 minutes.
That’s it for the Control-M side! Now we’re going to use a log file monitoring tool. I use a Patrol Agent with the Log KM, but if you don’t have Patrol, any tool that can check whether a file has been updated will do. You could even write your own in Perl; see http://hayne.net/MacDev/WatchFile/ for an example to get you started.
Your log monitoring tool can then be configured to send out an alert if the file is not modified within a certain time frame. We have Patrol set up to alert us through Alarmpoint if the file still hasn’t been updated after two checks (10 minutes apart).
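If you write your own watcher instead of using Patrol, you can express the “two checks, 10 minutes apart” rule as a small state machine. The following is a minimal Python sketch, not the Patrol Log KM’s actual logic. `MarkerWatcher` and `poll_forever` are hypothetical names, and the `alert` callback is a placeholder for whatever hook feeds Alarmpoint, email, or another notification channel.

```python
import os
import time

class MarkerWatcher:
    """Raise an alert when a marker file's mtime is unchanged for
    `max_misses` consecutive checks (hypothetical helper)."""

    def __init__(self, path: str, max_misses: int = 2):
        self.path = path
        self.max_misses = max_misses
        self.last_mtime = None  # mtime seen on the previous successful check
        self.misses = 0         # consecutive checks without an update

    def check(self) -> bool:
        """Run one poll; return True exactly once per outage."""
        try:
            mtime = os.path.getmtime(self.path)
        except FileNotFoundError:
            mtime = None  # a missing file counts as "not updated"
        if mtime is not None and mtime != self.last_mtime:
            # File was touched (or seen for the first time): reset the counter.
            self.last_mtime = mtime
            self.misses = 0
        else:
            self.misses += 1
        # "==" fires once when the threshold is reached, not on every later poll.
        return self.misses == self.max_misses

def poll_forever(watcher: MarkerWatcher, interval_seconds: int = 600,
                 alert=print) -> None:
    """Check every `interval_seconds` (600 s = 10 minutes) and call `alert`."""
    while True:
        if watcher.check():
            alert(f"Control-M heartbeat: {watcher.path} not updated for "
                  f"{watcher.misses} checks")
        time.sleep(interval_seconds)
```

You would start it with something like `poll_forever(MarkerWatcher(r"C:\control_monitor_file.txt"))`. Alerting only when the counter first reaches the threshold keeps a long outage from flooding the alerting system. Change `==` to `>=` if you prefer a repeat alert on every poll until the file is touched again.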
Keep in mind that this solution only verifies that Control-M/Server is still processing jobs. It does not verify that Enterprise Manager components are up and running or that things like New Day are processing correctly.