Duplicating the SCOM Logical Disk Space Monitor

In my previous post about implementing my own addendum management pack for logical disk monitoring in SCOM, I noted that implementing the 2012 addendum led me to a solution I'd been looking for since I started using SCOM as a ticketing tool. Like a lot of customers, I want to get two separate alerts for logical disk space, not just one.

Three State Monitors

The built-in monitor for logical disk space that comes with the SCOM management packs for Windows looks at two values to determine availability.

There is one value for the percentage of free space:

And one for the free megabytes on the disk:

So far so good, and you can also see that we can set separate values for system and non-system disks. There are low and high thresholds set for this monitor because it is a three state monitor, meaning it has a healthy, warning, and critical health state:

For the logical disk monitor, both the percent free space and the free MBytes portion of a threshold have to be breached in order for the health state to change.
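To make the "both must be breached" rule concrete, here is a minimal Python sketch of the evaluation logic. It is not the management pack's actual script: the function name, the threshold numbers, and the use of a strict "less than" comparison are all assumptions for illustration.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def disk_health(pct_free, mb_free, warn_pct, warn_mb, crit_pct, crit_mb):
    """Illustrative three-state check; names and comparisons are assumptions."""
    # A threshold only counts as breached when BOTH the percent value
    # and the megabyte value are below their limits.
    if pct_free < crit_pct and mb_free < crit_mb:
        return Health.CRITICAL
    if pct_free < warn_pct and mb_free < warn_mb:
        return Health.WARNING
    return Health.HEALTHY


# Hypothetical thresholds: warning at 10% / 2000 MB, critical at 5% / 1000 MB.
print(disk_health(5, 50000, 10, 2000, 5, 1000))  # low percent, plenty of MB
print(disk_health(8, 1500, 10, 2000, 5, 1000))   # both warning values breached
```

With these made-up numbers, a large disk at 5% free but with 50,000 MB left stays healthy, which is exactly why the combined check is attractive compared to a single straight threshold.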

So, a lot of engineers and their customers look at this and think, "Great! I can set a warning threshold for a disk and get a lower severity ticket, and then if it breaches the critical threshold I can get a higher severity ticket." Looking at the alert settings, this would appear to be the case, as the default setting is to generate an alert when "The monitor is in a critical or warning health state."

BUT, someone with a bit of experience with SCOM will already know why this doesn't work: a three state monitor only generates one alert when it becomes unhealthy. That means that, with default settings, when the warning threshold for the monitor is breached, SCOM will generate an alert. That alert can be used by a connector, a notification subscription, Orchestrator, etc., to send an email to a support engineer or create an incident in the ticketing system. But if the monitor then goes from Warning to Critical, you will not get another alert. The only way for this monitor to generate a new alert is for the monitor to return to a healthy state, which closes the original alert, and then return to an unhealthy state.
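The alert behaviour described above can be sketched as a tiny state simulation. This is a simplified model written for this post, not SCOM code: the function name and the string state labels are hypothetical, and it assumes the default "critical or warning" alert setting.

```python
def alerts_raised(states):
    """Return (index, state) for each NEW alert a three-state monitor would raise.

    Simplified model: one alert is opened when the monitor leaves healthy,
    and no further alert is raised until it has returned to healthy.
    """
    alerts = []
    alert_open = False
    for i, state in enumerate(states):
        if state == "healthy":
            alert_open = False          # returning to healthy closes the alert
        elif not alert_open:
            alerts.append((i, state))   # first unhealthy state opens an alert
            alert_open = True
    return alerts


# Warning then critical: only the warning produces a new alert.
print(alerts_raised(["healthy", "warning", "critical"]))

# Going back to healthy in between is the only way to get a second alert.
print(alerts_raised(["healthy", "warning", "healthy", "critical"]))
```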

Jonathan Almquist covers this issue rather extensively in this blog post. He ultimately suggests sending alerts for critical thresholds and using a view in the Operations Console for engineers to look at and proactively resolve Warning levels before they become critical. In my experience, people in the real world don't want to have the SCOM console open and look at it all day long. People want alerts to hit their inbox or pager and a ticket to track against; they want to consume SCOM alerts as incidents.

Alternatives

Since this is my company's use case for SCOM, for better or for worse, that means I'm going to need a second monitor with its own alert. It would be trivial to write a simple performance monitor with a single threshold to look at the free space percent or the free megabytes on the disk, using nothing but the Authoring tab in the console. That, of course, leaves me with the problem that I like how the built-in monitor uses both of these values to create an alert rather than a straight threshold. Solving that is relatively simple, because I could create an Aggregate Rollup monitor as a parent to these other monitors, set it to trip only if both child monitors move to an unhealthy state, and generate alerts off that rollup.
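As a rough illustration of that rollup idea, the sketch below models a parent that takes the best state of its children, so it only becomes unhealthy when both the percent monitor and the megabyte monitor are unhealthy. The numeric state constants and the function name are assumptions made for the example, not anything exported by SCOM.

```python
# Hypothetical numeric health states, ordered from best to worst.
HEALTHY, WARNING, CRITICAL = 0, 1, 2


def best_of_rollup(child_states):
    """Parent state under a 'best state of any member' style policy."""
    # The parent is only as unhealthy as its healthiest child, so a single
    # healthy child keeps the whole rollup healthy.
    return min(child_states)


# Percent monitor is critical, megabyte monitor is still healthy.
print(best_of_rollup([CRITICAL, HEALTHY]))

# Both children are critical, so the rollup trips.
print(best_of_rollup([CRITICAL, CRITICAL]))
```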

A good example of what I'm talking about can be found on Dieter Wijckmans' blog, where he applies this approach to cluster disks.

There are a couple of drawbacks to this approach, however. One is that there is no differentiation in these simple monitors between system and non-system disks. This is not an issue with cluster disks like the ones Dieter is dealing with, because you can expect that cluster disks won't ever be system drives. However, if I want to set different thresholds for different disk types, I need to be able to discover which is which. That means a discovery that creates a class and/or group of disks of one type or the other, and then overrides targeted at those groups and/or classes, etc. What started as a couple of simple threshold monitors turns into a big development job.

The other issue is that alerting from an Aggregate Rollup monitor means the alert will not contain the values of the child monitors, so that data won't be in the email or ticket generated from the alert. Remember that we started this whole discussion talking about addendum management packs, and the whole reason anyone wants those in the first place is to keep that data in the body of the alert.

Using the Existing Monitor Type

So, ultimately, what I really want is an exact copy of the existing logical disk monitor to which I can assign separate thresholds and from which I can receive separate alerts. So just for grins, let's load the sealed management pack for Windows 2012 Operating System in the Authoring Console and look at the monitor type:

So there is the monitor type, with its separate percent and megabyte values, and its innate knowledge of system and non-system disks. Unfortunately the accessibility is set to Internal, and the management pack is sealed, so it's not available for use in another management pack.

Using the Addendum Monitor Type

But when I created the 2012 addendum management pack, I made my own copy of the data source module and monitor type necessary to create the Logical Disk space monitor.

So in Monitors in the Health Model view, you can see the addendum monitor:

To make a duplicate of this monitor using the monitor type, create a new Custom Unit Monitor:

Select the correct monitor type:

Other than the name, you can make it an exact copy of the other addendum monitor:

Once the duplicate monitor is created, the addendum MP can be imported into SCOM and you should be able to see both monitors:

And now you are free to set overrides for both monitors so that one alerts before the other and two separate emails or tickets can be generated.
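To show why the duplicate gets me what I want, here is a small Python sketch of two independent copies of the monitor watching the same disk with different overrides. It is a simplified model, not the addendum MP itself: each copy is reduced to a single unhealthy threshold pair, and the function name, sample values, and limits are all made up for illustration.

```python
def new_alerts(samples, pct_limit, mb_limit):
    """Return the sample indexes at which this monitor copy raises a new alert."""
    raised = []
    alert_open = False
    for i, (pct_free, mb_free) in enumerate(samples):
        # Same rule as the built-in monitor: both values must be breached.
        unhealthy = pct_free < pct_limit and mb_free < mb_limit
        if unhealthy and not alert_open:
            raised.append(i)
        alert_open = unhealthy
    return raised


# A disk filling up over three samples: (percent free, MB free).
samples = [(30, 40000), (9, 1800), (4, 900)]

# Hypothetical overrides: the first copy trips early, the second trips late.
early = new_alerts(samples, pct_limit=10, mb_limit=2000)
late = new_alerts(samples, pct_limit=5, mb_limit=1000)
print(early, late)
```

Because each copy tracks its own alert, the disk crossing the second set of thresholds raises a fresh alert even though the first copy never returned to healthy.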

This approach could still be improved (a pair of two-state monitors would be more elegant than two three-state monitors), but this provides a simple solution for a problem I have had literally since SCOM 2007 was released. I hope you find it useful as well.
