Don't you just love responding to fan mail? I do, especially when the note contains some really pertinent questions. And so we come to a note received from a member of the Exchange PG. He might have said "Tony you're crazy", but he's a kinder individual than that so asked nicely about why I had ignored the Get-ServerHealth and Get-HealthReport cmdlets when discussing how to monitor the health of a DAG. Here's why.
It’s always nice to receive feedback about a post. When I wrote about the influence that the Exchange 2013 Managed Availability framework exerts over Database Availability Groups, I received a note from a member of the Exchange product group asking why I had recommended using a PowerShell script (Get-DAGHealth.ps1) rather than making any attempt to leverage the Get-HealthReport cmdlet (thoughtfully provided to measure and report on the good health of servers). Furthermore, why not use SCOM, given that Managed Availability and SCOM work well together to give administrators a view of what might be going wrong within a DAG?
Good questions that deserve an equally good response. I’ve already done so in email but it’s good to share (sharing is caring, as so often said by a purple dinosaur), so here’s how I responded.
First, not everyone uses SCOM. There are many smaller installations that don’t want to spend the extra dollars to buy SCOM. Equally, there are installations that prefer to use their own monitoring systems – bought-in or home-made. This doesn’t mean that SCOM is bad. Just that choice exists.
Second, a PowerShell script is a wonderful thing for an administrator, especially when it’s been written by someone like Paul Cunningham who actually knows what he is doing and can code a bit. Not only does the Get-DAGHealth.ps1 script work, it can be changed and customized to meet the needs of an individual organization, including being integrated into whatever monitoring system is in use.
In addition, the Get-DAGHealth.ps1 script works with Exchange 2010 DAGs, which do not know about Managed Availability (and have no idea of its importance in their future).
Third, I’m an opinionated, grumpy old man who set out along a path and made some recommendations in print. It just goes to prove that you should not assume that everything you read in a blog is the best or most complete advice on the subject.
Get-HealthReport is indeed a most useful cmdlet as is its close companion, Get-ServerHealth. So much so that an interesting exercise for the reader would be to take the Get-DAGHealth.ps1 script and integrate Get-HealthReport or Get-ServerHealth into it (or just wait because Paul Cunningham will probably do this any day now).
Some examples might whet your appetite and plunge you into PowerShell coding mode. First, here’s how to use Get-ServerHealth to return information about the components running on a server that impact High Availability:
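A minimal sketch of such a command, assuming an Exchange 2013 Management Shell session and a hypothetical DAG member named ExServer1. The -HaImpactingOnly switch restricts the output to the monitors that affect High Availability. Some builds name the server parameter differently, so check Get-Help Get-ServerHealth on your own server before relying on this.

```powershell
# ExServer1 is a hypothetical server name; substitute one of your DAG members.
# -HaImpactingOnly limits the results to monitors that affect High Availability.
Get-ServerHealth -Identity ExServer1 -HaImpactingOnly |
    Sort-Object HealthSetName, Name |
    Format-Table Name, HealthSetName, AlertValue -AutoSize
```

Each row is a single monitor, the health set it belongs to, and its current alert value.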
You might be surprised at the number of probes used by Managed Availability to analyze server health for just one (albeit large) piece of functionality. Now let’s use Get-HealthReport to report the overall health status of a Database Availability Group using rollup data extracted from all the servers in the DAG:
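One way this might look, as a sketch only: DAG1 is a hypothetical DAG name, and I am assuming that the server names taken from the DAG's Servers property can be piped to Get-HealthReport, which may behave differently depending on your build and on whether you work in a local or remote session. The -RollupGroup switch asks the cmdlet to consolidate health data across the servers it is given.

```powershell
# DAG1 is a hypothetical DAG name; substitute your own.
# Collect the names of the DAG members (assumes each entry exposes a Name property).
$dagMembers = (Get-DatabaseAvailabilityGroup -Identity DAG1).Servers |
    ForEach-Object { $_.Name }

# Roll up the health data across all members, limited to HA-impacting health sets.
$dagMembers | Get-HealthReport -RollupGroup -HaImpactingOnly
```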
If this command reports that the service is "Degraded", it means that at least one of the member nodes in the DAG is suffering and needs some TLC from the administrator. You'd then need to check each server with a command similar to that below to identify which DAG member is complaining and why they are forcing the overall health set to report a degraded status.
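A sketch of that per-server check, under the same assumptions as before (a hypothetical DAG called DAG1, and Servers entries that expose a Name property). It loops through the DAG members and lists only the monitors that are not reporting "Healthy".

```powershell
# DAG1 is a hypothetical DAG name; substitute your own.
foreach ($server in (Get-DatabaseAvailabilityGroup -Identity DAG1).Servers) {
    # Show only the HA-impacting monitors that are not healthy on this member.
    Get-ServerHealth -Identity $server.Name -HaImpactingOnly |
        Where-Object { $_.AlertValue -ne 'Healthy' } |
        Select-Object Server, HealthSetName, Name, AlertValue
}
```

If nothing comes back for a server, Managed Availability has no HA-impacting complaint about that member.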
Try these commands out on your Exchange 2013 DAG and discover what information is returned. You’ll find that this data provides an interesting insight into what Managed Availability thinks is happening on DAG member servers. More information about how to use these cmdlets to determine what is happening on a server is available in this EHLO post. Or indeed, in the TechNet article "Manage Health Sets and Server Health".
All of which goes to prove that the advent of Managed Availability and the new cmdlets that report on server health mean that you should really pay more attention to what’s going on inside DAGs. Databases activating automatically, moving themselves around at the drop of a hat (or rather, after Active Manager is given a bad health report by Managed Availability). What’s the world coming to?
Follow Tony @12Knocksinna