Fix small problems before they turn into larger ones
Every administrator who runs Exchange Server wants to know when something's wrong with it, and many of us would prefer to find out about small problems before they turn into larger, more serious ones. You might think doing so requires a monitoring-and-management software package such as Microsoft Operations Manager (MOM), HP's OpenView, or Quest Software's Spotlight on Exchange. Although these tools certainly provide a great deal of useful functionality, you might be surprised at the monitoring and control functionality built right into your Exchange and Windows servers. You can monitor service and application performance, control services, and get notifications without additional expense. In general, you'll probably find that you can manage a handful of servers by using the built-in tools; the more sophisticated tools start adding major value as you move beyond small numbers of servers to Exchange organizations that consist of multiple servers in multiple locations, or when you have strict service level agreements (SLAs) that place a premium on early notification of problems.
Service Monitoring Basics
The simplest question to answer is also one of the most important: Is my server up or down? Most administrators are accustomed to checking the status of services from within Exchange System Manager (ESM). You can check the state of each individual virtual server by expanding a protocol in the Protocols container beneath each server, then looking for the familiar (and usually unwanted) little red circle. Doing so is a fine first step, but all it tells you is whether the virtual server is active. The underlying service can be stopped with no indication in ESM other than that same red icon, so you can't easily tell whether the virtual server instance has been stopped or the underlying service has failed or been stopped.
A more authoritative source is the Control Panel Services applet (services.msc). This applet lets you see and change the status of the Exchange services and their prerequisite services. You can right-click Services (Local) and use the Connect to another computer command to connect to another server, which is handy if you want to manage your servers from your desktop or notebook machine. The applet lets you see whether a service is currently running; you can stop and restart services or set their startup type. You can use the Services applet to get further information when ESM tells you that a service isn't running as it should be.
Of course, not everyone wants to use a GUI for every task. You can manage services from the command line by using two methods: the venerable NET commands and the Sc command. You're probably already familiar with NET START and NET STOP; they require the name of the service you want to control (e.g., NET STOP msexchangeIS will stop the Information Store service). Just typing NET START with no arguments will list all running services on the machine you're logged on to, which can be handy when you're trying to verify which services are running. If you try to stop a service on which other services depend (such as the System Attendant), NET STOP will prompt you to confirm your command unless you append the /y switch.
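As a sketch of that "which services are running?" check, the following Python script runs NET START and reports any expected Exchange service whose display name is missing from the output. Note that NET START lists display names, not short service names. The names in EXPECTED are assumptions based on a default Exchange 2003 installation; confirm them against your own servers.

```python
import subprocess
import sys

# Assumed display names for core Exchange 2003 services; adjust as needed.
EXPECTED = [
    "Microsoft Exchange Information Store",
    "Microsoft Exchange System Attendant",
    "Microsoft Exchange MTA Stacks",
    "Microsoft Exchange Routing Engine",
    "Simple Mail Transfer Protocol (SMTP)",
]


def parse_net_start(output):
    """Return the set of display names listed by NET START.

    NET START prints a header line, one indented display name per running
    service, and a completion message; only the indented lines are names.
    """
    names = set()
    for line in output.splitlines():
        if line.startswith((" ", "\t")) and line.strip():
            names.add(line.strip())
    return names


def missing_services(output, expected=EXPECTED):
    """List the expected services that don't appear in NET START output."""
    running = parse_net_start(output)
    return [name for name in expected if name not in running]


if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(["net", "start"], capture_output=True, text=True)
    for name in missing_services(result.stdout):
        print(f"NOT RUNNING: {name}")
```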
The Sc command is much more flexible, although this flexibility requires some additional work on your part. The first useful thing to know about Sc is that you can use it to target services on another server: Put the server name (either a NetBIOS name or a fully qualified DNS name), prefixed by two backslashes, immediately after sc and before the command verb. You can use several command verbs with Sc:
- Sc query tells you the state of the named service (which, we hope, is running), what control commands it's prepared to accept, and its last exit code.
- Sc qc shows you the service name, its start type (automatic, manual, or disabled), the path to the binary image, the display name, the account used to start the service, and the services it depends on.
- Sc queryex displays the same information as Sc query, with the addition of the service process's process ID, which is sometimes needed to kill a hung process; you might need this information if the SMTP service hangs when encountering a malformed message.
- Sc start and Sc stop do what you'd expect: They start and stop the named service. A single Sc stop command can't stop a service together with its dependents, so you have to stop each service separately. Sc also differs from the NET commands in another way: NET START and NET STOP don't return control to you until the service has either completed your request or failed to do so, but Sc simply sends the command and returns immediately, typically displaying a status of START_PENDING or STOP_PENDING.
- Sc enumdepend displays the list of services that depend on the named service. This command is useful in conjunction with Sc stop; you can stop the dependent services first.
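The following Python sketch strings Sc enumdepend, Sc stop, and Sc query together. It stops the dependents first, then the named service, and polls each one until it reports STOPPED, because Sc returns before the service has actually stopped. It assumes Sc enumdepend lists dependents in an order that's safe to stop them in (the underlying Windows EnumDependentServices API documents its results in reverse start order); the helper names are hypothetical.

```python
import re
import subprocess
import sys
import time


def sc_prefix(server=None):
    """Return the sc command prefix, targeting a remote server if one is given."""
    return ["sc", f"\\\\{server}"] if server else ["sc"]


def parse_service_names(sc_output):
    """Extract SERVICE_NAME values, in order, from sc enumdepend output."""
    return re.findall(r"^SERVICE_NAME:\s*(\S+)", sc_output, flags=re.MULTILINE)


def parse_state(sc_output):
    """Return the state keyword (RUNNING, STOP_PENDING, STOPPED, ...) or None."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def stop_plan(service, dependents, server=None):
    """Build sc stop commands for the dependents (in order), then the service."""
    return [sc_prefix(server) + ["stop", name] for name in list(dependents) + [service]]


def _sc(args):
    return subprocess.run(args, capture_output=True, text=True).stdout


def stop_with_dependents(service, server=None, timeout=120, poll=2):
    deps = parse_service_names(_sc(sc_prefix(server) + ["enumdepend", service]))
    for cmd in stop_plan(service, deps, server):
        _sc(cmd)  # returns immediately, usually reporting STOP_PENDING
        name = cmd[-1]
        deadline = time.monotonic() + timeout
        # Wait for STOPPED before stopping the next service in the plan.
        while parse_state(_sc(sc_prefix(server) + ["query", name])) != "STOPPED":
            if time.monotonic() > deadline:
                raise TimeoutError(f"{name} did not stop within {timeout} seconds")
            time.sleep(poll)


if __name__ == "__main__" and sys.platform == "win32":
    # Usage: python stopsvc.py <service> [server]
    stop_with_dependents(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else None)
```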
The most helpful way to use these commands is probably in a script that runs at intervals you define; the script can use Sc queryex to check the status of key services, then alert you if something's amiss.
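A minimal version of such a check might look like this in Python. The server name mail01 and the watch list are hypothetical, and the alert function simply writes to standard error; swap in whatever delivery mechanism you prefer. The script makes a single pass, so you'd schedule it with Task Scheduler at the interval you want; a nonzero exit code signals that at least one service wasn't running.

```python
import re
import subprocess
import sys

# Hypothetical watch list: (server, service name) pairs to check on each run.
WATCH = [
    ("mail01", "MSExchangeIS"),
    ("mail01", "MSExchangeSA"),
    ("mail01", "SMTPSVC"),
]


def parse_queryex(output):
    """Return (state, pid) parsed from sc queryex output; None where absent."""
    state = re.search(r"STATE\s*:\s*\d+\s+(\w+)", output)
    pid = re.search(r"^\s*PID\s*:\s*(\d+)", output, flags=re.MULTILINE)
    return (state.group(1) if state else None,
            int(pid.group(1)) if pid else None)


def check(server, service):
    out = subprocess.run(["sc", f"\\\\{server}", "queryex", service],
                         capture_output=True, text=True).stdout
    return parse_queryex(out)


def alert(message):
    # Placeholder: replace with mail, pager, or SMS delivery of your choice.
    print(message, file=sys.stderr)


def main():
    problems = 0
    for server, service in WATCH:
        state, pid = check(server, service)
        if state != "RUNNING":
            problems += 1
            alert(f"{service} on {server} is {state or 'UNKNOWN'} (PID {pid})")
    return 1 if problems else 0


if __name__ == "__main__" and sys.platform == "win32":
    sys.exit(main())
```

Reporting the PID alongside the state means the alert already carries what you'd need to kill a hung process.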
Setting Services for Automatic Restart
These tools are most useful when you want to explicitly stop or start a particular service. Most of the time, however, we really want to monitor a service and kick-start it if it fails. You can set service failure options in the Services applet by using the following steps:
- Launch the Services applet.
- Open the target service's Properties dialog box.
- Switch to the Recovery tab, which Figure 1 shows. By default, the Exchange services are configured to do nothing when they fail. However, you can choose to restart the service, run an external program (which can be a script), or restart the computer. The IIS Admin service, for example, runs iisreset.exe with the /start switch when the service fails; this executable's job is to clean up the service and restart it normally.
- For your Exchange services, set the first failure option to restart the service.
- For the second failure option, choose Run a Program and point it at a notification tool that sends mail directly to some other account or device that you monitor. For example, you can use the freeware Blat tool to send SMTP mail directly to a Short Message Service (SMS)-enabled mobile phone by using the free Teleflip service: Simply specify the (US-only) mobile number followed by @teleflip.com as the target address (e.g., 5551234567@teleflip.com), and the message will automatically be gatewayed to your phone.
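If you'd rather not click through the Recovery tab on every server, the sc failure command can set the same options from the command line. The Python sketch below builds that command for a list of Exchange services: restart on the first failure, run a notification command on the second. The notification command path, the delays, the one-day reset period, and the service list are assumptions for illustration; you can check the result afterward with sc qfailure followed by the service name.

```python
import subprocess
import sys

# Hypothetical notification command; point it at your own mailer or script.
NOTIFY_CMD = r"C:\Tools\notify-oncall.cmd"

# Exchange 2003 service names to protect (adjust to suit your servers).
SERVICES = ["MSExchangeSA", "MSExchangeIS", "MSExchangeMTA", "RESvc", "SMTPSVC"]


def failure_command(service, notify_cmd, restart_delay_ms=60000,
                    run_delay_ms=1000, reset_seconds=86400, server=None):
    """Build an 'sc failure' command: restart the service on the first
    failure, run notify_cmd on the second."""
    cmd = ["sc"] + ([f"\\\\{server}"] if server else [])
    cmd += [
        "failure", service,
        # sc expects each option name (ending in '=') and its value as
        # separate arguments, i.e. "reset= 86400", not "reset=86400".
        "reset=", str(reset_seconds),
        "command=", notify_cmd,
        "actions=", f"restart/{restart_delay_ms}/run/{run_delay_ms}",
    ]
    return cmd


if __name__ == "__main__" and sys.platform == "win32":
    for name in SERVICES:
        result = subprocess.run(failure_command(name, NOTIFY_CMD),
                                capture_output=True, text=True)
        print(name, result.stdout.strip())
```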
ESM's Monitoring and Status Tools
ESM includes the Monitoring and Status node (under the Tools node). I can honestly say that most Exchange administrators I've met have never used these tools, which is too bad. Although they aren't a complete replacement for a package such as MOM, they provide some useful built-in functionality. Let's start with the Status node, which Figure 2 shows. It gives you a summary status of all Exchange connectors and servers in your organization. Connectors are shown as either up or down. If you see a down connector, you'll have to manually check the virtual server that's hosting it and the machine on the remote end, because ESM doesn't tell you where the problem lies. The Winroute tool can help here because it displays the link state routing table, which might point you to the failing link.
The other things that the Status node can monitor will probably be of more interest. By default, each server has a status component called Default Microsoft Exchange Services; this component uses Windows Management Instrumentation (WMI) to watch the state of the Information Store, Message Transfer Agent (MTA) stacks, routing engine, System Attendant, SMTP virtual server, and IIS Web services. If any of those services stops, the state of the monitoring item changes to critical. That change can trigger a notification (which you configure in the Notifications node); otherwise, you have to be watching the Status node to see it. Because the checks go through WMI, they can also introduce some reporting latency. For example, when I stop the MTA on one server and restart it within a minute or two, ESM on another server might not catch the outage. So if you use this monitoring item, be sure to press F5 every so often to force ESM to update itself. You can also create your own status monitors that are based on any of the six resource types that ESM supports.
- The available virtual memory type lets you set a threshold percentage for virtual memory use. If your server exceeds the threshold and stays above it for the period you specify, the monitoring item changes state in the Status node. You can set both warning and critical limits, which gives you a way to get early warning of resource shortages.
- The CPU utilization type lets you monitor overall CPU use for a server by using the same threshold-and-duration mechanism as the available virtual memory type. You can monitor only the aggregate use, not the use of individual CPUs in a multiprocessor system.
- The free disk space type lets you monitor an individual drive and receive an alert if the space available drops below the value you specify. This type is especially useful for transaction-log volumes because you really don't want to run out of space on those disks.
- The SMTP queue growth and X.400 queue growth types let you monitor queues for growth. If any queue of the specified type grows steadily for the specified period, you'll be notified.
- Use the Windows service status type to monitor arbitrary services. If the services you want to monitor are Exchange-related but aren't included in the Default Microsoft Exchange Services object, add them to that object; otherwise, create your own.
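To make the threshold-and-duration, free-space, and queue-growth rules concrete, here's a small Python model of them. It illustrates the rules as described here, not ESM's actual implementation; the one-sample-per-minute input, the reading of "grows steadily" as "rises every minute," and the function names are all assumptions.

```python
import shutil


def threshold_state(samples, warning, critical, duration):
    """Classify per-minute readings (e.g. CPU %) as normal, warning, or critical.

    A state applies only if each of the last `duration` readings exceeds its
    threshold, i.e. the value went over the limit and stayed there.
    """
    if duration < 1:
        raise ValueError("duration must be at least 1 sample")
    recent = samples[-duration:]
    if len(recent) < duration:
        return "normal"  # not enough history yet
    if all(value > critical for value in recent):
        return "critical"
    if all(value > warning for value in recent):
        return "warning"
    return "normal"


def queue_growing(lengths, minutes):
    """True if a queue's length rose in every one of the last `minutes` intervals."""
    if minutes < 1:
        raise ValueError("minutes must be at least 1")
    recent = lengths[-(minutes + 1):]
    if len(recent) < minutes + 1:
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def disk_state(path, warning_mb, critical_mb):
    """Compare free space on the volume holding `path` with MB thresholds."""
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    if free_mb < critical_mb:
        return "critical"
    if free_mb < warning_mb:
        return "warning"
    return "normal"
```

Requiring every reading in the window to exceed the limit is what keeps a single CPU spike from raising an alert; only a sustained condition changes the state.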
The Notifications node lets you trigger email messages or scripts when a monitoring item enters the critical or warning state. Right-click this node and choose New, E-mail notification or New, Script notification; either command displays a Properties dialog box, as Figure 3 shows. You can select which server will perform the monitoring, which servers and connectors you want to monitor (one server or connector, all servers or connectors in a routing group, or a custom-specified set of servers or connectors), and what you want the email message to say when a notification occurs. Of course, you have to be comfortable with sending critical notifications about the health of your email system through that same email system before this approach makes sense. It isn't the best way, for example, to learn that your server has fallen over dead (unless you use each server to watch its peers; unfortunately, you have to set that up manually).
Script-based notifications work much the same way. You can use a regular VBScript script or an executable, and when the servers or connectors you're monitoring enter the selected critical or warning state, the script or executable is launched with the command-line options you specify. Doing so is a good way to use a third-party pager or SMS notification tool or to run a script that does something interesting. For example, you can easily write a small script to send a notification to an Ambient Orb or Dashboard device or a Microsoft SPOT smart watch; the script gives you clear notification—not via email—that something's amiss.
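Here's a sketch of a script-notification target written in Python. You'd have ESM launch python.exe with the script path and options as the command-line parameters, and the script mails an alert through a relay that doesn't depend on your Exchange servers. The --server, --state, and --item options are hypothetical (they're simply whatever you type as command-line options in the notification dialog box), as are the relay host and addresses.

```python
import argparse
import smtplib
import sys
from email.message import EmailMessage

# Hypothetical values: use a relay that doesn't route through Exchange,
# and an address that reaches you out of band (pager or SMS gateway).
RELAY = "smtp.isp.example"
SENDER = "exchange-monitor@example.com"
RECIPIENT = "oncall@example.com"


def build_alert(server, state, item, sender=SENDER, recipient=RECIPIENT):
    """Compose a short alert message suitable for a pager or phone."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[{state.upper()}] {item} on {server}"
    msg.set_content(f"ESM reported {item} on {server} in the {state} state.")
    return msg


def main(argv=None):
    parser = argparse.ArgumentParser(description="ESM script-notification hook")
    parser.add_argument("--server", required=True)
    parser.add_argument("--state", choices=["warning", "critical"], required=True)
    parser.add_argument("--item", default="a monitored resource")
    args = parser.parse_args(argv)
    msg = build_alert(args.server, args.state, args.item)
    # Send through the external relay so the alert doesn't depend on Exchange.
    with smtplib.SMTP(RELAY, 25, timeout=30) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__" and sys.platform == "win32":
    main()
```

Keeping the subject line short and front-loading the state matters when the alert ends up on a phone screen.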
Setting a Notification for SMTP Queue Growth
To set up a notification that will alert you via a script when your SMTP queues start to grow beyond their typical limits, use the following steps:
- Launch ESM.
- Open the Properties dialog box for the server you want to monitor, and switch to the Monitoring tab.
- Click the Add button.
- Select SMTP queue growth from the Add Resource dialog box, then click OK.
- In the SMTP Queue Thresholds dialog box, make sure that the Warning state (minutes) and Critical state (minutes) check boxes are selected, then enter the number of minutes of queue growth you want to trigger each state. Click OK when you're done.
- Click OK to dismiss the Properties dialog box.
- Expand the Tools node.
- Expand the Monitoring and Status node, right-click the Notifications folder, and choose the New, Script notification command.
- Click Select to identify the server you want to perform the monitoring.
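- If the monitoring server you selected isn't the server whose Monitoring tab you just configured, make sure that you choose the correct server to monitor. Also select whether the script should run when that server enters the warning or the critical state.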
- Enter the path and command-line options for your script, then click OK to start the monitoring process.
Message tracking might not sound like a monitoring tool, but you can use it as a rough way to measure your servers' health. I sometimes find that I don't get messages when I expect to, either because no one (not even spammers) is sending me mail or because of an actual problem with my Exchange infrastructure. One quick way I can tell which is the case is to fire up the Message Tracking Center in ESM's Tools node and perform a quick scan of my SMTP bridgehead server, looking for messages within the suspect time period. That scan, together with a check of the logs on my Barracuda Spam Firewall appliance, quickly tells me whether mail is arriving as usual. If it isn't, the track will often indicate where the message went astray. Combined with the ability to get notifications when a connector seems to be down, message tracking makes a useful monitoring tool. (For more details, see "In the Know: Message Tracking," InstantDoc ID 47996.)
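If you'd like to automate that "is mail arriving?" check, you can scan the message tracking log files themselves. This Python sketch counts log entries per hour and lists hours in a window that have no entries. It assumes tab-delimited log files with a #-prefixed header row naming Date and Time columns; verify that against your own tracking logs (and their text encoding) before relying on it. Each message typically generates several log entries, so the counts measure activity, not message totals.

```python
from collections import Counter


def hourly_counts(lines):
    """Count tracking-log entries per (date, hour) bucket.

    Assumes tab-delimited rows and a '#'-prefixed header row that names the
    Date and Time columns (optionally after a 'Fields:' label).
    """
    date_i = time_i = None
    counts = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            header = line.lstrip("#").strip()
            if header.lower().startswith("fields:"):
                header = header[len("fields:"):].strip()
            cols = [col.strip() for col in header.split("\t")]
            if "Date" in cols and "Time" in cols:
                date_i, time_i = cols.index("Date"), cols.index("Time")
            continue
        if date_i is None:
            continue  # no usable header seen yet
        fields = line.split("\t")
        if len(fields) <= max(date_i, time_i):
            continue  # short or malformed row
        hour = fields[time_i].strip().split(":", 1)[0]
        counts[(fields[date_i].strip(), hour.zfill(2))] += 1
    return counts


def quiet_hours(counts, date, start_hour, end_hour):
    """Hours in [start_hour, end_hour] with no log entries on `date`."""
    return [h for h in range(start_hour, end_hour + 1)
            if counts[(date, f"{h:02d}")] == 0]


# Example usage (hypothetical path; pass encoding= to open() if your logs
# aren't in the default encoding):
# with open(r"\\mail01\mail01.log\20050812.log") as f:
#     counts = hourly_counts(f)
# print(sorted(counts.items()))
```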
What Isn't Included
What if you want to monitor or control something that ESM doesn't cover? You have two avenues of attack. The first is to write your own scripts against the WMI interfaces that are included with Windows and Exchange. That's a daunting prospect if you've never scripted before, but it isn't that difficult. Pick up Microsoft's Windows 2000 Scripting Guide and Alain Lissoir's two-volume set (Leveraging WMI Scripting and Understanding WMI Scripting), and you'll have all the raw material you need, plus plenty of sample scripts you can adapt. The second is to buy a dedicated monitoring package.
In some environments, of course, the extra functionality of tools such as MOM or Spotlight on Exchange is necessary. These tools can integrate and display a range of health and status parameters, and they give you much more sophisticated tools for monitoring server conditions and alerting you when something goes wrong. If you have a complex multiserver environment, or if your monitoring needs are driven by business requirements that force you to have more awareness of your messaging system's condition, these tools are probably a better choice than rolling your own. Having said that, judicious use of ESM's monitoring tools, combined with the performance-monitoring tools built into Windows, will give you much better visibility into the overall health and performance of your Exchange servers, helping you fix small problems before they turn into larger ones.