Collect and analyze diagnostic information

A full server disk once blindsided me. This unexpected crisis was just one of many that I could have prevented if I had found the time to review the trend information for my servers and network. The immense task of managing increasingly complex networks has IT staff constantly putting out fires rather than relying on proactive strategies to avert problems. The larger the organization, the more pronounced this problem becomes. One of the main factors contributing to this problem is the amount of untapped diagnostic and event information available from the network's hardware and software. NetIQ's Operations Manager 3.2 offers a solution to help administrators manage and act on this pool of information.

Operations Manager is an administrative tool for collecting crucial diagnostic data, storing that information in a database, and generating automated rules-based responses to the content. Operations Manager achieves this functionality through several scalable Windows Distributed interNet Applications (DNA) architecture-based components. Figure 1 shows how the Operations Manager components relate to Windows DNA.

Operations Manager Components
At the heart of Operations Manager are three components that operate in the business-logic layer of Windows DNA: the agent, Consolidator, and Data Access Server (DAS). The agent is a service that you install on the Windows 2000 and Windows NT servers and workstations that you want to monitor. This component's main responsibility is to collect information and send it to Consolidator to process. After you install the agent, you can configure it to collect data from Win2K and NT event logs, Performance Monitor counters, and third-party application logs. This component can also generate SNMP traps and other generic events that it sends to Consolidator. Each agent sends a heartbeat signal across the network to keep Consolidator apprised of the agent's status. Finally, the agent applies processing rules for the local machine it's running on. For example, the agent might apply a rule that generates an alert when the machine exceeds a performance threshold.
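To make the agent's local rule processing concrete, here is a minimal Python sketch of a threshold rule like the one in the example above. It is illustrative only: the ThresholdRule class, the evaluate function, and the cpu_percent counter name are hypothetical and aren't part of Operations Manager, whose rules you define through its console.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """Hypothetical stand-in for a local performance-threshold rule."""
    counter: str    # name of the monitored counter (made up for this sketch)
    limit: float    # values above this limit raise an alert
    message: str    # text to include in the alert


def evaluate(rule, samples):
    """Return one alert string for every sample that exceeds the rule's limit."""
    return [f"{rule.message}: {rule.counter}={value}"
            for value in samples if value > rule.limit]


rule = ThresholdRule("cpu_percent", 90.0, "CPU threshold exceeded")
alerts = evaluate(rule, [42.0, 95.5, 88.0])
for alert in alerts:
    print(alert)
```

In the product, an alert of this kind would travel upstream to Consolidator; the sketch stops at producing the alert text.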

Consolidator collects information that the agents supply. This component also runs as a service, and you might need to install it on more than one machine, depending on the size and configuration of your network. Consolidator acts as an agent on the machine you install it on but most often acts as an Agent Manager. As an Agent Manager, Consolidator discovers agent machines or potential agent machines (according to parameters you set) and installs or uninstalls the agent service. Consolidator's main tasks are to collect information from the agents and apply processing rules (e.g., run a script, send an email message or page to an administrator, and pass information to DAS).

DAS is the caretaker of the Microsoft SQL Server database that Operations Manager uses to store collected information, rules, and configuration data. As such, this component handles the database I/O traffic to and from Consolidator. The vendor built DAS on COM components and designed DAS to use Microsoft Transaction Server (MTS) to provide centralized database access and query logic. DAS thus links the database, Consolidator, and the administrative interfaces: It serves as the business-logic layer's liaison to both the data layer (the SQL Server database) and the presentation layer (the administrative interfaces).

At the presentation layer, you can use Microsoft Management Console (MMC) to manage all aspects of the program. Through the Web console, Operations Manager also provides browser-based access that works with Microsoft IIS. You can use this interface only to monitor network information, but you can quickly and easily access HTML-based reports that you can schedule the software to publish automatically.

The components work together to channel information from your servers to a centralized reservoir in the database. At any point, administrators can exploit the collected information to troubleshoot, conduct automated problem resolution, trigger alerts, analyze trends, and perform capacity planning.

Operations Manager's components are scalable and provide deployment flexibility. Consolidator, DAS, and the SQL Server database can reside on one machine or on separate machines. Your network size and available hardware will determine the deployment. In a larger environment, you face additional choices. One database and any number of agents, Consolidators, and DASs anchor Operations Manager's fundamental administrative grouping, which the company calls a configuration group. You might want to create several configuration groups depending on your network topology, the number of remote sites that connect across slow links, and your organization's divisions (e.g., finance, engineering). To divide your enterprise along administrative boundaries, you might have one configuration group in charge of messaging (e.g., monitoring the Microsoft Exchange Server systems) and another configuration group in charge of security.

Ultimately, your deployment decision must consider three factors. First, you need to consider the amount of network traffic that Operations Manager generates. In my tests, this traffic wasn't excessive, but having slow links and many agents can cause bottlenecks.

Second, recognize how collected information affects the database's disk space. The product's technical support said agents on 25 servers with security auditing turned on can produce as much as 2GB of data per day. Although Operations Manager has utilities to prune the database, you still need to plan and size your disk arrays appropriately.
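Technical support's figure works out to roughly 80MB per server per day (2GB divided by 25 servers). The following Python sketch turns that figure into a rough sizing estimate. It assumes that data volume scales linearly with the number of audited servers and that the 2GB figure is a worst case; neither assumption comes from NetIQ. The function name and the retention periods are hypothetical.

```python
def estimate_db_growth_gb(servers, retention_days,
                          ref_gb_per_day=2.0, ref_servers=25):
    """Estimate the database space (in GB) needed to retain collected data.

    Scales the quoted worst case (2GB per day for 25 audited servers)
    linearly to the given number of servers and retention period.
    """
    return servers * ref_gb_per_day / ref_servers * retention_days


# 25 audited servers with 30 days of retention
small_site = estimate_db_growth_gb(25, 30)
# 100 audited servers with 14 days of retention
large_site = estimate_db_growth_gb(100, 14)
print(f"Small site: {small_site:.0f}GB, large site: {large_site:.0f}GB")
```

Treat the result as an upper bound for planning disk arrays, and adjust it according to the retention periods you set with the database-pruning utilities.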

Third, Operations Manager occupies a considerable chunk of system memory. During my product testing, the Operations Manager process alone consumed 40MB of system memory. When the software ran with other processes such as SQL Server, my Dell Precision WorkStation with 400MHz dual-Pentium II Xeon processors and 128MB of RAM became very sluggish. The product's documentation specifies the minimum Operations Manager hardware as a 200MHz Pentium processor with 128MB of RAM and 800MB of free disk space, but my experience showed that you need much more horsepower, even in a small environment. To provide a significant performance boost, technical support suggested a minimum of 512MB of RAM, and for larger environments, 1GB of RAM. The bottom line is to expect to spend some money on capable hardware to properly scale and run Operations Manager.

Deploying Operations Manager
The Operations Manager software ships on one CD-ROM along with three small printed manuals. The documentation is easy to read and well organized but is light on information about advanced topics. For advanced information, NetIQ provides online manuals on its Web site. However, the site didn't have the level of detail I expected.

The vendor claims that administrators can install Operations Manager in minutes, configure it in hours, and deploy it across a large network in a few days. I agree with the estimates on configuration and deployment, but not installation. Although the manuals sufficiently document the installation process and the process isn't difficult, setting up the requisite software took a few hours. To set up one NT server to host all the components, I needed to install MTS, IIS, MMC, SQL Server 7.0 (or SQL Server 6.5), and Microsoft Data Access Components (MDAC). To view and run reports on the local machine, I also installed Microsoft Word and Microsoft Access. A preinstallation checklist that runs when you launch omsetup.exe checks to make sure you meet the prerequisites. After I installed the required software, I was ready to install Operations Manager.

Overall, the installation went smoothly and took only a few minutes. I then created accounts for Consolidator and DAS. Consolidator required a domain account with membership in the local administrators group on each agent and Consolidator machine. To ease setup in my test environment, I added this account to the domain administrators global group. The DAS account required membership in only the local administrators group on the DAS system. Finally, I stepped through other setup items, such as defining the SQL Server database size and naming my configuration group.

Next, I needed to install agents on the five other NT servers on my test network. I double-clicked the MMC shortcut to launch MMC with the Operations Manager MMC snap-in and explored the interface, which Figure 2 shows. After accessing the online Help, I navigated down the console tree, selected Agent Manager, right-clicked it, and opened its Properties sheet. From the Properties sheet, I added a wildcard operator to the Managed Computers tab instructing Operations Manager to immediately install agents on all the computers in my test domain. By default, Operations Manager scans daily at 2:05 a.m. for eligible computers on which to install the agent.
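The effect of a wildcard entry on a managed-computers list can be sketched in a few lines of Python. This review doesn't cover Operations Manager's own matching syntax, so the sketch uses the shell-style patterns of Python's standard fnmatch module as a stand-in. The computer names and the eligible_computers function are hypothetical.

```python
from fnmatch import fnmatchcase


def eligible_computers(names, patterns):
    """Return the computer names that match at least one wildcard pattern.

    Matching ignores case, which is an assumption made for this sketch.
    """
    return [name for name in names
            if any(fnmatchcase(name.upper(), pattern.upper())
                   for pattern in patterns)]


domain = ["NTSRV01", "NTSRV02", "EXCH01", "WKS-LAB1"]

# A lone wildcard selects every computer in the domain.
all_machines = eligible_computers(domain, ["*"])

# Narrower patterns limit agent deployment to particular machines.
servers_only = eligible_computers(domain, ["ntsrv*", "exch*"])
print(all_machines)
print(servers_only)
```

Narrower patterns would be one way to keep agents off machines that you don't want to monitor.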

Deploying the agent software to my five servers was quick, but I ran into a minor glitch. I began receiving alert messages in the console that said I needed to reboot the servers to finish the agent installation. I restarted all the servers but saw no communication between the newly installed agents and Consolidator and DAS. I found that the agent service's default configuration set it to start manually. The vendor acknowledged that the manual setting is a bug in Operations Manager 3.2 and plans to fix it soon. I reconfigured the agents to start automatically on all the servers and started the service. With the agents installed and running, all the components were in place. I then began to learn and configure the many elements of Operations Manager.

Operations Manager has a myriad of configurable options and offers ample leeway for administrators who want to customize it to meet their environment's needs. These options provide power and flexibility, but they increase the product's overall complexity and learning curve. I needed several hours to become familiar with the many configuration items, but after a couple of days of exploring the interface, reading manuals, and accessing the online Help, I had the product functioning on a basic level and was moving into the advanced features.

The Operations Manager MMC Snap-in
The Operations Manager MMC snap-in contains three main sections: monitor, rules, and configuration. The left pane shows the items available in the console tree. The right pane gives you a summary view of your network and the Operations Manager configuration. The monitor section of the console tree gives you several ways to view the status of agents and Consolidators on your network and the information that these components send to the database. You can view all event data, including events that triggered alerts. The rules branch gives you access to the many processing rules that are the heart of Operations Manager's functioning. Processing rules let you determine what information to collect from which machines, how to process the information, and what action Operations Manager should take on particular events. The configuration view lets you configure settings that affect all Operations Manager components. For example, you can set database grooming options that determine information-retention periods. From the configuration view, you also control when and how Agent Manager deploys agents on your network.

Putting Operations Manager to the Test
I tested Operations Manager's ability to collect event information from the servers on my test network and respond appropriately to certain events. To begin the test, I turned on security auditing for my NT domain. Then, I opened the Operations Manager MMC snap-in, selected the Event Processing Rule Properties sheet, and created an event rule to trigger an alert and send me an email message when the program logged a share access failure audit on a particular server, as Figure 3 shows. I added this rule to an existing rule group that Operations Manager supplies as a general template for NT OSs. I set up a Messaging API (MAPI) account on the server to enable the email feature and verified the rule group's assignment to a computer group. Operations Manager also supplied several generic computer group templates that recognize particular computers according to Registry data that the installed agent supplies. For example, WINS servers, DNS servers, Win2K servers, and Exchange Server systems have default groups. Viewing the processing rules assigned to the computer groups is somewhat difficult because the window lacks sufficient space to display the full description fields.

After I created and assigned the rule, I used another client machine to attempt to access a share on the server I was monitoring. When the program logged a failure audit on the server, the agent recognized the new processing rule and passed the alert upstream to Consolidator, which sent me an email message about the failure audit. In the Operations Manager MMC snap-in interface, I saw that the event added a yellow warning triangle next to the server I had tried to access.

I tried several variations of rules and responses to check Operations Manager's functionality. Instead of sending email messages, I set rules to launch a handful of script files that Operations Manager includes. By modifying a script, I was able to configure Operations Manager to automatically restart my Exchange Server system's Internet Mail Connector (IMC) after I manually shut it down. I collected and viewed performance data on the servers in my domain. Finally, I shut down some of the servers to view Operations Manager's response to the loss of a heartbeat signal. A red down-arrow showed that Operations Manager was unable to establish communication with those servers.
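The heartbeat check in the last test can be pictured with a short Python sketch: Record when each agent last reported, and flag any agent that has been silent longer than a timeout. This sketch shows the general technique, not NetIQ's implementation. The find_silent_agents function, the server names, and the 60-second timeout are hypothetical; this review doesn't state the product's actual heartbeat interval.

```python
def find_silent_agents(last_heartbeat, now, timeout_seconds):
    """Return the sorted names of agents whose last heartbeat is too old.

    last_heartbeat maps an agent name to the time (in seconds) at which
    its most recent heartbeat arrived.
    """
    return sorted(name for name, seen in last_heartbeat.items()
                  if now - seen > timeout_seconds)


# Times are in seconds on an arbitrary clock (made-up values).
last_heartbeat = {"NTSRV01": 100, "NTSRV02": 40, "EXCH01": 115}
silent = find_silent_agents(last_heartbeat, now=120, timeout_seconds=60)
print(silent)
```

A console could then mark each flagged machine as unreachable, much as the red down-arrow did in my test.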

I found that Operations Manager performed as advertised. This tool is very useful if you're looking for a way to cope with the task of monitoring servers and workstations in your organization. You might need to dedicate substantial resources to the initial setup and configuration of this product, but the investment should pay off in your increased capability to troubleshoot, quickly resolve problems, and analyze trends.

Operations Manager 3.2
Contact: NetIQ * 408-330-7000
Price: $495 per managed server
Decision Summary:
Pros: Is rich in features; provides centralized database of event and trend information; provides automated deployment; is highly scalable; has flexible configuration choices
Cons: Has demanding hardware requirements, especially in larger environments; needs greater detail in the documentation about advanced topics; requires a well-trained administrator to deploy and operate the software; presents a steep learning curve