Tools and techniques to ensure network availability

Maintaining network availability in Windows 2000 is an entirely new ball game for network administrators. To effectively support Win2K networks and maintain the same levels of network availability that your previous Windows networks provided, you must perform network-management activities beyond the steps you've taken with earlier Windows versions. As with any computer network, monitoring crucial statistics such as server CPU, memory and disk utilization, and network connectivity statistics is imperative. However, Win2K introduces additional components, services, and dependencies that you must also monitor regularly.

These new elements, which collectively make up Win2K's core infrastructure, include Active Directory (AD) databases and services, DNS servers, the Global Catalog (GC), and Operation Masters. Win2K and Win2K-centric applications rely heavily on these services and components for proper network operation. Thus, network administrators must be able to guarantee not only these components' general availability but also an acceptable performance baseline. Failure to do so can result in severe, networkwide problems, including slow or failed user logon authentications, inconsistent data across AD servers, the inability to access crucial applications, and printing problems. To properly maintain a Win2K infrastructure, network administrators need specific knowledge about which components to monitor as well as which full-featured Win2K-aware monitoring tool is right for their organization.

AD: Win2K's Backbone
Before delving into the specifics of AD, let's review the general terms and concepts related to directory-enabled networks. A directory (aka a data store) maintains data about objects within a known framework or environment, such as a network, and organizes that data in a hierarchical structure that makes the information easier to understand and access. These objects include traditional network resources such as user and machine accounts, shared network resources such as shared directories and printers, and resources such as network applications, services, and security policies.

Directory service is a composite term that includes the directory data store as well as the services that make the information within the directory available to users and applications. Directory services come in various types and from different sources. OS directories, such as Microsoft's AD and Novell's Novell Directory Services (NDS), are general-purpose directories that vendors include with a network OS and design to be multipurpose directories that a variety of users, applications, and devices can access. Some applications, such as enterprise resource planning (ERP), human resources (HR), and email systems (e.g., Microsoft Exchange Server) provide directories for storing data specific to their functionality.

Why is a directory essential? A directory provides a central repository for all of an enterprise network's crucial data, including information about user accounts, computers, printers, applications (e.g., an HR database), security, and system configuration policy. Over time, organizations can use a central directory, such as AD, to consolidate the majority of their crucial data into one shared network resource. This consolidation improves organizational efficiency and significantly reduces a network's total cost of ownership (TCO).

Although data centralization and consolidation are key benefits of directory services, this functionality also represents one of directory services' greatest potential weaknesses. Moving crucial information from a distributed model to a highly centralized one considerably reduces a network's tolerance for downtime and problems and increases the risk of loss as a result of downtime. Thus, network administrators need to focus a considerable portion of their monitoring efforts on AD and its subcomponents.

In most cases, AD is the compelling feature that is driving enterprise customers toward migrating to Win2K. With AD, Microsoft has finally delivered a directory that can support large and multisite networks. Although plenty of alternative directory products have been on the market for some time (e.g., Banyan's StreetTalk and Novell's NDS), many Microsoft- and Windows NT-centric organizations have chosen to wait and use AD as the foundation for their enterprise networks. As a result, AD represents the first foray into the world of directories and directory management for many organizations and network administrators.

One or more Win2K domain controllers host AD, which the domain controllers replicate in a multimaster fashion to ensure increased availability of the directory and the network. In this replication scenario, multiple read/write copies of the database exist simultaneously. This setup differs from NT 4.0's single-master PDC and BDC replication topology wherein one domain controller, the PDC, houses a read/write copy of the database. In addition to providing a central repository for network objects and services for accessing those objects, AD furnishes security in the form of discretionary access control lists (DACLs). AD applies DACLs to directory objects to prevent unauthorized parties from accessing those objects.

At a physical level, AD uses Microsoft's Extensible Storage Engine (ESE) to store the directory database. Exchange Server also uses ESE. Like Exchange Server, AD's database employs transaction log files to help ensure database integrity in the case of events (e.g., power outages) that interfere with the successful completion of database transactions. AD also shares Exchange Server's ability to perform online database maintenance and defragmentation.

AD is a database, so all your Win2K domain controllers are essentially crucial database servers. Therefore, you should treat your Win2K domain controllers no differently than you treat any other important database server in terms of fault-tolerance preparation (e.g., disk redundancy, backups, power protection) and capacity planning.

Although AD's management interfaces and APIs mask the building blocks that make up the directory, AD's physical configuration is nonetheless an important consideration for Win2K administrators. For example, all volumes on domain controllers that host the AD database and its transaction logs must maintain adequate levels of free disk space at all times. For performance reasons, you must ensure that the AD databases on domain controllers don't become too heavily fragmented. In addition, administrators need to be aware of the services and components that ensure an AD-enabled Win2K network's stability.
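The free-space requirement described above lends itself to a simple scheduled check. The following Python sketch shows one way to express it. The 20 percent threshold and the volume path are illustrative assumptions, not Microsoft recommendations, and the paths would need to match wherever your domain controllers keep the AD database and transaction logs.

```python
import shutil

# Assumed threshold for illustration only; choose a value that fits your environment.
MIN_FREE_FRACTION = 0.20


def free_fraction(path):
    """Return the fraction of the volume holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def low_space_volumes(paths, min_free=MIN_FREE_FRACTION):
    """Return (path, free_fraction) pairs for volumes below the threshold."""
    results = []
    for path in paths:
        fraction = free_fraction(path)
        if fraction < min_free:
            results.append((path, fraction))
    return results


if __name__ == "__main__":
    # Hypothetical volume list; "." stands in for the database and log volumes.
    for path, fraction in low_space_volumes(["."]):
        print(f"WARNING: {path} has only {fraction:.0%} free")
```

A check like this reports only free space. It says nothing about database fragmentation, which you would need to assess separately.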

DNS: Gateway to AD
The TCP/IP network protocol plays a larger role in Win2K than in earlier NT versions. Although Win2K also supports other legacy protocols, such as IPX and NetBEUI, Microsoft based most of Win2K's internal mechanics, including AD, on TCP/IP. In AD-enabled networks, as in all TCP/IP-based networks, the ability to resolve names to IP addresses is an essential service. A bounded area within which a resolution service can resolve a given name is a namespace. In NT-based networks, NetBIOS is the primary namespace and WINS is the primary name-to-IP address resolution service. In Win2K, Microsoft has abandoned the use of NetBIOS as the primary network namespace and replaced it with DNS. Like AD, DNS employs a hierarchical namespace and uses domains, but DNS defines domains differently than AD does.

Although you can incorporate a DNS namespace into an NT network for name-to-IP address resolution, this use of DNS is optional and mainly of interest to enterprises running heterogeneous environments or Internet-based applications. In AD, however, DNS plays a more crucial role. In addition to replacing NetBIOS as the default name resolution service in Win2K, Microsoft designed Win2K domains to use a DNS-style naming structure that ties the namespace of AD domains directly to the network's DNS namespace. (However, usually only companies that use separate DNS configurations for the internal LAN and the Internet, which is the Microsoft-recommended configuration, experience this namespace duplication.) Finally, Win2K uses DNS as its default locator service, which is the service that the OS uses to convert items such as AD domain, site, and service names to IP addresses.

Although the DNS and AD namespaces in a Win2K network are identical with regard to domain names, the namespaces are otherwise distinct, and Win2K uses them for different purposes. DNS databases contain domains and the record contents of the DNS zone files for those domains (e.g., host address, or A, records; SRV records; and mail exchanger, or MX, records). AD contains various objects, including domains, organizational units (OUs), users, computers, and group policy objects.

Another notable connection between DNS and AD is that you can configure Win2K DNS servers to store their DNS domain zone files directly within AD rather than in external text files. Although DNS doesn't rely on AD for DNS's functionality, AD relies on DNS to operate.

Win2K includes an implementation of dynamic DNS (DDNS), which Internet Engineering Task Force (IETF) Request for Comments (RFC) 2136 defines. Through DDNS, AD-enabled clients can locate important Win2K network resources, such as domain controllers, by querying special DNS resource records called SRV records. The accuracy of these SRV records is therefore crucial to the proper functionality of a Win2K network and the availability of the systems and services that the records reference. After an AD-enabled client uses a DNS SRV record to locate a domain controller, the client uses the Lightweight Directory Access Protocol (LDAP) to issue queries to the domain controller to resolve directory object names to their corresponding records. For more information about DNS, see Mark Minasi, "A DNS Primer," January 2000.
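To make the role of SRV records more concrete, the following Python sketch builds the record names a client might look up and flags answers that point at hosts you no longer run. The record-name patterns are not taken from this article. They reflect the names Win2K domain controllers are generally understood to register, so verify them against your own DNS zone. The domain names, host names, and the helper functions themselves are hypothetical, and the ordering rule is a simplification of the weighted selection that the SRV specification describes.

```python
from collections import namedtuple

# One SRV answer: lower priority is tried first; weight breaks ties.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def locator_names(domain, forest_root=None):
    """Build the SRV owner names a client would look up (assumed patterns)."""
    forest_root = forest_root or domain
    return {
        "domain controller": f"_ldap._tcp.dc._msdcs.{domain}",
        "PDC emulator": f"_ldap._tcp.pdc._msdcs.{domain}",
        "Global Catalog": f"_ldap._tcp.gc._msdcs.{forest_root}",
        "Kerberos KDC": f"_kerberos._tcp.{domain}",
    }


def preferred_order(records):
    """Order answers by ascending priority, then descending weight.

    Simplification: the SRV specification calls for weighted random
    selection among records that share a priority.
    """
    return sorted(records, key=lambda r: (r.priority, -r.weight))


def _normalize(host):
    """Lowercase a host name and drop the trailing dot of an FQDN."""
    return host.lower().rstrip(".")


def stale_targets(records, known_hosts):
    """Return SRV targets that are not in the list of hosts you expect."""
    known = {_normalize(h) for h in known_hosts}
    return [_normalize(r.target) for r in records if _normalize(r.target) not in known]
```

Feeding `stale_targets` the answers from a real SRV query (obtained with whatever DNS tool you prefer) and your current list of domain controllers gives a quick accuracy check on the records that clients depend on.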

The Global Catalog
AD is the central component of a Win2K network, so network clients and servers query it frequently. To increase AD data's availability on the network, as well as the efficiency of directory object queries from clients, Win2K includes a service called the Global Catalog. The GC is a separate database from AD that contains an index of commonly queried AD object attributes. You can configure only Win2K domain controllers to be GC servers. Every Win2K forest must contain at least one GC server, and, by default, the first domain controller in a Win2K forest is the GC server. (You can later move this service to a different domain controller.) Like AD, the GC uses replication to update the various GC servers in a Win2K domain or forest. In addition to being a repository of commonly queried AD object attributes, the GC provides network logon authentication and directory searches and queries.

In native-mode Win2K domains (i.e., all domain controllers run Win2K, and an administrator has manually made the native-mode election), the GC facilitates network logons for AD-enabled clients because the GC provides universal group membership information for the account sending the logon request to a domain controller. The GC provides this service not only to regular users but also to every type of object that must authenticate to AD. In multidomain networks, at least one domain controller acting as a GC must be available for users to be able to log on. A GC server is also required when a user attempts to log on with a user principal name (UPN) other than the default UPN. If a GC server isn't available, users can log on only to the local computer. The exception is members of the Domain Admins group, who don't require a GC server to log on to the network.

AD read requests, such as directory searches and queries, tend to outweigh write requests, such as directory updates. The majority of AD-related network traffic on a Win2K network includes requests from users, administrators, and applications about objects in the directory. As a result, the GC is essential to a Win2K infrastructure because the GC lets clients quickly perform searches across all domains within a forest.

Mixed-mode Win2K domains don't require a GC for the network-logon-authentication process. However, GCs are still important for facilitating directory queries and searches on these networks, and you should make at least one available at each site within the network.

The Operation Masters
Although multimaster replication is a central feature of AD and Win2K networks, the potential for collisions and conflict between multiple servers makes this functionality inappropriate for some network operations and roles. To accommodate these special cases, Win2K elects specific machines to serve as Operation Masters (aka Flexible Single Master Operation—FSMO—roles). Each Operation Master is responsible for handling changes to a specific AD area. Five Operation Master roles exist in every Win2K enterprise: schema master, domain naming master, PDC emulator, Relative Identifier (RID) master, and infrastructure master. The schema master and domain naming master are forest-specific roles, and the other three roles are domain-specific. Win2K automatically elects the Operation Master servers during the creation of each AD forest and domain.

Schema master. The Win2K domain controller that plays the schema master role is responsible for all updates and modifications to the forestwide AD schema. The schema defines every type of object and object attribute AD can store. Only members of the Schema Administrators group can modify a forest's schema, and they can make modifications only on the schema master.

Domain naming master. The domain controller elected to the domain naming master role is responsible for changes to AD's forestwide domain namespace. Only this server can add or remove a domain from the directory and add or remove references to domains in external directories.

PDC emulator. If a Win2K domain contains non-AD-enabled clients or is a mixed-mode domain containing NT BDCs, the PDC emulator acts as an NT PDC for these systems. In addition to replicating the NT-compatible portion of directory updates to all BDCs, the PDC emulator is responsible for time synchronization on the network and processing account lockouts and client password changes.

RID master. The RID master allocates sequences of RIDs to each domain controller in the RID master's domain. Whenever a Win2K domain controller creates an object, that object needs a unique SID. A SID consists of a domain SID and a RID. When a domain controller has exhausted its internal pool of RIDs, it requests another pool from the RID master.

Infrastructure master. To reference an object in a different domain, an object uses the globally unique ID (GUID), SID, and distinguished name (DN) of the object being referenced. The infrastructure master is the domain controller responsible for updating an object's SID and DN in a cross-domain object reference. The infrastructure master also handles updates for all interdomain references (e.g., when an administrator renames or changes the members of a group, the infrastructure master updates the group-to-user references). The infrastructure master uses multimaster replication to distribute updates.

The Operation Masters play crucial roles in a Win2K network, so ensuring that the servers hosting these roles are continually available is important. If an Operation Master becomes unavailable, severe networkwide problems, including unavailable services and database inconsistencies, can result (e.g., if the RID master fails, domain controllers that exhaust their RID pools will be unable to allocate RIDs to AD objects). For more information about Operation Masters, see Gary Rosenfeld, "Win2K Operation Masters," August 2000.
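The RID master example can be illustrated with a small simulation. The Python sketch below models the pool mechanism as the article describes it: each domain controller draws RIDs from a local pool and asks the RID master for a new block when the pool is empty. The class names, the starting RID, the pool size of 500, and the domain SID are assumptions chosen for illustration, not values taken from Win2K.

```python
class RidMaster:
    """Hands out non-overlapping blocks of RIDs to domain controllers."""

    def __init__(self, first_rid=1000, pool_size=500):
        self.next_rid = first_rid
        self.pool_size = pool_size
        self.online = True

    def allocate_pool(self):
        """Return the next unused block of RIDs, or fail if offline."""
        if not self.online:
            raise RuntimeError("RID master unavailable")
        start = self.next_rid
        self.next_rid += self.pool_size
        return range(start, start + self.pool_size)


class DomainController:
    """Creates SIDs by appending a RID from its local pool to the domain SID."""

    def __init__(self, name, domain_sid, rid_master):
        self.name = name
        self.domain_sid = domain_sid
        self.rid_master = rid_master
        self._pool = iter(())  # starts empty, so the first request fetches a pool

    def new_sid(self):
        """Return a new SID, requesting a fresh pool if the local one is empty."""
        rid = next(self._pool, None)
        if rid is None:
            self._pool = iter(self.rid_master.allocate_pool())
            rid = next(self._pool)
        return f"{self.domain_sid}-{rid}"
```

In this model, taking the RID master offline has no immediate effect. Each domain controller keeps creating objects until its own pool runs out, which is one reason a monitoring routine should check the role holder directly instead of waiting for object creation to fail.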

Monitoring Win2K-Style
As you can see, an AD-enabled Win2K network includes several new and important infrastructure components that didn't exist in NT-based networks. As a result, ensuring the health and availability of your Win2K systems means that you'll need to account for these additional components in your network-monitoring routine. (Table 1 lists potential problems with the components and services that you need to monitor regularly.) You can employ several management tools and techniques to maintain a healthy and available network.

The primary monitoring consideration in a Win2K environment is AD and its related services and components, including responsiveness to DNS and LDAP queries, AD intersite and intrasite replication, and the Knowledge Consistency Checker (KCC). The health and availability of services such as DNS, the GC server, and DFS are also important.

However, simply knowing what metrics to monitor is only the first step. The most important and complex aspect of monitoring network health and performance isn't determining what to monitor, but how to use the raw data collected from an array of metrics to make useful determinations. For example, you can use Performance Monitor to collect data about AD replication on several dozen metrics, but simply having this information doesn't mean you automatically know how to interpret the data or what tolerance ranges are acceptable for each metric.
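The step from raw data to a useful determination can be sketched as a comparison of collected samples against tolerance ranges. In the Python example below, the metric names and ranges are hypothetical placeholders. They are not real Performance Monitor counter names or recommended values, and establishing the right ranges for your network is exactly the hard part this section describes.

```python
from statistics import mean

# Hypothetical metric names and tolerance ranges (low, high); not real counter names.
TOLERANCES = {
    "replication_latency_sec": (0, 900),
    "ldap_bind_time_ms": (0, 250),
    "cpu_percent": (0, 80),
}


def evaluate(samples, tolerances=TOLERANCES):
    """Compare the average of each metric's samples with its tolerance range.

    Returns a dict mapping metric name to "ok", "out of range", or
    "no baseline" (collected, but nobody has defined what is acceptable).
    """
    verdicts = {}
    for metric, values in samples.items():
        if metric not in tolerances:
            verdicts[metric] = "no baseline"
            continue
        low, high = tolerances[metric]
        average = mean(values)
        verdicts[metric] = "ok" if low <= average <= high else "out of range"
    return verdicts
```

The "no baseline" verdict is deliberate: a metric that you collect without a defined tolerance range is data you can't yet interpret.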

Help from the Outside
To proactively monitor your network, I highly recommend that you invest in a full-featured network-monitoring solution. A useful monitoring system not only collects raw data but also understands how to use the information to identify network problems. When you evaluate a network-monitoring software package, look for this kind of artificial intelligence, as well as alerting options, the ability to monitor AD changes, a product architecture that doesn't hinder network performance, the inclusion of a support knowledge base, and SLA awareness.

Problem-resolution features. Many third-party products provide automatic problem-resolution features that enable the software to take corrective actions when it detects a specified problem. (For example, you can configure the software to restart a service when the product discovers that the service is unresponsive.) To accomplish these tasks, many tools use scripting or the ability to call external utilities.

Complex network-monitoring software packages base their decisions on rule sets derived from an internal database or intelligent escalation routines that emulate an administrator's actions. For example, you can configure some third-party tools to restart a service the first time that the service fails, restart the computer if restarting the service didn't resolve the problem, then promote another machine to replace the problem system if restarting the system fails to solve the problem. When considering utilities that provide problem-resolution features, look for products that offer a flexible scripting language or the ability to customize and escalate problem-resolution actions.
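An escalation routine of the kind described above can be sketched in a few lines of Python. The function name and the action labels are hypothetical. In a real deployment, each action would call a vendor script or an external utility, and the health check would query the actual service.

```python
def escalate(is_healthy, actions):
    """Run corrective actions in order until the health check passes.

    `is_healthy` is a zero-argument callable that returns True or False.
    `actions` is a list of (label, callable) pairs, mildest action first.
    Returns (labels of the actions attempted, whether the problem was resolved).
    """
    attempted = []
    for label, action in actions:
        if is_healthy():
            return attempted, True
        action()
        attempted.append(label)
    # All actions have run; report whether the last one fixed the problem.
    return attempted, is_healthy()


if __name__ == "__main__":
    # Simulated system in which only a computer restart clears the fault.
    state = {"healthy": False}
    ladder = [
        ("restart service", lambda: None),
        ("restart computer", lambda: state.update(healthy=True)),
        ("promote standby", lambda: None),
    ]
    print(escalate(lambda: state["healthy"], ladder))
```

Because the health check runs before every step, the routine stops as soon as a milder action works and never reaches the more disruptive ones.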

Alerting options. Good network-monitoring software provides an assortment of alerting options such as console alerts, network pop-up messages, event log entries, email alerts, pager notifications, and SNMP traps. You can even find monitoring software that interfaces with popular network-management packages. (For more information about these monitoring software products, see "Win2K Network-Monitoring Tools.")

Ability to monitor AD changes. In addition to monitoring and troubleshooting Win2K network infrastructure problems, monitoring software provides the ability to monitor and audit changes made within the network—particularly AD changes. In many organizations, dozens or even hundreds of administrators might make daily changes to AD: adding OUs, users, groups, and printers, and defining group policies. To manage the potential chaos this situation presents, your organization needs a monitoring system that can identify recent AD object changes, who made the changes, and when. For example, you might want to track changes to the AD schema, OUs, contacts, computers, and printers, and directory recovery actions (e.g., running a Directory Services Restore Mode operation on a domain controller). If your monitoring software permits it, consider scheduling these reports to run daily, and make a habit of reviewing these reports often so that you'll have a better chance of catching problems early.

Product architecture. An important consideration when selecting a network-monitoring solution for your organization is the product's architecture. Understanding how the product collects data and what impact this information-collection process will have on your network and servers is important. Does the product employ local agents to gather metrics, or does it use remote queries? Are throttling features available to control network bandwidth and system resource usage? Does the product offer a machine/site/domain hierarchy that efficiently passes data to the central collection database? Does the product provide Web-based management? The answers to these questions can have a significant impact on your network environment and your satisfaction with a third-party tool.

Support knowledge base. Another differentiating feature among network-monitoring software packages is whether the software provides a support knowledge base of common problems and solutions. This information is invaluable from a technical and financial standpoint because it reduces the learning curve of IT staff and the amount of time and money administrators must expend researching and solving problems. Some utilities augment this feature by letting administrators add data to the software's knowledge base, leveraging the IT staff expertise and creating a comprehensive problem-resolution system.

SLA awareness. IT shops that have service level agreements (SLAs) with clients or a parent organization might consider a network-monitoring product that is SLA-aware. This functionality lets the software generate alerts and reports that address exceptions to or compliance with SLA obligations.
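The arithmetic behind an SLA compliance report is simple, as the following Python sketch shows. The function names, the 99.9 percent target, and the 30-day reporting period used in the note afterward are illustrative assumptions, not terms from any particular agreement or product.

```python
def availability_percent(period_minutes, downtime_minutes):
    """Percentage of the reporting period during which the service was up."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def allowed_downtime_minutes(period_minutes, target_percent):
    """Downtime budget implied by an availability target."""
    return period_minutes * (100.0 - target_percent) / 100.0


def sla_report(period_minutes, downtime_minutes, target_percent):
    """Summarize achieved availability against the SLA target."""
    achieved = availability_percent(period_minutes, downtime_minutes)
    budget = allowed_downtime_minutes(period_minutes, target_percent)
    return {
        "achieved_percent": round(achieved, 3),
        "target_percent": target_percent,
        "compliant": achieved >= target_percent,
        "budget_minutes": round(budget, 1),
    }
```

For a 30-day period (43,200 minutes) and a 99.9 percent target, the downtime budget works out to 43.2 minutes, so a single hour-long outage of a domain controller that users depend on would put the month out of compliance.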

Tools of the Trade
Understanding Win2K networks' crucial elements, as well as the features you should look for in monitoring software, lets you make an informed purchasing decision. An overwhelming number of tools that provide network-monitoring features are on the market; however, I've encountered only a handful that support most of the previously mentioned features. For more information about these tools, see "Win2K Network-Monitoring Tools."

If you don't have the budget to purchase a third-party monitoring tool, you're not totally out of luck. Win2K, the free Windows 2000 Support Tools (available with all versions of Win2K or from the Microsoft Web site), and the Win2K resource kits provide several free utilities that you can use to assemble a decent network-monitoring system. (Table 2 lists these utilities and their locations.)

If you're enterprising and willing to spend time scripting, you can write administrative scripts that extend these utilities by automating them (e.g., schedule jobs that regularly poll services and machines, make intelligent service queries, and evaluate server responses). With enough elbow grease, you might be able to approximate some of the features found in the higher-end products.
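As a starting point for that kind of scripting, the Python sketch below polls a list of servers for the TCP ports commonly associated with the services this article discusses: 53 for DNS, 88 for Kerberos, 389 for LDAP, and 3268 for the Global Catalog. The function names are hypothetical, and you would supply your own host names. A successful TCP connection shows only that the port accepts connections, not that the service answers queries correctly, so treat this as the crudest layer of a monitoring routine.

```python
import socket

# Well-known TCP ports for the services discussed in this article.
SERVICE_PORTS = {"DNS": 53, "Kerberos": 88, "LDAP": 389, "Global Catalog": 3268}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def poll(hosts, services=SERVICE_PORTS, timeout=3.0):
    """Return a list of (host, service) pairs that did not accept a connection."""
    failures = []
    for host in hosts:
        for service, port in services.items():
            if not port_open(host, port, timeout):
                failures.append((host, service))
    return failures
```

Calling `poll` with your domain controllers' names from a scheduled job, and sending an alert whenever the returned list isn't empty, gives you a rudimentary availability monitor. Not every domain controller is a GC or DNS server, so pass a per-host `services` dictionary where the roles differ.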

Sharpen Your Monitoring Skills
With the introduction of the sophisticated AD service, Win2K represents a quantum leap forward in the evolution of NT networks. Win2K also introduces a new level of network infrastructure complexity that requires new knowledge and tools for proper management. To reduce the administrative burdens associated with managing a Win2K network, consider employing network-monitoring software that provides features specific to the needs of a Win2K environment. These tools provide early warning indicators that mitigate the risk of loss associated with network downtime, and offer intelligent knowledge bases that augment and leverage the knowledge of the organization's existing IT staff.

WIN2K NETWORK-MONITORING TOOLS
AppManager Suite
OnePoint Operations Manager
NetIQ * 408-330-7000
http://www.netiq.com/

DirectoryAnalyzer
NetPro * 480-941-3600
http://www.netpro.com/

OpenView
OpenView Express
OpenView ManageX
Hewlett-Packard * 408-246-4300
http://www.openview.hp.com/

Unicenter TNG
NetworkIT
Computer Associates * 800-645-3042
http://www.cai.com/