Clustering has been a feature of Windows Server since the Windows NT 4.0 days, and the technology has evolved with the OS. In this month's survey on Microsoft Clustering Services (MCS), although 63 percent of respondents said they understand the benefits of clustering, only 30 percent are using it. Of the remaining respondents, 32 percent said they are planning to implement it. The main concerns readers raised in this survey were difficulties with clustering in a storage environment and ease of cluster configuration and management. Readers want to know whether Microsoft is going to make these challenges any easier to deal with, and if so, when and how. Several of the 792 respondents asked questions along the lines of "What improvements in clustering technology are coming within the next 24 months?" and "What is the schedule for enhancements?"

Although Microsoft has a policy of not commenting on future directions and doesn't provide roadmaps, Microsoft's Kurt Friedrich, product unit manager for clustering, and Ryan Rands, senior product manager for enterprise abilities, gave me a glimpse of some intriguing new tools and functionality in the areas of storage and ease of configuration and management. (You can listen to our conversation at http://www.windowsitpro.com, InstantDoc ID 44689. For Ryan's explanation of the various types of clustering Microsoft supports, see the Web-exclusive sidebar "Clustering Technology Overview," InstantDoc ID 44727. And for Kurt's and Ryan's comments on another common concern raised in the survey, see the Web-exclusive sidebar "The Challenges of Clustering with Exchange Server," InstantDoc ID 44728.)

Clustering and Storage
Many readers commented about difficulties with clustering and storage. "We have lots of good news in this space," Kurt responded. "In the past, with third-party Fibre Channel storage solutions, there was very poor standardization other than with the base protocols. The installation, management, and servicing of LUNs and such was proprietary to each third-party storage solution. So when MCS tried to install a cluster, it couldn't even configure the Storage Area Network (SAN) or zone the disk. You had to go back and forth between the Microsoft software and the vendor software."

To address these problems in Windows Server 2003, Kurt continued, "Microsoft has introduced the Virtual Disk Service (VDS) structure, and major storage vendors are now supplying software that matches our VDS side and maps to our standard interfaces. Thus, you can now get such information as what LUNs you have, and you can partition a LUN, take a LUN offline, zone a LUN, etc. All those commands are callable through standard Microsoft software, so we can now automatically detect and configure the storage."

VDS "allows two-way communication with the storage, which wasn't possible in the past, and lets MCS control things in the storage," Ryan elaborated. "But VDS also lets the storage update the OS, which reduces complexity. Instead of having different tools to manage your OS, switching fabric, and storage, VDS can integrate all those tools. You can control the cluster as a unit from one console rather than as individual components."

Kurt said the situation improved with Internet SCSI (iSCSI) "because we had the wisdom of hindsight on the mistakes with Fibre Channel. The iSCSI standards were written to ensure that everything we need for clustering is part of the iSCSI spec. So any device that passes the iSCSI logo has all the new things, like Microsoft's Multipath I/O (MPIO). Anything that's standard will work on any vendor's iSCSI, improving cluster management with storage."

I asked Kurt to explain more about MPIO. "When people install clusters," he replied, "they want redundant everything. So instead of having one host bus adapter (HBA) to connect to the storage box, each server has two, providing two paths to the storage so you don't even need a failover if there's an error."
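To make Kurt's description concrete, here is a minimal sketch of the multipath idea: two paths to the same storage, with an I/O request retried on the second path when the first fails, so no cluster failover is needed. It is only an illustration and not Microsoft's MPIO driver. The `Path`, `MultipathDevice`, and `PathError` names are hypothetical and were chosen for this example.

```python
class PathError(Exception):
    """Raised when an I/O request fails on one physical path."""


class Path:
    """One route from a server to storage, for example through one HBA (hypothetical model)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def read(self, block):
        # A failed HBA, cable, or switch port is modeled as an unhealthy path.
        if not self.healthy:
            raise PathError(f"{self.name} is down")
        return f"data for block {block} via {self.name}"


class MultipathDevice:
    """Presents several paths to the same storage as a single device."""

    def __init__(self, paths):
        self.paths = list(paths)

    def read(self, block):
        errors = []
        # Try each path in order; the caller sees a failure only if every path fails.
        for path in self.paths:
            try:
                return path.read(block)
            except PathError as exc:
                errors.append(str(exc))
        raise PathError("all paths failed: " + "; ".join(errors))


# Assumed scenario: the first HBA has failed, the second is still working.
hba1 = Path("HBA1", healthy=False)
hba2 = Path("HBA2")
disk = MultipathDevice([hba1, hba2])
result = disk.read(7)
print(result)
```

In this sketch the read succeeds over the second path even though the first is down, which is the situation Kurt describes: the server keeps its storage connection and the cluster never has to fail over.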

What clustering problem does MPIO solve? Lack of MPIO "was probably the number one cause of generic cluster problems," Kurt responded. "In the past, storage vendors provided their own plug-in driver stack into Microsoft's stack. That was problematic. For instance, we'd introduce a patch and break a vendor's code, or a vendor hadn't tested all the configurations that we ship. To solve this, we've shipped a Microsoft-standard MPIO driver, and the major storage vendors are now offering providers that use that infrastructure. We've dramatically reduced reliability problems."

Cluster Configuration and Management
The difficulty of configuring and managing clusters was a frequent theme in the survey results. One person asked, "Do you have plans to build an easy configuration tool (wizard, GUI)? Do you have any plan to improve the monitoring tools for MCS?"

The complexity of configuration depends on which version of Windows you're using, Kurt explained. "As we progressed from Windows NT 4.0 to Windows 2000 to Windows Server 2003, the installation has become tremendously simpler. Now, you install one cluster node with a wizard, which asks a couple questions, and the node installs itself if the hardware is correctly configured. Then, if you want to add more nodes, you just identify the nodes and the wizard sets them up."

Kurt revealed, "Within the next 6 months, we're going to release a free tool, ClusPrep. Once you've configured your hardware, ClusPrep will tell you whether it's configured correctly or what's wrong with your configuration. We want customers to be able to verify that their hardware is configured correctly before turning on clustering."

Will ClusPrep work only on Windows 2003? "It's designed to work with Win2K as well as Windows 2003," Kurt said. "But the console has to run on either Windows XP or Windows 2003. The systems you will cluster can be either Win2K or Windows 2003."

What about providing more wizards? "We're looking at that for the next release," Ryan replied. "For example, Exchange and Microsoft SQL Server could provide wizards to set up resource groups and resource dependencies. The next release of SQL Server installs the entire cluster in one step."

In addition, Ryan mentioned other new tools. "We recently released the Microsoft File Server Migration Toolkit, for consolidating file servers using Dfs without a bunch of manual steps. If you have NT 4.0 or Win2K file servers, whether they're clustered or not, this toolkit lets you replicate the data onto a new storage infrastructure and automatically set up Dfs. So, for access, you can still use the old server name and share name, but you're actually being redirected behind the scenes to the new highly available file cluster. That wizard is available today, and customers are using it to consolidate file storage on fewer servers."
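The redirection Ryan describes can be pictured as a lookup from old share names to new targets. The following is a rough sketch of that idea only; it does not show how Dfs or the toolkit is implemented, and the server names, share names, and the `resolve` function are all hypothetical.

```python
# Hypothetical mapping from legacy UNC share roots to their consolidated targets.
REDIRECTS = {
    r"\\OLDFS1\reports": r"\\FILECLUSTER\reports",
    r"\\OLDFS2\home": r"\\FILECLUSTER\home",
}


def resolve(unc_path, redirects=REDIRECTS):
    """Return the path a client would actually reach for a legacy UNC path."""
    lowered = unc_path.lower()
    for old_root, new_root in redirects.items():
        root = old_root.lower()
        # Match the share root itself or anything beneath it (case-insensitive).
        if lowered == root or lowered.startswith(root + "\\"):
            return new_root + unc_path[len(old_root):]
    # Paths that are not under a redirected share are returned unchanged.
    return unc_path


resolved = resolve(r"\\oldfs1\reports\2004\q3.xls")
print(resolved)
```

Users keep typing the old server and share name, and the lookup sends them to the new location behind the scenes.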

In the area of cluster management, Kurt wanted to highlight Microsoft Operations Manager (MOM). "The cluster group just released a new MOM pack. It identifies events and advises you about what to do with them."

What about monitoring if you don't have MOM? "You rely on CluAdmin, the GUI administrator package," Kurt answered. "Icons show up when things go bad, we write all cluster events to a log, and you can watch the log. But if you want active monitoring where you can set the thresholds and filters and include extra operations, MOM is a better answer."

Past, Present, Future
One reader commented, "Cluster Services hasn't evolved much since NT 4.0, compared to most other technologies in Windows Server." Ryan responded, "Historically, clustering was only available on proprietary systems, only very large IT shops could afford to implement it, and they could run a lot fewer workloads. Microsoft's goal when we introduced clustering in NT 4.0 was to make it available on commodity hardware and to lower that bar. We've come a long way, but we can still do some things to reduce both hardware and software complexity—I think an example such as the ClusPrep tool is getting us a little further toward that goal."

Looking forward, Kurt emphasized, "The highest goal for Longhorn is simplified management. With our new task model, we want to identify the tasks to run a cluster and make every task implementable as a single-step operation. So if you wanted to add a new node, for instance, instead of saying run this, then do this, then type that, then initialize this, then set this property (as a series of discrete operations, the way we do it now), we'll provide a single operation that says 'Add a node.' Our goal is that any task will have a command that just does that task."
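The task model Kurt outlines amounts to wrapping a sequence of separate steps behind one command. Here is a minimal sketch of that pattern. Kurt did not name the individual steps, so the four step functions below are hypothetical stand-ins, and nothing here reflects an actual Longhorn interface.

```python
# Hypothetical discrete steps an administrator might otherwise run one at a time.
def verify_hardware(cluster, node):
    cluster["log"].append(f"verified hardware on {node}")


def install_cluster_service(cluster, node):
    cluster["log"].append(f"installed cluster service on {node}")


def join_cluster(cluster, node):
    cluster["nodes"].append(node)
    cluster["log"].append(f"joined {node} to {cluster['name']}")


def set_default_properties(cluster, node):
    cluster["log"].append(f"applied default properties to {node}")


ADD_NODE_STEPS = [
    verify_hardware,
    install_cluster_service,
    join_cluster,
    set_default_properties,
]


def add_node(cluster, node):
    """Single-step operation: runs every discrete step in order."""
    for step in ADD_NODE_STEPS:
        step(cluster, node)
    return cluster


cluster = {"name": "CLUSTER1", "nodes": ["NODE1"], "log": []}
add_node(cluster, "NODE2")
print(cluster["nodes"])
```

The administrator calls `add_node` once, and the ordering of the underlying steps becomes the tool's responsibility rather than the operator's.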

Please let me know about your experiences with clustering. And send me your suggestions for future topics.