Windows IT Pro is the leading independent community for IT professionals deploying Microsoft Windows server and client applications and technologies.
  
  
February 1998

Inside Microsoft Cluster Server



Resource States and Failover
Each resource exists in one of five states: offline, offline pending, online, online pending, and failed. When a resource is online, it is active and available for use. Offline pending and online pending are transitional, internal states that reflect a change from online to offline and vice versa. When MSCS notes a resource's state as failed, a failover ordinarily occurs; you can also initiate a manual failover from Cluster Administrator. One type of failure detection is resource-independent: it occurs when a node fails or when the communications links between nodes fail. Cluster Network Driver detects these problems and notifies Failover Manager and Membership Manager. The affected nodes determine, with the aid of Membership Manager and Resource Manager, whether they should attempt the failover of their resources.
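
The five states can be pictured as a small state machine. The following Python sketch is illustrative only: the state names come from the text above, but the transition table (the ALLOWED mapping) and the can_transition helper are assumptions made for this example, not part of any MSCS interface.

```python
from enum import Enum


class ResourceState(Enum):
    """The five resource states named in the article."""
    OFFLINE = "offline"
    OFFLINE_PENDING = "offline pending"
    ONLINE = "online"
    ONLINE_PENDING = "online pending"
    FAILED = "failed"


# Assumed transition table: the pending states sit between online and
# offline, and any non-offline state may end up failed.
ALLOWED = {
    ResourceState.OFFLINE: {ResourceState.ONLINE_PENDING},
    ResourceState.ONLINE_PENDING: {ResourceState.ONLINE, ResourceState.FAILED},
    ResourceState.ONLINE: {ResourceState.OFFLINE_PENDING, ResourceState.FAILED},
    ResourceState.OFFLINE_PENDING: {ResourceState.OFFLINE, ResourceState.FAILED},
    ResourceState.FAILED: {ResourceState.ONLINE_PENDING, ResourceState.OFFLINE},
}


def can_transition(current, target):
    """Return True if the assumed table permits moving from current to target."""
    return target in ALLOWED[current]
```

In this model a resource never jumps directly from offline to online; it always passes through online pending first.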

Individual resources can fail, and the detection of individual resource failure is the domain of each resource's Resource DLL and the Resource Monitor. Resource Monitor pings each resource's Resource DLL at intervals that you specify during the configuration of the resource to determine whether the resource is functional. Resource DLLs implement two entry points for failure detection: LooksAlive and IsAlive. At a configured polling interval, Resource Monitor sends the Resource DLL a LooksAlive request. Upon this request, the Resource DLL performs a cursory check to see whether the resource is OK, and the Resource DLL must respond to Resource Monitor within 150 milliseconds (ms). If the LooksAlive request fails, Resource Monitor sends an IsAlive request to the Resource DLL. To respond to the IsAlive request, the Resource DLL has up to 300ms to determine whether the resource has failed. If the Resource DLL answers no to Resource Monitor in response to the IsAlive request, Resource Monitor considers the resource to be offline and signals Cluster Service to fail over the group that contains the failed resource.
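
The two-stage check can be sketched in a few lines of Python. This is a simplified model, not the actual Resource API: the looks_alive and is_alive callables stand in for the Resource DLL entry points, the function names are hypothetical, and treating a late answer as a failed check is an assumption made for the sketch. The 150ms and 300ms budgets are the figures given above.

```python
import time

LOOKS_ALIVE_BUDGET_S = 0.150  # cursory check budget from the article
IS_ALIVE_BUDGET_S = 0.300     # thorough check budget from the article


def timed_call(check, budget_s):
    """Run check(); count a False result or a late answer as a failure."""
    start = time.monotonic()
    ok = bool(check())
    elapsed = time.monotonic() - start
    return ok and elapsed <= budget_s


def poll_resource(looks_alive, is_alive):
    """Return True if the resource passes this polling cycle.

    The thorough is_alive check runs only when the cursory
    looks_alive check fails.
    """
    if timed_call(looks_alive, LOOKS_ALIVE_BUDGET_S):
        return True
    return timed_call(is_alive, IS_ALIVE_BUDGET_S)
```

A result of False from poll_resource corresponds to the point at which Resource Monitor would treat the resource as offline and ask Cluster Service to fail over its group.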

Resource Monitor has timeout and retry mechanisms that it uses in its detection of failed resources. If Resource Monitor sends an IsAlive request and the queried Resource DLL does not respond to the request within a timeout period, Resource Monitor resends the request a certain number of times (you specify this number in Cluster Administrator). Resource Monitor considers a resource to be offline only when the queried Resource DLL fails to respond to every one of the specified retries.
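
A minimal sketch of this retry rule follows. The send_is_alive callable is hypothetical: it is assumed to return True when the Resource DLL answers within the timeout period and False when the request times out. The retries parameter plays the role of the number you set in Cluster Administrator.

```python
def resource_offline_after_retries(send_is_alive, retries):
    """Return True only if the first request and every retry time out.

    send_is_alive: hypothetical callable, True on a timely answer,
    False on a timeout.
    retries: number of additional requests sent after the first one.
    """
    attempts = 1 + retries
    for _ in range(attempts):
        if send_is_alive():
            # One timely answer is enough to keep the resource online.
            return False
    return True
```

With retries set to 2, the resource is declared offline only after three consecutive timeouts.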

You will run across the term failback in MSCS. Failback is the process of bringing resources back online on a node that has rebooted and restarted after a failure. You can use Cluster Administrator to configure a resource group to run on certain nodes preferentially. Thus, if MSCS has moved a resource to a nonpreferred node and a preferred node subsequently comes online, the resource might be forced to fail back to the preferred node. You might want to designate preferred nodes when you have a cluster containing nodes with different hardware characteristics, such as different processors and memory sizes. You can designate your SQL Server application to execute on the faster node and your IIS Web server to run on the slower one.
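
The preferred-node idea can be illustrated with a short Python sketch. The function name, the node names, and the rule that a group always moves back to its most preferred online node are assumptions for this example; in MSCS, failback is a configurable policy rather than an automatic certainty.

```python
def choose_owner(preferred_nodes, online_nodes, current_owner):
    """Pick the node that should own a resource group.

    preferred_nodes: ordered list, most preferred first.
    online_nodes: set of nodes currently online.
    current_owner: node that holds the group right now.
    """
    # Fail back to the most preferred node that is online.
    for node in preferred_nodes:
        if node in online_nodes:
            return node
    # No preferred node is available: stay put if possible.
    if current_owner in online_nodes:
        return current_owner
    # Otherwise take any online node, or None if the cluster is down.
    return next(iter(sorted(online_nodes)), None)
```

For example, a SQL Server group that prefers a node called "fast" stays on "slow" while "fast" is down and returns to "fast" once it rejoins the cluster.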

The Quorum Concept
One of the most important aspects of the successful recovery of failed resources is agreement among all the nodes in a cluster about which node will own the shared disk and where resources will migrate after a failure. Consider the case illustrated by the three-node cluster in Figure 4. If node 3 becomes separated from the other two nodes, nodes 1 and 2 will assume that node 3 has failed, and node 3 will assume that nodes 1 and 2 have failed. Without some kind of arbitration process, node 3 would fail over all the groups that are active on nodes 1 and 2, and nodes 1 and 2 would fail over all the groups that are active on node 3.

To avoid such a situation, MSCS defines a quorum resource, which all nodes in a cluster must have access to, even though only one node can own the quorum resource at a given time. Through MSCS, the cluster hardware enforces the restriction that only one node at a time can own the quorum resource. If the cluster becomes divided, the quorum resource determines which nodes will remain active. For example, if node 1 owns the quorum resource when a communications breakdown isolates node 3, the Membership Managers on nodes 1 and 2 will decide that they are the functioning part of the cluster. Node 3 will attempt to take ownership of the quorum resource, but the hardware lock that node 1 imposed on the quorum resource will prevent node 3 from doing so. Node 3 will realize that it has become separated from the other nodes (which are still active) in the cluster, and it will release all its resources and halt operation.
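
The arbitration in this example can be modeled with a toy Python sketch. The QuorumDisk class stands in for the hardware lock on the shared disk, and surviving_nodes is a hypothetical helper; neither reflects how MSCS or the SCSI reservation mechanism is actually implemented.

```python
class QuorumDisk:
    """Stand-in for the shared disk's hardware lock: one owner at a time."""

    def __init__(self):
        self.owner = None

    def try_reserve(self, node):
        """Grant the lock if it is free or already held by this node."""
        if self.owner is None or self.owner == node:
            self.owner = node
            return True
        return False


def surviving_nodes(partitions, quorum):
    """Return the partition that holds the quorum resource.

    partitions: list of sets of node names that can still reach each other.
    Nodes in every other partition are expected to release their
    resources and halt.
    """
    for part in partitions:
        if any(quorum.try_reserve(node) for node in sorted(part)):
            return part
    return set()
```

If node 1 already holds the lock, the partition containing nodes 1 and 2 survives and node 3's reservation attempt is refused, which matches the scenario described above.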

In the current release of MSCS, the only defined quorum resource type is a shared physical disk that supports hardware-based lockout (thus the requirement for at least one shared SCSI disk). Other shared disks can store data, but the quorum shared disk is dedicated to its role in cluster management. In addition to serving as the quorum resource, the quorum shared disk stores the cluster change log and database. When the owner of a quorum shared disk fails, the subsequent quorum owner can finish any cluster-related transactions that are outstanding at the time of the failure after reading them from the log on the quorum disk.
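
The recovery step in the last sentence resembles replaying a change log. The sketch below is a loose illustration under stated assumptions: the (sequence, key, value) entry format, the last_applied counter, and the function name are all invented for this example and do not describe the real format of the MSCS log or database.

```python
def replay_outstanding(data, last_applied, change_log):
    """Apply logged changes that the failed owner had not yet applied.

    data: dict standing in for the cluster database.
    last_applied: sequence number of the last change already in data.
    change_log: iterable of hypothetical (sequence, key, value) entries.
    Returns the sequence number of the last change applied.
    """
    for seq, key, value in sorted(change_log, key=lambda entry: entry[0]):
        if seq > last_applied:
            data[key] = value
            last_applied = seq
    return last_applied
```

A new quorum owner that finds entries 2 and 3 in the log but only entry 1 in the database would apply the two outstanding changes in order.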

The Future of MSCS
When I discussed failback, I mentioned that an administrator can configure groups to run on preferred nodes. This practice balances the load among the nodes in a cluster to maximize the return from the hardware; you don't want all your groups to run on one node while other nodes remain idle. Unfortunately, the current release of MSCS lets you distribute workloads only statically between the cluster's two nodes. To address this deficiency, Microsoft has promised it will release a version of MSCS in 1998 that, in addition to supporting clusters of up to four nodes, will support dynamic load balancing. Dynamic load balancing will simplify the management of MSCS clusters and maximize the use of cluster hardware. Although load balancing in a two-node cluster is relatively straightforward, load balancing in clusters of four or more nodes is much more complex.

As an administrator or developer of enterprise-level software, you will become familiar with MSCS. In time, more and more applications will be MSCS cluster-aware out of the box to adhere to the MSCS standard and take advantage of the high-availability infrastructure MSCS provides.
