Failure Management
Despite all precautions, failures will occur. If a server hosting one of the
redundant services fails, that server will be unavailable, but the service
itself will continue to function. For this reason, it's important to have a
monitoring solution in place, such as Microsoft System Center Operations Manager,
that notifies administrators when a failure occurs. Here's how to handle a
failure, depending on which server fails:
- Web servers: If a Web server fails, the server will no longer be
  running on the virtual IP address and NLB won't direct requests to it. Repair
  the server, and bring it back up in the NLB cluster.
- Application servers: If a server hosting Excel Calculation Services
  or the Query service fails, that server will no longer respond to requests,
  and those requests will go to another server hosting the service. If a server
  hosting the Index service fails, the Query servers will continue to respond
  using cached information. After the server is recovered, index propagation
  will resume.
- SQL Server (database) server: In a clustered environment, SQL Server
  will fail over to the inactive node in the event of a failure. It's important
  to repair the failed node and test failover/failback to ensure uptime in
  the event of future failures.
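The Web server case above covers a host that stops responding altogether. A host that is still up at the network level but is serving errors is a different situation, and catching it generally takes an application-level check alongside whatever monitoring product is in use. The following is a minimal sketch of such a check in Python, not a replacement for a monitoring solution. The node URLs, the function names, and the rule that only HTTP 200 counts as healthy are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500 or 503.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def check_nodes(
    urls: Iterable[str],
    probe: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Probe each node directly and report whether it returned HTTP 200.

    Assumption: 200 is the only healthy status. Adjust for sites that
    redirect or require authentication.
    """
    return {url: probe(url) == 200 for url in urls}


def failed_nodes(results: Dict[str, bool]) -> List[str]:
    """Return the sorted list of node URLs that did not pass the check."""
    return sorted(url for url, ok in results.items() if not ok)


if __name__ == "__main__":
    # Hypothetical per-node addresses; probe each host directly rather than
    # the virtual IP, so one bad node is not masked by the healthy ones.
    nodes = ["http://web1.example.local/", "http://web2.example.local/"]
    for url in failed_nodes(check_nodes(nodes)):
        print(f"ALERT: {url} failed its health check")
```

Probing each node by its own address rather than through the virtual IP is the key design choice here: a request to the cluster address may be answered by any healthy host, which would hide a single failed one.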
It's All About Reliability
SharePoint is a crucial application in most environments, necessitating a high-availability
infrastructure. The two-tier and three-tier architectures satisfy the need for
high availability by placing services that can be made redundant on multiple
hosts, and NLB and MSCS technologies provide continuous access to content in
the event of a single cluster node failure. Using the available tools, administrators
can enable the necessary reliability to ensure that data and productivity are
maintained.
End of Article


You say: "It does no good to have redundant servers if your storage device represent a possible single point of failure".
But you also "recommend" using NLB, which cannot detect software-level errors. That means that if the web site goes down for some reason, NLB still behaves as if there were no problem.
Also, even if your storage is designed for HA, you still have a quorum disk in the cluster. If you lose that, the whole cluster shuts down.
Pepi, August 29, 2007