Meeting the high-availability challenge

Participating in global markets and maintaining a 24 x 7 presence for customers can be rewarding, but keeping servers running to meet those availability requirements can cause many headaches for systems administrators. System outages can cause lost revenue, lost productivity, data loss, and customer dissatisfaction. One tightly packaged solution to the availability challenge is Stratus Technologies' ftServer 3210, an entry-level, fault-tolerant server with an average annual downtime of less than 1 hour.

Product Architecture
Stratus Technologies builds the ftServer 3210 from the ground up to be a highly available, fault-tolerant platform for mission-critical applications. The product runs Windows 2000 Advanced Server to accommodate the plethora of applications that require high availability (e.g., securities trading, retail banking, messaging, health care, point of sale).

The system's hardware resembles that of alternative solutions such as Win2K clustering in that it uses standard architecture (e.g., Intel Pentium III processors, DIMM memory, Ultra160 SCSI, PCI bus, hot-swappable drives). The defining difference is in the implementation of redundant hardware components. Whereas a cluster might have a second server standing by for failover use, the ftServer 3210 uses paired components internally.

Stratus Technologies uses the term Dual Modular Redundancy (DMR) to refer to the wholly redundant, lockstep-operating CPU modules. Lockstep operation means that at any given time, the system executes identical and precisely parallel actions on each of the paired components. The server ships with VERITAS Software's VERITAS Volume Manager 2.7, which dual-initiates the disks by using redundant SCSI controllers and buses, then logically pairs the disks. The Intel PROSet II utility manages the redundant NICs to establish fault tolerance. Other components (e.g., floppy disk and CD-ROM drives) are standard simplex devices. You can hot-swap the majority of the ftServer 3210 modules for service and upgrade operations.

The ftServer 3210 has some similarities to a Win2K-based cluster. The Win2K kernel is at the heart of both systems, and additional software is responsible for monitoring and responding to hardware problems. Stratus has built a layer of fault-tolerant services that exist below the application level in Win2K, so applications don't require modification to benefit from the ftServer 3210's availability. This software also maintains mean time between failures (MTBF) statistics that a systems administrator can analyze and use to tailor responses to crucial hardware concerns. Hardened device drivers for Stratus-supported PCI adapters provide self-monitoring duplexed operation and manage physical memory that the adapters access.
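To make the MTBF statistic concrete, here is a minimal Python sketch of the textbook calculation: the average gap between consecutive failures. The review doesn't describe how the Stratus software computes or exposes its figures, so the function name and the sample timestamps below are hypothetical and only illustrate the idea.

```python
from datetime import datetime


def mean_time_between_failures(failure_times: list[datetime]) -> float:
    """Return the mean gap between consecutive failures, in hours.

    Textbook definition only; not taken from the Stratus software.
    """
    if len(failure_times) < 2:
        raise ValueError("need at least two failures to measure an interval")
    ordered = sorted(failure_times)
    # Pair each failure with the one that follows it and measure the gap.
    gaps_in_hours = [
        (later - earlier).total_seconds() / 3600.0
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return sum(gaps_in_hours) / len(gaps_in_hours)


# Hypothetical failure log for a single component (illustrative data only).
sample_failures = [
    datetime(2001, 1, 1),
    datetime(2001, 1, 11),
    datetime(2001, 1, 31),
]

if __name__ == "__main__":
    print(f"MTBF: {mean_time_between_failures(sample_failures):.0f} hours")
```

A component whose MTBF falls well below that of its peers is a natural candidate for a tighter alert threshold or an early replacement.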

The characteristics that best differentiate the ftServer 3210 from a cluster relate to failure, recovery, and implementation time. With the ftServer 3210's lockstep operation, no switchover occurs when a component fails. Components that haven't failed simply continue to run the system until you replace the failed components. Recovering from a failure is less labor-intensive because in most cases you need only swap modules. The new module synchronizes with the live module without interruption. Another benefit of lockstep operation is that in the event of a failure, the system protects data that's in memory as well as data on disk. Because implementing the ftServer 3210 doesn't require scripting or application modification, implementation is fast. The ftServer 3210 also sidesteps the typical shared-disk storage problems that plague conventional cluster implementations because the redundant adapters and cables are all internal to the product.

Monitoring and Management
Stratus provides two Microsoft Management Console (MMC) snap-ins for monitoring and management: ftServer Management Console and Stratus ftServer Software Availability Manager. (The Software Availability Manager items are visible only through a pop-up menu when you right-click the Availability Manager icon in MMC.) The ftServer Management Console lets you drill down for specific hardware information, as Figure 1 shows, and the Software Availability Manager lets you set thresholds and alerting options for various monitored hardware and software components. I found both snap-ins to be intuitive and useful for monitoring and managing the server. The well-instrumented hardware provides more statistics than most people would ever want to see. You can also take advantage of this instrumentation through leading third-party management applications (e.g., Computer Associates' Unicenter, IBM's Tivoli), and you can run a Remote Management Installation, which gives you the tools to manage the ftServer 3210 from another system. Although you can customize monitoring to suit your needs, the ftServer 3210 also comes equipped with integrated service technology.

With Stratus's integrated service technology, the server constantly monitors itself and contacts the Stratus Customer Assistance Center (CAC) immediately in the event of an exception condition. The server can contact the CAC through a modem or the Internet, and a Stratus service professional can remotely investigate the situation to expedite appropriate service measures.

Stratus offers four service levels to serve a wide variety of customers. The service offerings range in price from $1,000 to $9,000 annually. Response, advanced exchange, and emergency service times all become shorter as the price goes up. The advanced plans provide around-the-clock monitoring and phone support, whereas low-end plans provide phone support for 9 hours per day, 5 days per week. The base system hardware warranty is 3 years and doesn't include the onsite or remote access services that are part of the paid-for offerings. The Business Critical Service offering guarantees 100 percent uptime: in the event of downtime, Stratus will reimburse the customer for one month of the service fee. Advertised availability is from 99.99 percent to 99.999 percent.
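To put the advertised availability percentages in perspective, the following short Python sketch converts an availability figure into the downtime it allows per year. It assumes a 365-day year and treats availability as a simple fraction of total time; the function name is illustrative and isn't part of any Stratus tool.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year, in minutes, that an availability figure allows."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for percent in (99.99, 99.999):
        minutes = annual_downtime_minutes(percent)
        print(f"{percent}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.99 percent works out to roughly 53 minutes of downtime per year, which matches the "less than 1 hour" figure quoted for the product, and 99.999 percent works out to a little over 5 minutes.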

Setup and Configuration
I quickly read the hardware installation instructions before beginning the easy hardware connection process. You can obtain the ftServer 3210 in either a pedestal or rack-mount configuration, but only the pedestal model provides extra drive bays for tape devices. I spent less than 5 minutes connecting the peripherals and communications cables and firing up the system. You might find it inconvenient that the ftServer 3210 has only USB keyboard and mouse ports. For this reason, the product didn't work with our keyboard/video/mouse (KVM) solution.

To test the ftServer 3210's fault tolerance, I used the default installation of Win2K AS that shipped with the product. If you order the ftServer 3210 without an OS or if you want to reinstall the OS yourself, you can perform an initial program load (IPL). Whether you or the factory performs the IPL, the IPL loads Win2K AS, Stratus-customized Win2K files, appropriate Microsoft service packs, ftServer 3210 documentation, the ftServer Management Console, and ftServer 3210 software. Because the default installation was suitable for my environment, I spent less than 40 minutes configuring Stratus CAC contact information, thresholds, and alerting before declaring the system ready for production.

How Did It Perform?
To test performance under failure conditions, I pulled out one of each redundant component while executing a stream of Microsoft SQL Server transactions on the system. As I pulled out one CPU module, one power supply, one I/O enclosure, and a selection of redundant disk drives, the server ran as though nothing had happened. Behind the scenes, however, the server had notified the Stratus CAC of the problems it was seeing. Within a half hour, a Stratus service professional contacted me to verify that I was randomly pulling modules out of the server and that a meltdown hadn't occurred. The next day, just as easily as I had removed the modules, I reinserted them one at a time and watched as the system automatically handled their reintroduction and synchronization. The only noticeable signs of action were status LEDs and some extra CPU utilization from reactivating physical disks.

High Availability and Exceptional Support
The adage that says you get what you pay for holds true with the ftServer 3210. It's definitely not the least-expensive solution available, but when server reliability and availability are paramount to the success of your business, you might consider the ftServer 3210 a form of insurance. Stratus provides first-class documentation and support offerings. The lockstep operation offers distinct advantages over clustering, and hot-swappable modules mean that you can replace or upgrade modules without downtime. Including PS/2-style keyboard and mouse ports would make the ftServer 3210 fit into existing KVM solutions more easily. Stratus designed the ftServer 3210 to provide a highly available, stable platform, and the product appears to be up to the task.


Stratus ftServer 3210
Contact: Stratus Technologies * 978-461-7000
Web: http://www.stratus.com
Price: $26,725