Unlike traditional Windows NT clustering solutions that involve failover objects and disk mirroring, Marathon Technologies' Endurance 4000 takes the idea of fault tolerance a step further by offering a system mirroring solution. System mirroring involves two physically separate computers doing exactly the same thing at the same time, right down to the cursor on the screen.

Figure A, which shows the Endurance 4000's basic configuration, will help you understand how system mirroring works. The architecture consists of two compute elements (CEs) running in lockstep and two I/O processors (IOPs). Each CE links to both IOPs over multiwire copper or multimode fiber-optic cable running at 25MBps, and all I/O and lockstep traffic travels over this link. The CEs boot NT Server from the disk storage on the IOPs and run the operating system from memory. The two CEs work together to handle all CPU functions, and the IOPs serve as the network and disk subsystems. Marathon's fault-tolerant software makes the entire configuration function as one fault-tolerant server.
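To make the lockstep idea concrete, here is a minimal toy sketch in Python. It is not Marathon's implementation: the ComputeElement class, the run_lockstep function, and the simple add-to-state workload are all hypothetical names invented for illustration. The sketch assumes only what the description above states, namely that both CEs do the same work at the same time and that the configuration keeps running if one CE is lost.

```python
from dataclasses import dataclass


@dataclass
class ComputeElement:
    """Hypothetical stand-in for one CE: holds state and applies operations."""
    name: str
    state: int = 0
    healthy: bool = True

    def step(self, operand: int) -> int:
        # Toy workload: accumulate the operand into the CE's state.
        self.state += operand
        return self.state


def run_lockstep(ces, operands):
    """Feed each operand to every healthy CE and check that the results agree."""
    results = []
    for operand in operands:
        outputs = {ce.name: ce.step(operand) for ce in ces if ce.healthy}
        if not outputs:
            raise RuntimeError("no healthy compute element left")
        if len(set(outputs.values())) != 1:
            raise RuntimeError(f"lockstep divergence: {outputs}")
        results.append(next(iter(outputs.values())))
    return results


ce_a = ComputeElement("CE-A")
ce_b = ComputeElement("CE-B")
before_failure = run_lockstep([ce_a, ce_b], [1, 2, 3])

# Simulate a hardware failure on one CE; the survivor carries on alone.
ce_a.healthy = False
after_failure = run_lockstep([ce_a, ce_b], [4])
```

Because both toy CEs always hold identical state, the survivor continues from exactly where the pair left off, which is the property that separates system mirroring from a failover cluster that must restart work on another node.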

The Endurance 4000 kit consists of four PCI interconnect cards, two split-side datalink units (crossover switches for the high-speed interconnects), and the Marathon fault-tolerant software. You must supply the server hardware (systems, disks, controllers, etc.). The advantages of this technology are that it can use any standard NT-certified hardware, it runs unmodified NT code and applications, and it is Simple Network Management Protocol (SNMP)-enabled for remote management.

The system-mirroring architecture gives you disaster fault tolerance, with an emphasis on disaster. You can completely crash one CE and one IOP, and the other CE and IOP take over instantly (using a fiber-optic link).

For example, this redundancy means your campus building across the street can spontaneously combust without interrupting system service. You can also perform repairs and upgrades online, although a software upgrade causes a service disruption when you switch in the upgraded system.

Marathon's system-mirroring architecture has a few drawbacks. First, the current configuration can use only single-processor Intel-based CEs. Second, you get only about 90 percent of maximum performance from this configuration because the CEs perform lockstep checking when sharing information. In other words, you have two full systems that operate at 90 percent of the capacity of one system because both systems mirror each other (just like mirrored disks). Finally, the configuration doesn't allow dynamic scaling for additional nodes.
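The performance trade-off is easy to work out. The short Python sketch below uses the 90 percent figure quoted above; the function name mirrored_capacity and the capacity figure of 100 units are illustrative assumptions, not vendor measurements.

```python
def mirrored_capacity(single_system_capacity, lockstep_efficiency=0.90, systems=2):
    """Return (effective capacity, fraction of installed capacity delivered).

    lockstep_efficiency is the roughly 90 percent figure from the review;
    systems is the number of mirrored CEs doing identical work.
    """
    effective = single_system_capacity * lockstep_efficiency
    installed = single_system_capacity * systems
    return effective, effective / installed


# Illustrative only: treat one standalone CE as 100 units of capacity.
effective, fraction = mirrored_capacity(100)
```

Under these assumptions, two CEs deliver about 90 units of work, or roughly 45 percent of the raw CPU capacity you purchased. That is the cost of mirroring, just as a mirrored disk pair gives you the usable space of one disk.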

This solution provides 100 percent fault tolerance for hardware only; it offers no software fault tolerance. Because the CEs mirror each other, if NT or an application crashes on one CE, both CEs go down. To protect against software failures, you need Marathon's ClusterPlusFT software with a third-party enterprise clustering solution such as Wolfpack, LifeKeeper, or Veritas. In that setup, the IOPs function as an ordinary cluster with secondary applications running on them, and the fault-tolerant application still runs on the CEs.

Marathon's Endurance 4000 isn't for everybody. The limited performance of Marathon's first release (future upgrades may support symmetric multiprocessing, or SMP, CEs) means that enterprise IS environments with 500 to 1000 users can't deploy this solution as a fully fault-tolerant SQL Server. However, a small shop can use Endurance 4000 with positive results. Large shops will want to consider this solution only for limited use on mission-critical applications (financial records, security files, and so forth).

Endurance 4000
Marathon Technologies
800-884-6425
Web: http://www.marathontechnologies.com
Price: $24,999 for the kit (not including server hardware)