Unlike traditional Windows NT clustering solutions that involve failover objects and
disk mirroring, Marathon Technologies' Endurance 4000 takes the idea of fault tolerance a step
further by offering a system mirroring solution. System mirroring involves two physically separate
computers doing exactly the same thing at the same time, right down to the cursor on the screen.
Figure A, which shows the Endurance 4000's basic configuration, will help you understand how
system mirroring works. The architecture consists of two compute elements (CEs) running in lockstep
and two I/O processors (IOPs). Each CE links to both IOPs over multiwire copper or multimode
fiber-optic cable running at 25MBps; all I/O and lockstep traffic travels over this link. The CEs
boot NT Server from the disk storage on the IOPs and run the operating system from memory. The two
CEs work together to handle all CPU functions, and the IOPs serve as the network and disk
subsystems. Marathon's fault-tolerant software makes the entire configuration function as one
fault-tolerant server.
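To make the lockstep idea concrete, here is a toy sketch in Python. It is an illustrative model only, not Marathon's implementation: the ComputeElement class, the run_lockstep function, and the fail_at parameter are hypothetical names invented for this example. Two CEs receive identical inputs, their results are compared at every step, and if one CE suffers a simulated hardware fault, the survivor keeps delivering results without a gap.

```python
from dataclasses import dataclass


@dataclass
class ComputeElement:
    """Hypothetical stand-in for one CE; state is a simple running total."""
    name: str
    healthy: bool = True
    state: int = 0

    def step(self, value: int) -> int:
        # Apply one deterministic operation and return the new state.
        self.state += value
        return self.state


def run_lockstep(ces, inputs, fail_at=None):
    """Feed identical inputs to every healthy CE and compare results each step.

    fail_at is an assumed (step_index, ce_name) pair that simulates a
    hardware fault on one CE. Returns the results delivered to the I/O side.
    """
    delivered = []
    for i, value in enumerate(inputs):
        # Simulate a hardware failure just before the given step.
        if fail_at is not None and fail_at[0] == i:
            for ce in ces:
                if ce.name == fail_at[1]:
                    ce.healthy = False
        # Only healthy CEs execute the step.
        results = {ce.name: ce.step(value) for ce in ces if ce.healthy}
        if not results:
            raise RuntimeError("no healthy compute element left")
        # Lockstep check: all surviving CEs must agree.
        if len(set(results.values())) != 1:
            raise RuntimeError(f"lockstep divergence at step {i}: {results}")
        delivered.append(next(iter(results.values())))
    return delivered


ces = [ComputeElement("CE1"), ComputeElement("CE2")]
outputs = run_lockstep(ces, [1, 2, 3, 4], fail_at=(2, "CE1"))
print(outputs)
```

In this sketch the output stream is the same whether or not CE1 fails, which is the property system mirroring is meant to provide for hardware faults.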
The Endurance 4000 kit consists of four PCI interconnect cards, two split-side datalink units
(crossover switches for the high-speed interconnects), and the Marathon fault-tolerant software. You
must supply the server hardware (systems, disks, controllers, etc.). The advantages of this
technology are that it can use any standard NT-certified hardware, it runs unmodified NT code and
applications, and it is Simple Network Management Protocol (SNMP)-enabled for remote management.
The system-mirroring architecture gives you disaster fault tolerance, with an emphasis on
disaster. You can completely crash one CE and one IOP, and the other CE and IOP take over instantly
(using a fiber-optic link).
For example, this redundancy means your campus building across the street can spontaneously
combust without interrupting the system service. You can also perform online upgrades and
repairs, although with software upgrades you will encounter a service disruption when you switch
in the upgraded system.
Marathon's system-mirroring architecture has a few drawbacks. First, the current configuration
can use only single-processor Intel-based CEs. Second, you get only about 90 percent of maximum
performance from this configuration because the CEs perform lockstep checking when sharing
information. In other words, you have two full systems that operate at 90 percent of the capacity of
one system because both systems mirror each other (just like mirrored disks). Finally, the
configuration doesn't allow dynamic scaling for additional nodes.
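The performance trade-off above comes down to simple arithmetic, sketched here in Python. The function name mirrored_capacity and its parameters are hypothetical, and the 90 percent figure is the approximate value quoted in this article, not a measured constant. The sketch compares the usable capacity of the mirrored pair with the capacity of the hardware you actually installed.

```python
def mirrored_capacity(single_system_capacity, lockstep_efficiency=0.90, mirrored_systems=2):
    """Return (usable capacity, fraction of installed CE capacity that is usable).

    single_system_capacity: throughput of one CE running alone, in any unit.
    lockstep_efficiency: assumed share of that throughput left after lockstep checking.
    mirrored_systems: number of CEs doing identical work.
    """
    # Mirrored CEs do the same work, so usable capacity is one system's worth,
    # reduced by the lockstep-checking overhead.
    usable = single_system_capacity * lockstep_efficiency
    installed = single_system_capacity * mirrored_systems
    return usable, usable / installed


usable, fraction = mirrored_capacity(100.0)
print(f"usable: {usable:.0f} units, {fraction:.0%} of installed CE capacity")
```

With two CEs and roughly 90 percent lockstep efficiency, about 45 percent of the installed CE capacity does unique work; the remainder is the price of the redundancy.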
This solution provides 100 percent hardware fault tolerance only. If NT or an application
crashes on one CE, both CEs go down. None of this functionality applies to software fault tolerance.
For this situation, you need Marathon's ClusterPlusFT software with a third-party enterprise
clustering solution such as Wolfpack, LifeKeeper, or Veritas; the IOPs function as an ordinary
cluster with secondary applications running on them, and the fault-tolerant application still runs
on the CEs.
Marathon's Endurance 4000 isn't for everybody. The limited performance of Marathon's first
release (future upgrades may support symmetric multiprocessing [SMP] CEs) means that
enterprise IS environments with 500 to 1000 users can't deploy this solution as a fully
fault-tolerant SQL Server. However, a small shop can use Endurance 4000 with positive results. Large
shops will want to consider this solution only for limited use on mission-critical applications
(financial records, security files, and so forth).