Floods, fires, earthquakes, power outages, and software and hardware failures are reminders of why disaster readiness and recovery are so important. Maintaining business continuity in the face of this adversity could mean the difference between weathering the storm and going out with the lights.
Enterprise IT groups know and handle this challenge well, but it can be quite difficult for smaller organizations to meet 99.0 percent uptime requirements, let alone 99.999 percent. Cost and complexity barriers keep many businesses from trying high-availability solutions at all, forcing IT staff to use manual, administrator-intensive detection, remediation, and recovery processes.
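To put those percentages in perspective, here is a minimal sketch of the arithmetic, assuming a 365-day year and counting every minute of the year as required uptime. The function name is illustrative and isn't taken from any vendor tool. On those assumptions, 99.0 percent leaves roughly 87.6 hours of downtime per year, whereas 99.999 percent leaves a little more than 5 minutes.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored for simplicity


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR * 60


# Print the downtime budget for a few common availability targets.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:,.1f} minutes of downtime per year")
```

The gap between the two ends of that range is why manual detection and recovery might suffice for a 99.0 percent target but can't realistically meet a five-nines one.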
Many high-availability solutions exist today, ranging from software-based products to mission-critical systems that offer hardware-level redundancy and failover. The trick is to pick the right one for your organization, achieving the desired availability without breaking your IT budget. As with network security, the more you can afford the better off you'll be, but there is a point of diminishing returns beyond which extra spending buys little additional protection. In other words, your particular business might not require extreme measures. I recently took a look at Stratus Technologies' Avance high-availability software, one of the midrange solutions that can deliver availability at near-enterprise levels, but without the million-dollar outlay.
CIOs often call on systems administrators to reduce costs but still boost IT reliability. Administrators in small-to-midsized businesses (SMBs) tend to feel this crunch more acutely, because delivering fault tolerance can more than double the cost of the existing infrastructure for backup servers, redundant networking, and so on. Although native technologies in Windows Server are capable of getting you part of the way there, they fall short of the instantaneous failover that's needed for demanding workloads -- and demanding CIOs.
Stratus aims to solve this conundrum through a hardware-agnostic, yet not entirely hardware-independent, software-based availability package for SMBs. Stratus has made its name in enterprise-class high-availability solutions for more than 30 years, keeping the lights on 24×7 for critical human services, such as 911 call centers, hospitals, utilities, and more.
Avance combines a software offering with proactive management (which can even be monitored by Stratus remotely) and hardware redundancy. An Avance high-availability cluster provides near-zero failover and recovery times, with near-zero client impact (including stateful applications) using real-time monitoring and data replication. If you're running a heterogeneous environment, you'll also appreciate Avance's support for Linux server platforms (e.g., Red Hat, CentOS) and applications. Avance uses CentOS 5.5 and Citrix Systems XenServer virtualization technologies to abstract hardware from software, providing a foundation for transparently migrating OS and application workloads between physical systems in the event of a failure.
You can use most off-the-shelf server, networking, and storage hardware as long as any two systems you cluster are similar enough that a hardware mismatch doesn't result in bad driver behaviors (and thus a crash). In addition, the same RAID configuration must be used on both machines. One benefit of this clustering approach is that you don't need to purchase a dedicated storage array for data because replication between servers occurs over the wire.
The downside is that you still need an equivalently configured second server as a hot standby. Note that you won't have an active-active performance cluster. For more information, see the sidebar "How Avance Works."
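The pairing rules described above (similar hardware, identical RAID configuration) lend themselves to a simple pre-flight check before you commit two machines to a cluster. The sketch below is only an illustration: the `ServerSpec` record, its field names, and the sample RAID values are hypothetical and aren't part of Avance, and treating any motherboard or CPU difference as a warning is a deliberately conservative reading of "similar enough."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    """Hypothetical inventory record; the fields are illustrative only."""
    motherboard: str
    cpu_model: str
    raid_level: str
    disk_count: int


def pairing_problems(a: ServerSpec, b: ServerSpec) -> list[str]:
    """List the reasons two servers might not be safe to cluster together."""
    problems = []
    # Assumption: differing core hardware is flagged for review, not rejected.
    if a.motherboard != b.motherboard:
        problems.append("warning: motherboards differ; check driver compatibility")
    if a.cpu_model != b.cpu_model:
        problems.append("warning: CPU models differ; check driver compatibility")
    # The RAID configuration must match on both machines.
    if (a.raid_level, a.disk_count) != (b.raid_level, b.disk_count):
        problems.append("error: RAID configuration differs")
    return problems


# Sample values: the RAID level and disk count are made up for the example.
node0 = ServerSpec("S5520UR", "Xeon X5560", "RAID 1", 2)
node1 = ServerSpec("S5520UR", "Xeon X5560", "RAID 5", 4)
for problem in pairing_problems(node0, node1):
    print(problem)
```

An empty result means the two records match on every field the sketch compares; it says nothing about firmware or driver versions, which you'd still need to verify separately.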
For expediency, I started with two white-box Intel servers, which were supplied by Stratus. Each server had an S5520UR motherboard, dual quad-core Xeon X5560 processors, 24GB of memory, and 2TB of disk space. You can gain additional hardware resiliency if you select a chassis with hot-swappable components (e.g., CPU, RAM), RAID controllers, redundant power supplies, failover NICs, and so forth; doing so will reduce the likelihood of a single-server failure. This isn't required, however, because the solution's real-time monitoring includes more than 150 different metrics and predictive analytics that will trigger a live migration if a fault is either detected or about to take place.
Your dual-server configuration doesn't need to be any different from your standard build, with the exception of a dedicated gigabit Ethernet port on each machine for management and data replication, which is referred to as the "Sync" link. A single link is the minimum; Stratus recommends redundant Sync links to improve performance and fault tolerance. The servers can also be completely headless (after initial setup), because all maintenance operations are performed through a web-based console.
Avance installation is straightforward and uses a self-imaged DVD. It automates setup for both servers through a single process, but you should reformat the machine if you're repurposing older hardware. (You can't change out the hardware on an existing OS platform build or migrate it from another machine unless it's identical hardware and already virtualized.) Adding the second machine to form a cluster is achieved by a fast software install driven from the primary node. When you join the second server to the cluster, an automated synchronization process images and configures it.
Avance's instant data replication between nodes means each server is always up to date. When a hardware failure, predicted failure, or planned shutdown occurs, the second machine simply picks up where the first one left off. This lets you carry out whatever maintenance is required on the first node without a service interruption. When you're done, you can manually flip the workload back to the first node or leave the workload on the second node, letting it migrate back to the first node only if a failure is detected on the second one.
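The failover and failback behavior described above can be pictured as a small state machine. The following is a rough sketch of that behavior as the review describes it, not Stratus's implementation: the class, method, and event names are hypothetical, and the model assumes the standby node is always healthy and fully synchronized.

```python
from dataclasses import dataclass

# Events that move the workload off the node on which they occur.
MIGRATION_EVENTS = {"failure", "predicted_failure", "planned_shutdown"}


@dataclass
class TwoNodeCluster:
    """Toy active/standby model; node names are placeholders."""
    active: str = "node0"
    standby: str = "node1"

    def _swap(self) -> None:
        self.active, self.standby = self.standby, self.active

    def handle_event(self, node: str, event: str) -> str:
        """Apply an event to a node and return the node now running the workload."""
        if event not in MIGRATION_EVENTS:
            raise ValueError(f"unknown event: {event}")
        # Only an event on the active node moves the workload.
        if node == self.active:
            self._swap()
        return self.active

    def manual_failback(self, preferred: str) -> str:
        """Administrator choice: move the workload back to a preferred node."""
        if preferred == self.standby:
            self._swap()
        return self.active


cluster = TwoNodeCluster()
print(cluster.handle_event("node0", "planned_shutdown"))  # workload moves to node1
print(cluster.manual_failback("node0"))                   # admin flips it back to node0
```

Skipping the `manual_failback` call models the other option in the text: the workload stays on the second node and returns to the first only when `handle_event` reports a problem on the second.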