I first became interested in business continuity through a rather odd route: When I was younger, I thought it would be extremely cool to be a US Air Force missileer. I read everything I could about the communications, command, and control systems used to ensure that the president (or his successors) could communicate with US armed forces even in the midst of a full-scale nuclear war. With the end of the Cold War, I wised up and moved on to other dreams, but I retained my interest in how organizations can prepare for large-scale problems or extreme conditions that affect their ability to keep working.

With the release of Exchange Server 2007 SP1, Microsoft will add a new replication mode, standby continuous replication (SCR), to the two we already have, local continuous replication (LCR) and cluster continuous replication (CCR). In my two previous columns, I wrote about CCR, which offers some very useful new capabilities for disaster recovery, high availability, and business continuity ("MNS and CCR," June 7, 2007, and "MNS and CCR, Part 2," June 14, 2007). However, CCR isn't the answer to every high-availability and business-continuity requirement. You can use only two nodes in a CCR cluster, and (until Windows Server 2008 ships) they must be on the same IP subnet.

There are essentially three outage types you might want to protect against: loss of a storage system (whether an individual disk, a RAID volume, or a SAN LUN); loss of an entire server; and loss of an entire site or data center. LCR protects against storage failures, and CCR protects against both storage and server failures, but by themselves neither provides adequate protection for site failures. Organizations that want to protect against site failures usually need the ability to move operations to an alternate physical location, often for an extended period.

SCR is designed to provide protection for whole-site failures. It uses the same log- and database-replication technology that LCR and CCR do, but it provides a great deal more flexibility. It doesn't require that all nodes be on the same subnet, and there's no set limit on the number of computers you can protect. With SCR, you can

  • copy data from one server to multiple targets (one-to-many replication)
  • copy data from multiple servers to a single target (many-to-one replication)
  • protect standalone servers, servers protected with LCR, or the passive node in a CCR cluster

SCR doesn't provide failover; it's a replication mechanism. However, you can use SCR to copy data to a remote site, then manually fail over operations to that site. In fact, doing any kind of SCR failover requires manual action. That was an explicit design decision, and it's a good one. Very few businesses are willing to entrust software with the responsibility of automatically deciding when it's necessary to fail over operations from, say, New Orleans to Dallas, to cite one recent example.

SCR has another interesting feature: a log playback delay. This feature gives you a sort of escape hatch that can be very useful for recovery. Each log file is replicated as soon as it's closed, but the delay lets you choose how long the SCR replica waits before applying the log. For example, say you set an 8-hour delay. If you experience a failure at noon Monday, your SCR replica has already received every log closed up to that point, but its database reflects the primary site's contents only as of 4 A.M. Monday. Logs will continue to be played back according to the delay. If your failure was caused by a problem with the logs, a virus outbreak, or something else that happened at a discrete point in time, you can stop the SCR log playback before the failure is replayed into your recovery database. This feature is really useful, although I suspect there are design implications that we'll have to work through as part of deploying it.
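To make the delay arithmetic concrete, here's a minimal Python sketch of how a lagged copy might decide which logs to apply. It's a model under stated assumptions, not Exchange code: the `LogFile` class, the `split_logs` function, the hourly log cadence, and the choice of generation 10 as the damaged log are all hypothetical. Real Exchange logs close when they fill, not on a clock.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class LogFile:
    # Hypothetical model of a closed transaction log; not an Exchange API.
    generation: int
    closed_at: datetime


def split_logs(
    logs: Iterable[LogFile],
    now: datetime,
    lag: timedelta,
    stop_before: Optional[int] = None,
) -> Tuple[List[LogFile], List[LogFile]]:
    """Return (replayed, pending) logs for a lagged standby copy at time `now`.

    Every log closed by `now` is assumed to be on the target already; only
    logs older than the lag window are applied. If `stop_before` is set (an
    administrator halted playback), that generation and later ones are held.
    """
    cutoff = now - lag  # only logs closed at or before this point are applied
    replayed: List[LogFile] = []
    pending: List[LogFile] = []
    for log in sorted(logs, key=lambda entry: entry.generation):
        if log.closed_at > now:
            continue  # not closed yet, so not copied to the target yet
        halted = stop_before is not None and log.generation >= stop_before
        if log.closed_at <= cutoff and not halted:
            replayed.append(log)
        else:
            pending.append(log)  # copied, but held back by the lag or a halt
    return replayed, pending


monday = datetime(2007, 6, 25)  # an arbitrary Monday
# Hypothetical cadence: one log closed every hour from 1 A.M. through noon.
logs = [LogFile(g, monday + timedelta(hours=g)) for g in range(1, 13)]
lag = timedelta(hours=8)

# Failure at noon: the database has applied logs only through 4 A.M.
noon = monday + timedelta(hours=12)
replayed, pending = split_logs(logs, noon, lag)
print(replayed[-1].closed_at, len(pending))  # 2007-06-25 04:00:00 8

# Suppose generation 10 (closed at 10 A.M.) carried the damage. Halting
# playback before it keeps the standby database clean after the window passes.
evening = monday + timedelta(hours=19)
replayed, pending = split_logs(logs, evening, lag, stop_before=10)
print(replayed[-1].generation, [log.generation for log in pending])  # 9 [10, 11, 12]
```

The point the model makes is that copying and replaying are separate steps: the delay doesn't discard anything, because every closed log is already on the target and can still be replayed later, but the database itself trails the source by the lag you chose.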

One open question is how SCR will be licensed. Microsoft hasn't announced anything about licensing for SP1 features, so it's not clear whether SCR will be included in the Standard Edition server license or whether it will require the Enterprise Edition. Speaking of licensing, next week I plan to talk about Exchange licensing and how it's getting both simpler and more complicated.