Choosing the right backup strategy for you

Mention the word backup to a typical computer user, and you'll probably hear something like, "I never do it." The concept of backing up files is often disregarded and poorly understood in computer circles. Unfortunately, the consequences of not properly backing up your files can put you out of business.

Starting with this article, the Windows NT Magazine Lab digs into the topic of backing up files in Windows NT. Over the coming months, the Lab will review various backup applications for NT. Last year, few NT backup applications existed. Today, vendors provide numerous NT backup programs for everything from protecting the data files on your workstation to enterprise Hierarchical Storage Management (HSM).

This month, I'll concentrate on the importance of backing up your files correctly. I'll also look at backup solutions and hardware considerations for different network environments.

The Importance of Backups
Backing up your files is pointless unless you establish and follow a backup strategy and take all the necessary precautions. Let me give you a real-world example. I consulted for a small company (about 15 employees) that performed daily tape backups of its server and copied critical files to notebook computers using batch files. The company established a procedure to back up the latest files from the server and take the backup tapes off site every Friday. However, the company grew lazy and stopped taking the tapes and notebooks off site, leaving all the backups in the building. Otherwise, the approach seemed relatively foolproof. After all, the employees were performing backups in a way that provided redundancy in case something happened to the master files.

Despite taking these precautions, the company suffered a catastrophic loss when a fire burned down the building. A firefighter saved the server 20 minutes before the fire would have destroyed it--the backup tapes and notebooks were useless because they burned in the fire. Fortunately, I was able to recover the files from the damaged server, and the network was up and running the next day.

So what's the moral of the story? Although the company was backing up its files, it wasn't taking all the necessary precautions.

Simple as it is, this example exposes some common misconceptions about performing backups. First, never assume that your company is following its prescribed backup policy. Compliance is an essential part of any backup plan: you need to establish the optimal strategy for your backups and then stick to it. The company in the example was performing its backups but wasn't moving the tapes and notebooks off site. Considering that roughly 80 percent of all businesses that lose their database go out of business, this company was very lucky.

Second, don't be ashamed to copy files to a hard disk (i.e., using notebooks to maintain copies of data)--tapes are not the only medium that works. Finally, don't assume that just because you've backed up your files, your data is safe. Store your backups off site.

Likewise, don't rely on fault-tolerant systems as your backup systems. As the name suggests, fault tolerance lets systems continue to function when something goes wrong. When all fault tolerance fails, you use your backup tapes to restore the system and data. Backups are the last-ditch effort to save data and systems.

But suppose the tape is no good? A common problem with backup procedures is the failure to verify tapes. If you expect the data on a tape to be good for five years, you are probably in for a surprise. (An employee at Digital Equipment lost years of email as a result of a bad tape header. Although specialized companies can now recover such data for you, the cost is high.) How can you tell whether the data on a tape is OK? The only reliable test is to restore the tape, and you should maintain as much redundancy as you deem appropriate (i.e., if necessary, keep more than one tape with the same information). All these steps (performing backups on a regular basis, storing your backups off site, and verifying the quality of the data on the backup tapes) are essential to a sound backup strategy.
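The restore-and-compare check described above is easy to automate. The following sketch (a modern Python illustration, not something from the era of this article; the function names and directory layout are my own assumptions) hashes every file in the original tree and compares it against the restored copy, reporting anything missing or changed:

```python
import hashlib
from pathlib import Path

def file_digest(path):
    """Return the SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(original_dir, restored_dir):
    """Compare every file under original_dir against its restored copy.

    Returns a list of relative paths that are missing or differ.
    """
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for src in original_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        dst = restored_dir / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(str(rel))
    return problems
```

An empty result means the restore matched the originals; anything else tells you exactly which files the tape failed to bring back intact.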

The Point of the Exercise
Backups not only let you recover from computer disaster; combined with fault tolerance, they also let you rebuild a crashed server. Frankly, backups can save your job. The list of reasons why you need to back up your data includes

  • Catastrophic losses: Natural disasters, such as the fire I described previously, can happen.
  • User-induced errors: Users can accidentally delete files or lose code because of an improper command.
  • Hardware failures: Hard disks can fail, and power supplies can short out.
  • Vandalism and security failures: Hackers can destroy or alter files.
  • Software failures: Entire databases can become corrupt.
  • Audits: You need to produce archived data for legal purposes.

You need to take all backups seriously. Backed-up files can have serious legal repercussions and can be subpoenaed in court with due cause. Be careful about what you back up.

Successful backup strategies must be set as policy at the company level. Nothing is more frustrating than establishing a backup strategy without company support--without it, you will probably have a hard time accomplishing your objective, and you may not receive adequate funding for proper backups.

The Right Strategy for Your Environment
When you consider your backup strategy, you need to account for the effect on the bandwidth of your network. Many companies use hubs rather than switches on their networks, and hubs are notorious for creating bottlenecks. If you use Ethernet, I recommend you add switches with full-duplex capability. For a network backbone, try using at least 100Base-T, and for large environments, I suggest asynchronous transfer mode (ATM) or a comparable backbone.

After you have done your best to minimize the bottlenecks, you need to decide which backup drives and devices are best for your environment. You can define levels of backup based on network complexity. The types and sizes of files needing backup can also affect your decision. For example, if a site is concerned primarily with programming, backing up many small files can be a serious issue because many backup applications do not handle small files efficiently. Other businesses may require long-term storage and retrieval of files. In these environments, backup applications must maintain large databases of files and information and provide transparent retrieval (i.e., the system must let you retrieve data even if the user who stored it isn't present). Let's look at backup solutions for five distinct environments.

Standalone workstation. In a standalone-workstation environment, the user needs to back up the files on a local machine. For such an environment, NT's native Backup program is usually sufficient. NT Backup lacks some of the features of third-party backup applications but is sufficient for desktop backup (NT Backup is not for network backups because it has problems backing up remote Registries). Simply copying files to a second hard disk can also work well.

Workgroup. In a workgroup environment, you might have to back up files for small groups or single systems. Most users attempt to use NT Backup for workgroups. Unfortunately, NT Backup will not copy remote Registries, and the batch files that allow remote backup typically expose usernames and passwords. Therefore, you need to consider using different backup applications. A workgroup environment will most likely have PCs, but it can also have Macintoshes and Sun Microsystems SPARCstations. Depending on your workgroup configuration, you need to incorporate a backup application that includes all the appropriate agents for your various systems.

Department. A medium-scale client/server department environment typically consists of fewer than 500 machines. In this environment, the system automatically saves many files to a relatively small database (10GB to 20GB) on strategic servers that you need to back up. One concern in this environment is user compliance because users can save files locally instead of saving them to a network server. You must account for these local files if they are important to your business.

Backing up 500 machines is a long process. You need to consider the number of backup servers, the types of backup devices, use of centralized control, and the heterogeneity of the desktop and server operating systems. One solution for this moderately complex environment is to use batch files to copy changed files to a central system that you can back up. Such setups require serious attention and a dedicated staff. Alternatively, you can insert several backup servers and do multiple backups at once. In this situation, you want to maintain a high backup rate--I suggest you shoot for a backup rate of more than 30MB per minute (an easy task if you have adequate bandwidth and hardware).
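The batch-file technique of copying changed files to a central system that then gets backed up can be sketched in a few lines. This is a modern Python illustration of the idea, not the author's actual batch files; the paths and cutoff mechanism are assumptions. It copies into a staging directory every file modified since the last backup run:

```python
import shutil
from pathlib import Path

def copy_changed_files(source_dir, staging_dir, since_timestamp):
    """Copy files modified after since_timestamp (seconds since the epoch)
    from source_dir into staging_dir, preserving relative paths.

    Returns the list of relative paths that were copied.
    """
    source_dir, staging_dir = Path(source_dir), Path(staging_dir)
    copied = []
    for src in source_dir.rglob("*"):
        if src.is_file() and src.stat().st_mtime > since_timestamp:
            rel = src.relative_to(source_dir)
            dst = staging_dir / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves file timestamps
            copied.append(str(rel))
    return copied
```

Run nightly from a scheduler with the timestamp of the previous run, a script like this feeds the central staging server, which a single backup job then writes to tape.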

Enterprise. The enterprise environment consists of a network with large-scale systems and fewer than 1000 units. The features of this environment are similar to the departmental network, but the database in the enterprise environment typically approaches 100GB. In such an environment, you must incorporate all backup resources to reduce workload. Backup becomes a physical issue--you must back up more information in a smaller timeframe. In this situation, you can migrate files to magneto-optical (MO) storage towers using defined criteria. For example, you can define a rule that migrates all .doc files not opened in the past three months to the file repository once the hard disk holding them reaches 40 percent of its capacity. This process is transparent and is part of many HSM implementations. HSM is the process of automatically storing data on the lowest-cost devices (magnetic disk, optical disk, and tape) that can support the performance that the applications require. Users see the storage as one logical unit, and file access is completely transparent. This approach minimizes storage costs while optimizing performance. In such circumstances, you use backup with file storage maintenance.
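A migration rule of that kind is straightforward to express in code. The following Python sketch is purely illustrative (the function name, defaults, and trigger logic are my assumptions, not part of any HSM product): it selects candidate files only once the volume passes its capacity trigger, and only if they haven't been accessed within the idle window:

```python
import shutil
import time
from pathlib import Path

def select_files_to_migrate(volume_root, pattern="*.doc",
                            idle_days=90, usage_threshold=0.40):
    """Return files matching pattern whose last access is older than
    idle_days, but only if the volume is more than usage_threshold full.

    Mirrors a rule such as "migrate all .doc files not opened in the
    past three months once the disk reaches 40 percent of capacity."
    """
    usage = shutil.disk_usage(volume_root)
    if usage.used / usage.total <= usage_threshold:
        return []  # below the capacity trigger; nothing to migrate
    cutoff = time.time() - idle_days * 86400
    return [p for p in Path(volume_root).rglob(pattern)
            if p.is_file() and p.stat().st_atime < cutoff]
```

A real HSM implementation would then move the selected files to the repository and leave behind stubs so that access remains transparent to users.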

If you use an active tape library device, look for one that requires as little user intervention as possible. Numerous tape devices are available; some even use RAID configurations and are fast. These devices are expensive, and you can justify their cost only in large enterprise environments.

Enterprise-plus. The enterprise-plus environment is large-scale (more than 1000 systems), multiplatform, and built on a heterogeneous mix of network operating systems (e.g., UNIX, NT, NetWare, Mac OS, Windows 95, and Windows 3.x). This type of environment is hard to maintain and political in nature, with users wanting to control their systems rather than operate under centralized control. Although these problems exist on smaller networks, the size of the user base in this environment creates serious IS problems.

Database sizes in this environment can be almost limitless, so backup strategies take on a new complexity. Centralized control is difficult across distant LANs and WANs; thus, you must segment backup into logical groupings. In this environment, you perform backups at the departmental level. In many cases, you have to add HSM to maintain adequate system control. In addition, you have to back up the HSM data. Standardizing on applications and desktops is optimal in such an environment, but standardization is a company policy issue. The complexities of these large networks make backup difficult. Every aspect of backup can spell the difference between success and failure.

In addition to selecting the right backup strategy for your environment, an important aspect of backup is selecting the right hardware. You need to choose carefully. Most IS managers selecting backup hardware are limited by monetary constraints. However, the type of device dictates the success or failure of the backup. Without proper equipment, backup can become a nightmare because the time available for backup is always diminishing, but the amount of data is always dramatically increasing.

The Right Hardware for Your Environment
Many vendors offer backup drives and libraries for NT (for a list of NT backup vendors, see "Buyer's Guide for NT Backup Solutions"). I've looked at the following devices:

  • HP SureStore DAT8--DDS-2 drive
  • HP SureStore DAT24--DDS-3 drive
  • HP SureStore DAT24x6--DDS-3 autoloader
  • Exabyte 8700--8mm drive
  • Exabyte EXB-8505--8mm drive
  • Exabyte 210--Tape library with two 8505 drives and barcode reader
  • Exabyte Eliant 820--8mm drive
  • HP 20XT--MO jukebox (for temporary storage or HSM)
  • HP SureStore 40--MO jukebox
  • HP SureStore DLT30--DLT drive
  • Qualstar TLS-4000 series--8mm tape libraries

Finding a backup device suitable for your environment doesn't have to be hard. For standalone backups, anything goes. I prefer SCSI tape units because their drivers are more common in NT than quarter-inch cartridge (QIC) drivers (for a list of backup-related terms, see the sidebar "Backup Terms and Technologies"). If you decide to use an IDE unit or a special card for compression, make sure it comes with an NT driver. If you use NT Backup, remember that it does not support software compression.

For small environments, the new 4mm and 8mm drives offer increased storage and speed. HP's DDS-3 drives offer data transfer rates comparable to digital linear tape (DLT) drives using DLT2000XL tapes. The HP SureStore DAT24x6 autoloader offers up to 144GB of storage. These devices are ideal for small and medium-sized networks. Older drives, such as the Exabyte EXB-8505 and the HP SureStore DAT8, still work well but are not as cost effective as the new drives because they're slower. If you choose a jukebox, remember that you must still unmount and eject the backup tapes and store them off site or in protected (fireproof) and cooled tape safes.

Large environments require specific hardware. Large library add-ons are available for such devices as DLT drives that use DLT7000 tapes. For example, some companies run six DLT units that share 100 tapes. The backup rate in such environments is amazing; these systems are often limited only by inadequate bandwidth on the controlling computer. Each jukebox requires several SCSI IDs or logical unit numbers (LUNs--more than one device per SCSI ID): you need an ID for the robotics arm and one for each tape drive. The HP SureStore DAT24x6, for example, uses one SCSI ID with two logical unit assignments, LUN 0 and LUN 1.

Another strategy is to place backup devices in strategic sites throughout the environment and control them through a centralized staging area. (Most backup applications are adopting this type of approach.) The backup device you choose depends on the type of data, whether users are responsible for backing up their machines, and the size of the backups.

Choosing tape libraries is an important decision in a large network environment. I have extensive experience with the Exabyte 210 and the Qualstar TLS-4000 series. The Exabyte 210 is an industry standard. Unfortunately, you must use proprietary drives in the Exabyte 210 library. In addition, the Exabyte 210 uses belts and gears to move the robotics that insert and remove tapes. This process is tedious, and any inventory takes considerable time (e.g., 5 minutes). Likewise, you cannot open the case to remove or add tapes or to look at tape labels without taking the unit offline. Every time you open the case, the system has to re-inventory the tapes. Despite these limitations, the Exabyte 210 runs well when you set it up properly.

The Qualstar TLS-4000 series differs from the Exabyte 210. The TLS-4000 libraries have a slot (I/O port) in the front that lets you insert tapes without opening the unit (this feature is also available on some Exabyte units). Of more significance is the design of the unit. The robotics use a lead-screw mechanism to move tapes. This method is much faster than belt and gear units. In addition, the TLS-4000 libraries have inventory sentries that minimize offline time. You can open the door on these libraries to read tape labels without having to reset the tape inventory.

The TLS-4000 series can use any standard tape drive that Qualstar has qualified. This feature simplifies replacing standard drives in the libraries. Finally, the TLS-4000 libraries have non-volatile RAM that stores system data when you lose power. For environments requiring large tape libraries, the Qualstar units deserve serious attention because of their device support from 4mm to 8mm. In my experience, the Qualstar TLS-4000 series libraries are more amenable to enterprise usage than comparable tape libraries such as the Exabyte 210 (the Exabyte 210 is as easy to maintain but more time consuming).

MO storage towers are essential in large environments because they are bigger and faster than traditional tape backup devices. The applications that control these towers can dump a lot of data on them, making the towers ideal for data repositories and HSM; their main advantage over tape is retrieval speed. However, NT does not natively handle MO devices gracefully. An application has to control the device as a service (or similar) to make the device function as a jukebox. You can expect to see backup applications use more and more jukeboxes for storage.

Many vendors continue to develop other types of devices that can provide speed in critical situations, including RAID tape configurations and tape-and-drive systems. With the RAID systems, the hardware in the unit stripes data across the tapes. With the tape-and-drive systems, the hardware first backs up the data to a hard disk and then dumps the data to tape offline. This approach is much faster than traditional tape-based backups.

Finally, hard disks continue to decrease in price, and many companies are forgoing tape backups and simply copying their files to additional hard disks. Although this strategy is suitable in many cases, it does not fulfill the requirements of a proper backup (i.e., most hard disks aren't readily removable). Copying or backing up files to removable drives is also becoming popular, particularly among end users. How well removables will function at the enterprise level remains to be seen.

In the coming months, the Lab will examine some backup and HSM applications that are readily available for NT. We'll start by looking at high-end applications and then address other solutions.