Windows IT Pro is the leading independent community for IT professionals deploying Microsoft Windows server and client applications and technologies.
  
  


September 1999

Recovering from NT Startup Failures, Part 1


SideBar    Think Parallel

Tricks to prepare for and recover from NT meltdowns

What would you do if one of your core production servers crashed the next time you rebooted it? More important, how much time would you need to fix the problem? For most Windows NT administrators, the thought of a mission-critical production server experiencing STOP errors (aka the blue screen of death) or any other form of server outage is enough to bring on a cold sweat.

A hosed NT system is never fun, but an unavailable critical server means lost productivity, lost time, lost money, and, of course, an angry boss. In this first installment of a two-part article, I discuss advanced tools and procedures that you can use to improve the availability of your network servers and to increase your chances of recovering from an NT boot failure. In addition, I delve into lesser-known techniques that you can employ right away to help you recover a downed NT system in the future. In this article, I don't address clustering solutions, and I assume that each system is a standalone, nonclustered NT system without system-level failover.

Common Calamities
Although various circumstances can cause an NT system to crash at startup, the result of these circumstances is usually the dreaded blue screen of death, which Screen 1, page 100, exemplifies. After NT halts the system, it displays this screen to protect the system against data corruption. In addition to being blue as its name implies, a blue screen displays important information about the system's state at the time of the STOP error. The screen lists the STOP code, the location in memory where the problem occurred, and the drivers loaded in memory when the STOP took place. However, pinning down the source of a STOP error isn't always easy. In my experience, a problem usually develops from one of the following scenarios:

  • You install software that corrupts the HKEY_LOCAL_MACHINE portion of the Registry—particularly, software that installs new services or drivers. This action usually results in a STOP error or blue screen, which indicates that the system Registry or a particular hive file failed.
  • You change a system's network configuration, which causes NT to rewrite network bindings and their related Registry entries (i.e., NT corrupts or overwrites critical OS files with invalid or incompatible versions while the system is in use).
  • You install a new service or driver on the system, which causes a system-level incompatibility problem that results in a STOP error when you reboot (i.e., underlying file corruption has occurred on a key system file that you loaded into memory before the corruption).

Each of these situations has a different set of underlying causes and solutions, so let's look at each scenario individually.

Registry Corruption
The system Registry is the heart of an NT installation. Thus, depending on the nature and extent of the damage, a corrupted Registry often results in a STOP error or blue screen of death at startup. Damage to the Registry can be physical or logical. Physical damage means that something (usually disk-related corruption) has scrambled the Registry hive files (e.g., the SOFTWARE or SYSTEM files in the \%systemroot%\system32\config folder). Logical damage means that a third-party application, a user, or NT has written invalid data to the Registry, which can trigger an NT startup failure if the logically damaged Registry entry is critical.

Unfortunately, you can't always tell whether a damaged Registry is the cause of your system's STOP error. The STOP error might identify a telltale sign such as a hard Registry error or a reference to a particular damaged hive file. However, in some cases, the STOP error doesn't indicate Registry damage.
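As a rough illustration of spotting such telltale signs, the following Python sketch pulls the STOP code and its parameters out of a line copied from a blue screen and flags a few bug-check codes that commonly point to Registry trouble. The function name, the regular expression, and the short code table are assumptions made for this example; the table is not exhaustive, and a code that isn't in it doesn't rule out Registry damage.

```python
import re

# A few bug-check codes that commonly point at Registry trouble.
# Illustrative only; this table is not exhaustive.
REGISTRY_STOP_CODES = {
    0x00000051: "REGISTRY_ERROR",
    0x00000067: "CONFIG_INITIALIZATION_FAILED",
    0x00000074: "BAD_SYSTEM_CONFIG_INFO",
    0xC0000218: "STATUS_CANNOT_LOAD_REGISTRY_FILE",
}

# Matches "STOP: 0x0000000A (p1,p2,p3,p4)" as well as "STOP: c0000218 ...".
STOP_LINE = re.compile(
    r"STOP:\s*(?:0x)?([0-9A-Fa-f]{8})"   # the STOP (bug-check) code
    r"(?:\s*\(([^)]*)\))?"               # optional parameter list
)

def parse_stop_line(text):
    """Return the STOP code, its parameters, and a Registry hint (or None)."""
    match = STOP_LINE.search(text)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = []
    if match.group(2):
        params = [int(p, 16) for p in match.group(2).split(",") if p.strip()]
    return {
        "code": code,
        "params": params,
        "registry_hint": REGISTRY_STOP_CODES.get(code),
    }
```

A result whose registry_hint is None simply means that the code isn't in the small table above, which matches the caveat that some STOP errors give no indication of Registry damage.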

If you suspect a Registry-related problem, the first line of defense is to restore a previous known-good Registry configuration. You can use several methods to do so.

The Last Known Good Configuration option. You access this option by pressing the space bar when the system prompts you during the NT boot process, and selecting the option to restore a previous configuration. This method is the quickest and easiest solution, if it works. Unfortunately, this solution's failures outweigh its successes in real-world applications because its scope is only a previously known-good incarnation of one portion of the Registry (i.e., a ControlSet00X Registry subtree of the HKEY_LOCAL_MACHINE\SYSTEM key). You have a better chance of success using the Last Known Good Configuration option if the problem is localized to this portion of the Registry and an event that immediately precedes the invocation of the Last Known Good Configuration option caused the problem. However, this procedure won't cure most of your Registry-corruption ills.
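To see which ControlSet00X subtree the Last Known Good Configuration option would use, you can look at the values NT keeps under HKEY_LOCAL_MACHINE\SYSTEM\Select (Current, Default, Failed, and LastKnownGood). The Python sketch below is a minimal illustration, not a recovery tool: the helper names are hypothetical, and read_select_key assumes a running Windows system with Python's standard winreg module available.

```python
SELECT_VALUES = ("Current", "Default", "Failed", "LastKnownGood")

def control_set_name(number):
    """Map a Select value such as 2 to 'ControlSet002'; 0 means no such set."""
    if number == 0:
        return None
    return "ControlSet%03d" % number

def describe_select(select_values):
    """Translate the numeric Select values into control set names."""
    return {
        name: control_set_name(select_values[name])
        for name in SELECT_VALUES
        if name in select_values
    }

def read_select_key():
    """Read the Select values from the live Registry (Windows only)."""
    import winreg  # imported here so the helpers above work on any platform
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, r"SYSTEM\Select") as key:
        return {name: winreg.QueryValueEx(key, name)[0] for name in SELECT_VALUES}
```

On a running system, describe_select(read_select_key()) shows which control set the option would fall back to. Only that one subtree is involved, which is why the option can't help with damage elsewhere in the Registry.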

NT Setup's Repair process and an Emergency Repair Disk (ERD). You can use NT Setup's Repair process to inspect and replace individual Registry hive files if the Last Known Good Configuration option fails to resolve the problem. After you insert your ERD, Setup lists the options you can select to specify which portions of the NT installation you want Setup to inspect, as Screen 2 shows. If you select Inspect registry files, Setup displays a list of Registry hive files and lets you select which files you want Setup to replace. Setup takes the replacement files from the ERD or, if you didn't provide an ERD, from the \%systemroot%\repair folder. The ERD and the \%systemroot%\repair folder store replacement files in compressed format, and each hive file has an underscore (_) extension (e.g., SYSTEM._, SOFTWARE._).
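Because Setup falls back on whatever is in the repair folder, it helps to know how old those compressed hive files are before you rely on them. The following Python sketch reports the age in days of each hive file it finds in a given folder. The function names are hypothetical, the hive list covers only the four hives discussed in this article, and the folder path is something you supply (e.g., the repair folder or an ERD).

```python
import os
import time

# The hives discussed in this article; each is stored as NAME._ in the folder.
HIVES = ("SYSTEM", "SOFTWARE", "SAM", "SECURITY")

def repair_file_name(hive):
    """Return the compressed replacement file name for a hive, e.g. SYSTEM._"""
    return hive + "._"

def repair_file_ages(repair_dir, now=None):
    """Return {hive: age in days} for each replacement file that exists."""
    now = time.time() if now is None else now
    ages = {}
    for hive in HIVES:
        path = os.path.join(repair_dir, repair_file_name(hive))
        if os.path.isfile(path):
            ages[hive] = (now - os.path.getmtime(path)) / 86400.0
    return ages
```

A hive that is missing from the result has no replacement file in that folder, and a large age suggests that the files predate recent application or service changes.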

Using the most recent replacement files is important so that you don't lose application and service configuration information. (For information about how to update your ERD, see Michael Reilly's "The Emergency Repair Disk," January 1997.) Don't restore the SAM and SECURITY hives on an NT Server domain controller unless you used the rdisk /s (or /s-) option when you ran the ERD utility (i.e., rdisk.exe). Without that option, rdisk doesn't update the SAM and SECURITY replacement files, so Setup overwrites your SAM database with the version Setup created during the original NT installation and creates a new set of problems. Also ensure that you created the replacement files under the same service pack level as the files you're replacing, because Service Pack 3 (SP3) and later make security-related changes to the SAM and SECURITY hives; if the levels don't match, you might not be able to log on after the repair is complete. Restoring the SAM and SECURITY files usually won't resolve your Registry corruption problems anyway because the SYSTEM and SOFTWARE hives usually cause Registry boot problems. Thus, start restoring previous Registry files with the SYSTEM and SOFTWARE files, and replace the SYSTEM hive first because it contains references to important system components, including drivers and services.
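The restore order and the SAM and SECURITY cautions above can be summarized as a small decision sketch in Python. The function and its three parameters are hypothetical, and gating the SAM and SECURITY hives on a matching service pack level is this example's reading of the guidance rather than a rule that Setup enforces.

```python
def plan_hive_restore(is_domain_controller, erd_made_with_rdisk_s, same_service_pack):
    """Return the hives to restore, in the order suggested in the text."""
    # SYSTEM goes first because it references drivers and services;
    # SOFTWARE is the next most likely cause of a Registry boot problem.
    plan = ["SYSTEM", "SOFTWARE"]

    # On a domain controller, the SAM and SECURITY replacement files are
    # current only if the ERD was updated with rdisk /s (or /s-).
    account_hives_current = erd_made_with_rdisk_s or not is_domain_controller

    # Assumption: skip the account hives when the service pack levels differ,
    # because restoring them might leave you unable to log on.
    if account_hives_current and same_service_pack:
        plan += ["SAM", "SECURITY"]
    return plan
```

For example, plan_hive_restore(True, False, True) returns only SYSTEM and SOFTWARE, which reflects the warning about domain controllers whose ERD was created without the /s option.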
