Keeping Up with Win2K and NT

If you're planning to deploy Windows Server 2003 for the first time, you can avoid a myriad of documented performance problems if you include the following bug fixes in the server images before you release them into production. The late October DNS update corrects a memory leak that slows performance to unacceptable levels and eliminates intermittent name-resolution failures. The interim File Replication Service (FRS) update from June corrects six bugs and introduces new functionality, including the ability to force replication from the command line. The Microsoft Volume Shadow Copy Service (VSS) writer patch eliminates two problems that cause a multitude of services and applications that rely on shadow copies to fail. One domain controller (DC) update ensures that a DC doesn't lose Global Catalog (GC) server functionality, and a second eliminates a performance slowdown caused by a memory leak in the Local Security Authority (LSA) service. Read on to learn the details about these fixes.

DNS. The DNS service in the original release has two problems that prevent the service from responding to name-resolution requests. The first bug, a memory leak in the DNS service, can cause the DNS service to consume up to 100 percent of the CPU within 1 hour of restarting the system. The length of time it takes for the DNS service to slow significantly depends on the number of name-resolution requests the service receives. When the memory leak is large enough, the server might take up to 20 seconds to respond to a client query, and in some cases, the name query might time out. The DNS slowdown negatively affects the server’s responsiveness to other activities as well. You can work around the slowdown by periodically restarting the DNS service. To avoid the problem, call Microsoft Product Support Services (PSS) and ask for the October 7 update of dns.exe, file version 5.2.3790.90. This problem is documented in the Microsoft article "Server Responsiveness Degrades and Queries Time Out When You Run the DNS Server Service" (http://support.microsoft.com/?kbid=830381). The second bug causes the DNS service to correctly resolve some names but not others, and it’s apparently related to cache lookup errors. If you experience inconsistent name resolution, you can work around the problem by periodically purging the DNS cache (right-click the server icon in DNS and click Clear Cache). To eliminate the problem, call PSS and ask for the October 21 update of dns.exe, file version 5.2.3790.94. When you call, reference the Microsoft article "DNS Intermittently Stops Resolving Some Host Names" (http://support.microsoft.com/?kbid=830905).
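One way to spot the symptoms described above (responses that take many seconds, or lookups that fail outright) is to time name lookups from a client. The following Python sketch is illustrative only: the function names and the 2-second "slow" threshold are assumptions, not values from Microsoft, and `socket.gethostbyname` uses the client's configured resolver rather than querying one specific DNS server.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.gethostbyname, clock=time.perf_counter):
    """Resolve one name and return (seconds_elapsed, address_or_None)."""
    start = clock()
    try:
        address = resolver(hostname)
    except OSError:
        # socket.gaierror and socket.timeout are both OSError subclasses,
        # so a failed or timed-out lookup is reported as None.
        address = None
    return clock() - start, address


def classify(elapsed, address, slow_after=2.0):
    """Label one lookup as 'failed', 'slow', or 'ok'.

    slow_after is a hypothetical threshold in seconds; tune it for your network.
    """
    if address is None:
        return "failed"
    return "slow" if elapsed >= slow_after else "ok"
```

Running `classify(*time_lookup(name))` on a schedule against a handful of names you know exist gives a rough picture of both problems: a rising count of "slow" results suggests the memory leak, and sporadic "failed" results for valid names suggest the cache-lookup bug.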

FRS. As it did with Windows 2000, Microsoft is releasing FRS updates on a different schedule from service packs. The June 27 FRS update introduces four new FRS features: a new command (Ntfrsutl Forcerepl) that you can use to force replication at a command prompt, a larger journal log-file size (512MB) that eliminates problems that occur when the journal entries wrap around to the beginning of the file, improvements in how FRS manages and logs file-sharing violations, and the ability to restrict access to FRS debug log files. The release also corrects six FRS bugs, including several that leave event-log messages with incomplete information, two timing problems that cause FRS to hang or to attempt replication before the system file policy is enabled, and a handle leak that negatively affects Windows Management Instrumentation (WMI) functionality. The Microsoft article "Issues That Are Resolved in the Pre-Service Pack 1 Release of Ntfrs.exe" (http://support.microsoft.com/?kbid=823230) documents the improvements and states that you can obtain this interim release only from PSS. The update consists of three FRS components (ntfrs.exe, ntfrsapi.dll, and ntfrsprf.dll), all of which have a file release date of June 27 and file version number 5.2.3790.64.

VSS failure. A VSS writer is a program or a service that uses VSS to duplicate data in a temporary storage area. Many Windows 2003 components and applications, including Ntbackup, Active Directory (AD), WINS, DHCP, Intellimirror, the Certification Authority (CA), Microsoft SQL Server, and Microsoft Exchange Server, use the service to implement their functionality. On underpowered systems, especially machines with slow disk subsystems, a VSS writer might time out in a variety of circumstances, which causes the calling operation, typically a backup, to fail. When a server experiences a problem with a VSS writer, you'll see references to Volume Shadow Copy Service or shadow copy in the Application event log. The best workaround is to upgrade the speed and capacity of systems on which these failures occur. When the system is adequately powered, you can install a hotfix that eliminates two specific VSS writer failures: the first fix eliminates failures caused by timeouts, and the second addresses lost shadow copies on a busy disk. The patch contains eight files, most of which have a file release date of August 28. See the Microsoft article "Time-Out Errors Occur in Volume Shadow Copy Service Writers, and Shadow Copies Are Lost During Backup and During Times When There Are High Levels of Input/Output" (http://support.microsoft.com/?kbid=826936) for a detailed description of how the service writers work and the operations they perform.

DC problems. Two problems can occur on Windows 2003 DCs: one that disables GC functionality and a second that slows system performance, sometimes dramatically. On a system processing many TCP/IP connections, a TCP/IP packet leak can cause a Windows 2003 DC to lose its GC server functionality. In a complex set of interactions, the packet leak prevents the server from registering as a GC server in DNS, which effectively disables the catalog’s operation. You can work around the problem by rebooting the server. To eliminate the problem, call PSS and ask for the August 1 update of tcpip.sys, file version 5.2.3790.74. Cite the Microsoft article "Domain Controller Loses Its Role as a Global Catalog Server" (http://support.microsoft.com/?kbid=824139). The second problem, a memory leak in the LSA service (lsass.exe), can slow a DC’s performance significantly. The problem is more pronounced in domains that contain a large number of groups with many members and in which the Lsass component must process many group-based queries (as, for example, when you restrict or permit access to resources using ACLs that contain multiple groups). Over time, the DC’s performance might slow to a crawl, at which point you need to restart the system to restore performance. PSS has an October 1 update of samsrv.dll that corrects this memory leak. The Microsoft article "Memory Leak Occurs in the Lsass.exe Process on a Windows Server 2003-Based Domain Controller" (http://support.microsoft.com/?kbid=829993) documents this problem.
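To audit a server image against the file versions cited in this column, you can compare dotted version strings numerically rather than as text. The Python sketch below is a minimal illustration. It assumes that you have already collected the installed versions by some other means and that the cited versions can be treated as minimums (that is, that later builds include these fixes). The function and table names are hypothetical. The VSS files and samsrv.dll are omitted because the column gives no version numbers for them.

```python
# File versions cited in this column. Treating them as minimum acceptable
# versions is an assumption, not something Microsoft states.
MINIMUM_VERSIONS = {
    "dns.exe": "5.2.3790.94",       # October 21 DNS update
    "ntfrs.exe": "5.2.3790.64",     # June 27 FRS update
    "ntfrsapi.dll": "5.2.3790.64",
    "ntfrsprf.dll": "5.2.3790.64",
    "tcpip.sys": "5.2.3790.74",     # August 1 TCP/IP update
}


def parse_version(text):
    """Turn '5.2.3790.94' into (5, 2, 3790, 94) so parts compare as numbers."""
    return tuple(int(part) for part in text.split("."))


def files_needing_update(installed):
    """Return the sorted names of files older than the cited versions.

    installed maps a lowercase file name to its version string. Files that
    are absent from the mapping (for example, dns.exe on a server without
    the DNS service) are skipped.
    """
    outdated = []
    for name, minimum in MINIMUM_VERSIONS.items():
        current = installed.get(name)
        if current is not None and parse_version(current) < parse_version(minimum):
            outdated.append(name)
    return sorted(outdated)
```

Comparing tuples of integers avoids the trap of plain string comparison, which would rank a build ending in .100 below one ending in .74.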