Back in 2007, Windows Server 2003 SP2 introduced a set of networking performance features—collectively known as the Scalable Networking Pack (SNP)—that utilized hardware acceleration to process network packets and achieve higher throughput. Prior to SP2, these features were also available in an out-of-band update for SP1 as described in the Microsoft article “The Microsoft Windows Server 2003 Scalable Networking Pack release,” but weren’t widely deployed by customers. The SNP features are commonly known as Receive-Side Scaling (RSS), TCP Chimney Offload (sometimes called TOE), and Network Direct Memory Access (NetDMA). In this month’s column, I’ll discuss specifics around RSS and TOE.


Historical Problems

Because of issues in the SNP components and in network card drivers, customers who deployed Server 2003 SP2 on hardware that could use the features often had problems. Many customers resolved these problems by disabling the features on Server 2003, and Microsoft released an update, described in the article “An update to turn off default SNP features is available for Windows Server 2003-based and Small Business Server 2003-based computers,” that disables the three features. A later update, “A Scalable Networking Pack (SNP) hotfix rollup package is available for Windows Server 2003,” allowed customers to enable the features if needed, but Microsoft’s recommendation is to leave the features disabled unless there’s a business need to enable them for higher network performance. In general, customers needing higher networking performance should use Windows Server 2008 or Server 2008 R2, because of the included next-generation TCP/IP stack.


Fear in the IT Community

Because of the problems with SNP in Server 2003 SP2, the IT community quickly adopted the common practice of disabling these features, both proactively and reactively. For Server 2003, this makes sense. But for Server 2008 and Server 2008 R2, disabling these features can often result in lower network performance and lower server capacity. These features are very stable on Server 2008 R2 (with or without SP1), and Server 2008 can achieve the same stability using SP2 and additional hotfix updates. Unfortunately, disabling them is still a very common first step in troubleshooting networking issues, even though in many cases it doesn’t resolve the problem.

Many customers have also started to disable additional offload features that have been stable across many OS releases. These offloads are typically named TCP Checksum Offload, IP Checksum Offload, Large Send Offload, and UDP Checksum Offload. They are available to configure in network adapter advanced properties or configuration utilities. These features are not the same thing as the SNP features, but customers often confuse them because of the similar naming. Also, many other performance enhancements require these features.


Receive-Side Scaling

Prior to the introduction of SNP, receive-side network processing in multi-core computers was conventionally bottlenecked by the fact that a single CPU services all the interrupts from a network adapter. RSS solves this problem by enabling a network adapter to distribute its network-processing load across multiple CPUs in multi-core computers.

If RSS isn’t enabled, you’re potentially wasting capacity and reducing the overall load and number of network transactions that each server can handle. This could result in higher costs, because you buy more hardware than you actually need and take on the additional infrastructure costs that come with that hardware.

For RSS to provide scalability, it must be enabled in the OS, which has a global impact on all network adapters, and it also needs to be enabled in the individual network adapters through their advanced properties or configuration utilities. By default, in Server 2008 and Server 2008 R2, RSS is enabled. You can see if it’s currently enabled or disabled by using the following command and looking at the resulting output:

C:\Users\Admin>netsh interface tcp show global
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State            : enabled
Chimney Offload State                 : automatic
NetDMA State                          : enabled
Direct Cache Acess (DCA)              : disabled
Receive Window Auto-Tuning Level      : normal
Add-On Congestion Control Provider    : none
ECN Capability                        : disabled
RFC 1323 Timestamps                   : disabled
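
If you want to check this setting across many servers, output like the above can be parsed by a script. The following is a minimal Python sketch, not an official tool: the function name parse_tcp_globals and the SAMPLE text are illustrative assumptions, and the sketch expects only the command output (without the C:\ prompt line, which itself contains a colon).

```python
def parse_tcp_globals(text):
    """Turn 'Name : value' lines from the command output into a dict.

    Hypothetical helper for illustration; it assumes each setting is on
    one line and that the setting name itself contains no colon.
    """
    settings = {}
    for line in text.splitlines():
        # The banner, the dashed rule, and blank lines have no colon.
        if ":" not in line:
            continue
        name, _, value = line.partition(":")
        settings[name.strip()] = value.strip()
    return settings


# Sample text copied from the output shown above (shortened).
SAMPLE = """\
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State            : enabled
Chimney Offload State                 : automatic
NetDMA State                          : enabled
"""

settings = parse_tcp_globals(SAMPLE)
print("RSS:", settings["Receive-Side Scaling State"])
print("Chimney:", settings["Chimney Offload State"])
```

On a live server, the text could come from running the command and capturing its output; the parsing step stays the same.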

If RSS is disabled, you might see something like Figure 1. This picture is from the Performance tab in Task Manager, and you can see that Processor 0 is pegged at 100 percent CPU, while the rest of the processors are running at lower utilization. Seeing Processor 0 at a much higher CPU utilization than the others is a good indicator that RSS might be disabled on a server. After RSS is enabled, Figure 2 clearly shows the difference in processor utilization on the server: starting right around 3:00 A.M., the CPU utilization for Processor 0 is fairly close to that of the other processors.
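
The same symptom can be expressed as a rough rule of thumb over per-processor utilization samples. This Python sketch is only an illustration: the function name and the ratio and floor thresholds are assumptions chosen for the example, not values from Microsoft or from this column, and a positive result is a hint to check the RSS settings, not proof that RSS is disabled.

```python
def cpu0_looks_overloaded(per_cpu_percent, ratio=2.0, floor=50.0):
    """Return True if processor 0 is far busier than the other processors.

    per_cpu_percent: utilization per processor, index 0 = Processor 0.
    ratio and floor are hypothetical thresholds picked for illustration.
    """
    if len(per_cpu_percent) < 2:
        return False  # nothing to compare against on a single processor
    cpu0 = per_cpu_percent[0]
    others = per_cpu_percent[1:]
    mean_others = sum(others) / len(others)
    # Processor 0 must be busy in absolute terms and relative to the rest.
    return cpu0 >= floor and cpu0 >= ratio * mean_others


# A pattern like Figure 1: Processor 0 pegged, the rest much lower.
print(cpu0_looks_overloaded([100, 20, 25, 15]))
# A balanced pattern like Figure 2 after RSS is enabled.
print(cpu0_looks_overloaded([35, 30, 32, 28]))
```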

RSS also relies on the network adapter offloads (which I mentioned earlier) that are on by default, known as TCP Checksum Offload, IP Checksum Offload, Large Send Offload, and UDP Checksum Offload (for IPv4 and IPv6). So, if those have been disabled for a network adapter, RSS won’t be used for that network adapter.

Also, some network adapters have advanced settings to control the number of processors used for RSS and the number of RSS queues. A common mistake is to set the number of RSS processors very low compared with the number of processors on the server. Each adapter and manufacturer has its own recommendations for settings, so please see the vendor documentation to determine optimal settings based on your environment and workload.

TCP Chimney Offload

TCP Chimney Offload (often called TOE by manufacturers) transfers TCP traffic processing from a computer’s CPU to a network adapter that supports TOE. Moving TCP processing from the CPU to the network adapter can free the CPU to perform more application-level functions. TOE can offload the processing for both TCP/IPv4 and TCP/IPv6 connections if the network adapter supports it.

Because of the overhead associated with moving TCP/IP processing to the network adapter, TOE offers the most benefit to applications that have long-lived connections and transfer a lot of data. Servers that perform database replication, function as file servers, or perform backup functions are examples of computers that might benefit from having TOE enabled. Servers with short-lived connections, such as web servers or email servers, might not see any benefit from it.

By default in Server 2008, TOE is disabled. In Server 2008 R2, TOE defaults to a new Automatic mode. You can see if it’s currently set to automatic, enabled, or disabled by using the following command and looking at the resulting output line for Chimney Offload State:

C:\Users\Admin>netsh interface tcp show global
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State            : enabled
Chimney Offload State                 : automatic
NetDMA State                          : enabled
Direct Cache Acess (DCA)              : disabled
Receive Window Auto-Tuning Level      : normal
Add-On Congestion Control Provider    : none
ECN Capability                        : disabled
RFC 1323 Timestamps                   : disabled

TOE also must be enabled in the network adapter advanced settings, which also lets you control which network adapters use it. Please see your network adapter documentation for more information.

In automatic mode in Server 2008 R2, TOE considers offloading the processing for a connection only if the following criteria are met. This allows TCP Chimney to selectively offload connections, instead of all connections.

  • The connection is established through a 10Gbps Ethernet adapter
  • The mean round-trip link latency is less than 20 milliseconds
  • At least 130KB of data has been exchanged over the connection
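
The three criteria above can be read as a simple predicate. The following Python sketch is an illustration only and doesn’t reflect how the Windows TCP/IP stack is implemented: the Connection class and its field names are hypothetical, treating “10Gbps” as “10 Gbps or faster” is an assumption, and so is interpreting 130KB as 130 × 1,024 bytes.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical summary of one TCP connection, for illustration."""
    link_speed_gbps: float   # speed of the adapter carrying the connection
    mean_rtt_ms: float       # mean round-trip link latency
    bytes_exchanged: int     # data exchanged over the connection so far


def eligible_for_auto_offload(conn):
    """Apply the three automatic-mode criteria listed above."""
    return (
        conn.link_speed_gbps >= 10                # assumption: 10 Gbps or faster
        and conn.mean_rtt_ms < 20                 # less than 20 milliseconds
        and conn.bytes_exchanged >= 130 * 1024    # assumption: KB = 1,024 bytes
    )


# A long-lived bulk transfer on a fast, low-latency link qualifies.
backup = Connection(link_speed_gbps=10, mean_rtt_ms=1.5, bytes_exchanged=50_000_000)
# A short request on a 1 Gbps adapter doesn't.
web_hit = Connection(link_speed_gbps=1, mean_rtt_ms=1.5, bytes_exchanged=8_000)

print(eligible_for_auto_offload(backup))
print(eligible_for_auto_offload(web_hit))
```

This also shows why short-lived connections are rarely offloaded in automatic mode: they tend to close before the data threshold is reached.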

You can look at TOE connection details with the Netsh command netsh interface tcp show chimneystats. If you notice extremely slow network performance that’s greatly improved by disabling Chimney, please see the Microsoft article “The SACK option is always set to ‘true’ even if network adapter does not support SACK for offloaded connections in Windows 7 or in Windows Server 2008 R2.”


Best-Practice Recommendations

Through trial and error, we’ve established some general guidelines that have been adopted with great success in some customer deployments. For example, following our recommendations, one Microsoft enterprise customer was able to increase its Exchange Server capacity and stability to levels never achieved before. Table 1 provides a list of what is recommended for each server version.

For SNP features, we highly recommend leaving RSS enabled in the OS and network adapter settings. We recommend you leave TCP Chimney set at Automatic for Server 2008 R2 and disabled for Server 2008.
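
As a sketch of how these two recommendations could be checked in a script, the Python example below compares a server’s current settings against the recommended values stated in the paragraph above. The dictionary keys, the function name, and the shape of the current-settings input are assumptions made for illustration; only the recommended values themselves come from this column.

```python
# Recommended values from the paragraph above; key names are hypothetical.
RECOMMENDED = {
    "Server 2008": {"rss": "enabled", "chimney": "disabled"},
    "Server 2008 R2": {"rss": "enabled", "chimney": "automatic"},
}


def deviations(os_version, current):
    """Return {setting: (current, recommended)} for settings that differ."""
    expected = RECOMMENDED[os_version]
    return {
        name: (current.get(name), want)
        for name, want in expected.items()
        if current.get(name) != want
    }


# A Server 2008 R2 machine on which someone disabled RSS.
print(deviations("Server 2008 R2", {"rss": "disabled", "chimney": "automatic"}))
# A Server 2008 machine that already follows the recommendations.
print(deviations("Server 2008", {"rss": "enabled", "chimney": "disabled"}))
```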

If you’re utilizing NIC Teaming, please use the latest version of the software and follow the manufacturer recommendations for TCP Chimney. Older versions of some NIC Teaming software didn’t work with RSS, but that isn’t a problem with newer versions.

We highly recommend that you leave all other offloads that can be configured in network adapter advanced settings at their default settings (normally Enabled), since disabling them might disable other performance features that depend on them.

Table 1: RSS and TOE Recommendations for Each Server Version