Not terribly long ago, you might have associated 64-bit processors with only the very highest-end systems. The exorbitant cost and limited application set of 64-bit CPUs relegated them to those few scenarios in which their benefits could shine. The ability of the 64-bit platform to take advantage of memory beyond the 4GB barrier imposed by the 32-bit architecture made it a great fit for huge-scale database applications and graphical-rendering applications that need access to prodigious memory and intensive data-throughput capabilities.
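The 4GB ceiling follows directly from the width of an address: n address bits can name 2^n distinct bytes. The short Python sketch below works through that arithmetic for 32-bit and 64-bit addresses. The function name is purely illustrative.

```python
# Illustrative helper (hypothetical name): how many bytes n address bits can reach.
def address_space_bytes(bits: int) -> int:
    """Each extra address bit doubles the number of distinct byte addresses."""
    return 2 ** bits

GIB = 2 ** 30  # 1 gibibyte
EIB = 2 ** 60  # 1 exbibyte

# 32-bit addresses: 2**32 bytes, the 4GB barrier.
print(address_space_bytes(32) // GIB, "GiB")  # 4 GiB

# A full 64-bit address could in principle reach 2**64 bytes (16 EiB),
# although real processors implement fewer address bits than that.
print(address_space_bytes(64) // EIB, "EiB")  # 16 EiB
```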

Nowhere do things change faster than in the world of technology. With the introduction of new 64-bit processors from AMD and Intel, as well as Microsoft's new 64-bit versions of Windows—in combination with the quick adoption of the new 64-bit architecture by first-tier PC manufacturers such as Dell, HP, and IBM—it's now a real possibility that you'll be implementing 64-bit systems as servers and even desktop systems before the year is out. To get an idea how such high-end systems might fit into your organization, let's take a look at the differences between today's selection of 64-bit processors and delve into their major features.

Intel's Itanium2
Released in May 2001, Intel's Itanium was the first true 64-bit processor for the PC platform since the demise of Digital Equipment Corporation's (DEC's) 64-bit Alpha. Microsoft has supported the Itanium since its inception. For more information about Windows support for the Itanium, see the sidebar "64-Bit Windows: There and Back Again," page 60. The second version of the processor, the Itanium2, has proven its high-end scalability: It has at various times held the top nonclustered TPC-C score, currently holds five of the top 10 TPC-C spots, and also holds the top SAP Sales and Distribution (SD) Users benchmark result.

Itanium's architecture, which Intel calls IA-64, is radically different from the 32-bit x86 design. The x86 uses a CISC (complex instruction set computing) architecture, in which a single instruction can carry out a complex, multistep operation. Modern x86 processors also execute several instructions per clock cycle. Doing so makes the processor more efficient but complicates its design, because the processor itself must predict branches and reorder the code it's executing on the fly. The IA-64 uses a different instruction set, whose effectiveness ultimately depends on new compiler technology. Itanium is based on the Very Long Instruction Word (VLIW) approach: The processor reads instruction "words" (which Intel calls bundles) that combine multiple instructions. Several specialized, single-purpose CPUs have used VLIW designs, but the Itanium is the first mainstream general-purpose microprocessor to do so.
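To make the idea of a "very long instruction word" concrete, here's a minimal Python sketch of how an IA-64 bundle is laid out: a 5-bit template field, which tells the processor which execution-unit types the three instructions use and where any stops fall, followed by three 41-bit instruction slots, for 128 bits in all. The slot values are arbitrary placeholder numbers, not real Itanium instruction encodings, and the function names are hypothetical.

```python
from typing import List, Tuple

SLOT_BITS = 41      # each IA-64 instruction slot is 41 bits wide
TEMPLATE_BITS = 5   # template: execution-unit types for the slots, plus stop positions
BUNDLE_BITS = TEMPLATE_BITS + 3 * SLOT_BITS  # 5 + 3*41 = 128

SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1

def pack_bundle(template: int, slots: List[int]) -> int:
    """Pack a 5-bit template (bits 0-4) and three 41-bit slots (bits 5-127) into one integer."""
    if len(slots) != 3:
        raise ValueError("an IA-64 bundle holds exactly three instruction slots")
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    word = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError(f"slot {i} does not fit in 41 bits")
        # Slot 0 starts at bit 5, slot 1 at bit 46, slot 2 at bit 87.
        word |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return word

def unpack_bundle(word: int) -> Tuple[int, List[int]]:
    """Split a 128-bit bundle back into its template and three slots."""
    template = word & TEMPLATE_MASK
    slots = [(word >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3)]
    return template, slots

word = pack_bundle(0x10, [1, 2, 3])  # placeholder slot contents
print(hex(word))
print(unpack_bundle(word))
```

Because all three instructions travel together in one fixed-size word, the hardware can hand them to execution units without first decoding variable-length instructions, as an x86 processor must.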

Architecturally, the Itanium more closely resembles a RISC processor than an x86 processor. However, it differs from today's RISC processors in how it finds parallelism. A conventional RISC processor relies on its own hardware to discover, at runtime, which instructions can run in parallel, which limits how many instructions it can issue at once. The Itanium instead executes instructions that the compiler has already marked as parallel, and it can process as many as six instructions per clock cycle. Intel calls this design Explicitly Parallel Instruction Computing (EPIC). Because the Itanium can execute multiple instructions per cycle, comparing it with other processors on clock speed alone is misleading.

EPIC removes the need for the CPU to perform the complex out-of-order processing that x86 processors perform. Instead, the compiler, rather than the CPU, schedules machine instructions for parallel execution ahead of time. The compiler generates the executable instructions that the processor runs and, in the case of the Itanium, must also determine the dependencies among instructions and which instructions can run in parallel. In theory, the compiler knows more about the code than the CPU does and can make better predictions about upcoming code paths. This design promises a more efficient processor and eliminates the need for complex instruction-scheduling hardware on the chip. However, the success of systems built on this design depends largely on how well the compiler can optimize code for parallel execution.
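The compiler's job of finding independent instructions can be illustrated with a deliberately simplified Python sketch. It walks a straight-line sequence of instructions and starts a new instruction group whenever an instruction would read or overwrite a register that an earlier instruction in the same group writes; IA-64 assembly marks such a group boundary, called a stop, with ;;. Instructions within a group can then issue in parallel. The Instr class, the form_groups function, and the example program are hypothetical. A real IA-64 compiler also reorders instructions, uses predication, and tracks memory dependencies, none of which this sketch attempts.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Instr:
    text: str              # assembly-style text, used only for display
    dest: Optional[str]    # register this instruction writes (None if none)
    srcs: Tuple[str, ...]  # registers this instruction reads

def form_groups(program: List[Instr]) -> List[List[Instr]]:
    """Greedily split a straight-line program into groups that contain no
    read-after-write or write-after-write register dependencies."""
    groups: List[List[Instr]] = []
    current: List[Instr] = []
    written = set()  # registers written so far in the current group

    for ins in program:
        raw = any(src in written for src in ins.srcs)       # read-after-write
        waw = ins.dest is not None and ins.dest in written  # write-after-write
        if current and (raw or waw):
            groups.append(current)  # a stop (;;) goes here
            current, written = [], set()
        current.append(ins)
        if ins.dest is not None:
            written.add(ins.dest)

    if current:
        groups.append(current)
    return groups

program = [
    Instr("add r1 = r2, r3", "r1", ("r2", "r3")),
    Instr("add r4 = r5, r6", "r4", ("r5", "r6")),
    Instr("add r7 = r1, r4", "r7", ("r1", "r4")),  # needs r1 and r4 from above
    Instr("add r8 = r2, r5", "r8", ("r2", "r5")),  # independent of r7
]

groups = form_groups(program)
for group in groups:
    print("  ".join(ins.text for ins in group), ";;")
```

The first two additions share a group because neither uses the other's result; the third must wait for both, so a stop precedes it.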

Moving the Itanium away from the x86 architecture eliminates the floating-point weakness that has long plagued the x86 family (x86 processors have long lagged far behind RISC processors in floating-point performance) and provides ample headroom for future performance improvements. However, it also creates a significant problem: 32-bit compatibility. To run existing 32-bit applications, the Itanium provides x86 hardware emulation, which lets existing 32-bit binaries run on IA-64 systems without changes. Unfortunately, as with the old 64-bit DEC Alpha platform, this emulation imposes a significant performance overhead, so 32-bit applications run much slower on the 64-bit Itanium than on a native 32-bit x86 system. This tradeoff clearly shows that Itanium was designed for the high-end server market, in which compatibility with existing desktop software isn't a high priority.

Figure 1 presents an overview of Itanium's 64-bit and 32-bit application compatibility, focusing on the software layers that an Itanium system uses to handle 32-bit code. To run native 64-bit applications, the Itanium uses native 64-bit device drivers in conjunction with Windows Server 2003 for 64-Bit Itanium-based Systems. 32-bit applications, however, require an additional 32-bit emulation layer, which essentially translates 32-bit x86 CISC instructions into native IA-64 instructions.

Web Table 1 (http://www.windowsitpro.com, InstantDoc ID 44706) presents an overview of the Itanium2's major technical specifications and compares them with those of today's rival x86 64-bit processors. Current Itanium2 systems run at a maximum speed of 1.5GHz, but don't be fooled by that number: Itanium's EPIC architecture lets the system achieve performance levels much higher than those of a 1.5GHz x86 system. Large CPU caches give the Itanium another big performance boost; the latest Itanium2 models have as much as 6MB of Level 3 (L3) cache. Future Itanium2 designs will be multicore designs that put two CPU cores on a single die. The multicore Itanium2 is expected in 2005.
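Why a 1.5GHz clock understates the Itanium2 comes down to simple arithmetic: instruction throughput is the clock rate multiplied by the number of instructions completed per cycle. The Python sketch below computes the theoretical peak for a 1.5GHz Itanium2 issuing six instructions per cycle and, for contrast, for a hypothetical 3GHz processor that completes two instructions per cycle; that second chip is invented purely for illustration. Peak figures are upper bounds. Sustained throughput depends on how well the compiler fills each cycle and on cache behavior.

```python
def instructions_per_second(clock_hz: int, instructions_per_cycle: int) -> int:
    """Throughput = clock rate x instructions completed per cycle."""
    return clock_hz * instructions_per_cycle

# Itanium2 theoretical peak: 1.5GHz, up to six instructions per cycle.
itanium2_peak = instructions_per_second(1_500_000_000, 6)

# Hypothetical chip for comparison only: twice the clock, a third of the width.
hypothetical_chip = instructions_per_second(3_000_000_000, 2)

print(f"Itanium2 peak:     {itanium2_peak / 1e9:.1f} billion instructions/s")      # 9.0
print(f"Hypothetical chip: {hypothetical_chip / 1e9:.1f} billion instructions/s")  # 6.0
```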

The bottom line. The Itanium2 is the processor of choice for high-performance computing. It's designed for very large-scale database implementations, server consolidation, OLAP and Business Intelligence solutions, and processor-intensive graphical workstations, all running native 64-bit IA-64 applications. The benchmark results clearly support this high-end positioning. However, the Itanium2 isn't designed for 32-bit compatibility. Although it can run 32-bit x86 applications, mission-critical 32-bit applications are best run on either a 32-bit system or one of the newer x86-64 processors, which I discuss next.

AMD's Opteron and Athlon 64
AMD has set sail on an entirely different course into the waters of 64-bit computing. Unlike the Itanium, which moved away from the x86 instruction set to the new IA-64 architecture, AMD's x86-64 architecture is a logical extension of the x86-32 architecture that the vast majority of today's 32-bit systems use. AMD termed this new 64-bit architecture AMD64, but it's now commonly referred to as x86-64 or x64. To update the x86 architecture to 64-bit, AMD reworked the existing x86 architecture by extending all general-purpose registers to 64 bits and adding eight new general-purpose registers, thereby doubling the number of general-purpose registers to 16.
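The register changes are easy to picture in code. This Python sketch builds the AMD64 general-purpose register names from the eight x86-32 registers plus the eight new ones, R8 through R15. It also models one documented detail of the architecture: In 64-bit mode, writing a 32-bit register such as EAX clears the upper 32 bits of the full 64-bit register (RAX), whereas writing a 16-bit register such as AX leaves the other bits untouched. All names in the sketch are illustrative.

```python
# The eight x86-32 general-purpose registers, without their size prefix.
LEGACY = ["AX", "BX", "CX", "DX", "SI", "DI", "BP", "SP"]

REGS32 = ["E" + r for r in LEGACY]                                    # EAX ... ESP
REGS64 = ["R" + r for r in LEGACY] + [f"R{n}" for n in range(8, 16)]  # RAX ... R15

MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def write32(full64: int, value: int) -> int:
    """Write a 32-bit subregister (e.g., EAX) in 64-bit mode.
    The old contents (full64) don't matter: the result is zero-extended."""
    return value & MASK32

def write16(full64: int, value: int) -> int:
    """Write a 16-bit subregister (e.g., AX): bits 16-63 keep their old contents."""
    return (full64 & (MASK64 ^ MASK16)) | (value & MASK16)

print(len(REGS32), len(REGS64))  # 8 16
print(REGS64)
rax = MASK64
print(hex(write32(rax, 0x1234)))  # 0x1234
print(hex(write16(rax, 0x1234)))  # 0xffffffffffff1234
```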

AMD's first x64 processor was the Opteron, which the company released in April 2003. Several Linux distributions supported the new 64-bit processor immediately, but Microsoft support was slow in coming. For more information about Microsoft's support of the AMD x86-64 platform, see the sidebar "64-Bit Windows: There and Back Again," page 60. The Opteron is a server-oriented processor that supports Symmetric Multiprocessing (SMP). AMD followed the Opteron with the desktop-oriented Athlon 64, which uses the same AMD64 architecture but doesn't support multiprocessor configurations.

Like the 32-bit x86 architecture, but very different from the Itanium, AMD's x64 processors are CISC-based, which means the CPU performs its own code optimization and scheduling. AMD addressed the x86 floating-point weaknesses by making the SSE and SSE2 instructions a standard part of the architecture and adding eight new SSE registers, for a total of 16. Perhaps the biggest difference between the AMD64 platform and Intel's IA-64 platform lies in 32-bit application compatibility. The AMD64 platform can run 32-bit and 64-bit applications side by side, with no emulation penalty for 32-bit code. In fact, 32-bit applications often run faster on an AMD64 system than on a native x86-32 system, partly because of the processor's fast memory and I/O connections, which I describe below.

The AMD64 architecture provides full 32-bit compatibility as well as native 64-bit operation by supporting two operating modes: Long mode and Legacy mode. Both modes run existing 32-bit applications; Legacy mode also runs 16-bit DOS programs, which Long mode can't run because it lacks virtual-8086 support. Long mode, which requires a native 64-bit OS such as Microsoft's upcoming Windows 2003 for 64-Bit Extended Systems, has two submodes. In 64-bit mode, the processor uses native 64-bit addresses and a flat 64-bit address space; to take advantage of the 64-bit extensions, application providers must recompile their applications with a 64-bit compiler. In compatibility mode, existing 32-bit applications run under the 64-bit OS without recompilation, but they can access only the lower 32 bits of the registers and only the original eight general-purpose registers.

Legacy mode runs existing 16-bit and 32-bit OSs, such as DOS and 32-bit Windows, unchanged on the 64-bit processor. In Legacy mode, 32-bit applications run exactly as if they were running on a 32-bit processor and continue to use 32-bit addresses and registers, so Legacy mode provides full binary compatibility with existing 32-bit applications. This ability to run both 64-bit and 32-bit software shows that the strength of the AMD64 platform is its versatility. Although it lacks the Itanium's high-end scalability, the AMD64 platform provides the primary benefits of 64-bit computing without sacrificing 32-bit performance. For an overview of the AMD64 platform's application compatibility, see Figure 2. In this figure, you can see that an AMD64 processor can run either 32-bit drivers, OS, and applications or native 64-bit drivers, OS, and applications. The left side of the figure matches the native 32-bit systems that are common today. The right side resembles the Itanium stack in Figure 1, with one important exception: Because the processor is an x86 CISC processor, no emulation is required to run 32-bit applications, so these applications run at full speed. The x64 architecture's design goal is compatibility rather than the absolute peak performance that the Itanium targets.
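The mode rules can be summarized as a small decision function. This Python sketch is a simplification under stated assumptions: It follows AMD's description of Long mode as having a 64-bit submode and a compatibility submode, treats 16-bit DOS programs as real-mode code that needs virtual-8086 support (which Long mode lacks), and rejects every pairing it doesn't model. The function name and the strings it returns are hypothetical.

```python
def execution_mode(os_bits: int, app: str) -> str:
    """Return the AMD64 operating mode a given OS/application pairing runs in.

    app is one of "64-bit", "32-bit", or "16-bit DOS" (hypothetical labels).
    Raises ValueError for pairings the processor can't run (or this sketch doesn't model).
    """
    if os_bits == 64:
        # A 64-bit OS switches the processor into Long mode.
        if app == "64-bit":
            return "Long mode, 64-bit submode"
        if app == "32-bit":
            # Existing 32-bit binaries run unchanged, without emulation.
            return "Long mode, compatibility submode"
        if app == "16-bit DOS":
            raise ValueError("Long mode has no virtual-8086 support for real-mode DOS code")
    elif os_bits in (16, 32):
        # A 16-bit or 32-bit OS keeps the processor in Legacy mode,
        # where it behaves like a 32-bit x86 processor.
        if app == "64-bit":
            raise ValueError("64-bit applications require a 64-bit OS in Long mode")
        if app == "32-bit" and os_bits == 32:
            return "Legacy mode"
        if app == "16-bit DOS":
            return "Legacy mode"
    raise ValueError(f"unsupported pairing: {os_bits}-bit OS with {app} application")

for os_bits, app in [(64, "64-bit"), (64, "32-bit"), (32, "32-bit"), (16, "16-bit DOS")]:
    print(f"{os_bits}-bit OS, {app} app -> {execution_mode(os_bits, app)}")
```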

Although the AMD64 platform is an extension of the x86 architecture, its processors were designed from the outset for 64-bit computing. You can view the primary technical specifications of the AMD Opteron and Athlon 64 processors in Web Table 1. One of the most important features of the AMD64 platform is what AMD calls Direct Connection Architecture, in which memory, I/O, and additional CPUs connect directly to the CPU for maximum throughput. AMD's new HyperTransport bus design makes these connections possible; Figure 3 shows an overview. Notice in the figure that the HyperTransport bus connects directly to the CPU, giving it a high-speed link to the PCI channels, system I/O, and peripheral interconnects. The HyperTransport bus provides up to 6.4GBps of bandwidth per connection.
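The 6.4GBps figure can be reproduced with simple arithmetic if you assume a link that's 16 bits wide in each direction, clocked at 800MHz, and transfers data on both edges of the clock (double data rate). Those link parameters are assumptions for this sketch, not values taken from Web Table 1. Under those assumptions, each direction carries 3.2GBps, and the two directions together total 6.4GBps.

```python
def link_bandwidth_gbps(clock_mhz: int, width_bits: int, directions: int = 2) -> float:
    """Aggregate bandwidth of a double-data-rate, point-to-point link, in GBps."""
    transfers_per_second = clock_mhz * 1_000_000 * 2  # DDR: two transfers per clock
    bytes_per_transfer = width_bits // 8
    return transfers_per_second * bytes_per_transfer * directions / 1e9

# Assumed HyperTransport link parameters: 800MHz clock, 16 bits wide each way.
print(link_bandwidth_gbps(800, 16, directions=1))  # 3.2 (one direction)
print(link_bandwidth_gbps(800, 16))                # 6.4 (both directions)
```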

The bottom line. The AMD Opteron competes with Intel's Xeon line and offers performance comparable to that of the top-of-the-line Xeon processors, with the added ability to migrate easily to a native 64-bit platform. Whereas the Opteron is a pure server processor, the Athlon 64 is a Pentium competitor designed for the desktop market. If you're looking for high-performance desktop systems, Athlon 64-based systems should be at the top of your list. Athlon 64 processors are available in both desktop and mobile systems, which are priced about the same as top-of-the-line 32-bit systems. They're currently the systems of choice for hard-core gamers looking for maximum performance.

Intel's Xeon and Pentium with EM64T Technology
With little fanfare, Intel released its latest processor—the 64-bit Xeon—in July 2004, then quickly followed that release with the 64-bit version of the Pentium 4 in August 2004. Intel, an understandably staunch supporter of the Itanium chip, was hesitant to follow AMD's lead with an extended x86 architecture. In fact, the latest press release for the 64-bit compatible Pentium came from IBM as a part of the rollout of its new 64-bit lineup. Although Intel has been reluctant to enter the x64 market, the versatility of the x64 platform and its aggressive pricing—especially in comparison with the Itanium—forced the company down the x64 path.

Following the lead of the AMD64 platform, the new Intel Xeon uses the x86-64 architecture and is instruction-set compatible with the AMD Opteron and Athlon 64 processors. Intel calls its new 64-bit technology Extended Memory 64 Technology (EM64T). As with the AMD64 platform, Windows 2003 for 64-Bit Extended Systems will support EM64T.

Like the 32-bit x86 and AMD64 platforms, Intel's EM64T processors are CISC-based and offer complete 32-bit binary compatibility. Also like the AMD64 platform, EM64T provides a Long mode for running native 64-bit OSs and a Legacy mode for running 32-bit OSs, and these modes have the same capabilities as AMD's. However, there are definite differences between systems built on EM64T and those built on the AMD64 platform. Intel's new 64-bit Xeon processors can dynamically adjust their power use through a technology called Demand Based Switching with Enhanced Intel SpeedStep Technology. Both the Xeon and the new Pentium 4 also support Hyper-Threading, which lets the OS see one physical processor as two logical processors, so the system can execute more threads simultaneously and improve overall performance. (Don't confuse Hyper-Threading with the HyperTransport bus that AMD's processors use; Intel's EM64T systems don't use the higher-performance HyperTransport bus.) In addition, Intel is providing the new E7520 and E7320 chipsets for its new Xeon line, and Pentium 4 EM64T systems use the new 925X Express chipset, which supports extended memory and PCI Express.
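From the OS's point of view, Hyper-Threading multiplies the number of processors it can schedule threads on. The Python sketch below computes the logical processor count for a two-socket server with single-core, Hyper-Threading-enabled Xeons, then asks the running system for its own count with the standard library's os.cpu_count(), which reports logical CPUs. The logical_processors helper is hypothetical. Keep in mind that the two logical processors on one chip share its execution resources, so Hyper-Threading doesn't double performance.

```python
import os

def logical_processors(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical processors the OS sees = physical packages x cores x hardware threads."""
    return sockets * cores_per_socket * threads_per_core

# Two single-core Xeons with Hyper-Threading enabled (two threads per core).
print(logical_processors(2, 1, 2))  # 4

# os.cpu_count() returns the number of logical CPUs, or None if it can't be determined.
print(os.cpu_count())
```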

You can view the new Intel EM64T specifications in Web Table 1. The new Xeon chips currently run at speeds from 2.8GHz to 3.6GHz and are built on a new 90-nanometer (0.09-micron) manufacturing process; a smaller process generally lets a chip consume less power and run cooler. Notice in Web Table 1 that the Intel EM64T processors use fewer physical address bits than the AMD64 processors, which limits the amount of memory they can address.
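To see how much the number of physical address bits matters, the Python sketch below computes the maximum physical memory that a given address width can reach. The widths used, 36 bits for the first EM64T Xeons and 40 bits for the Opteron, are commonly cited figures assumed here for illustration; check Web Table 1 for the exact specifications.

```python
GIB = 2 ** 30  # 1 gibibyte

def max_physical_memory_gib(physical_address_bits: int) -> int:
    """Largest physical memory (in GiB) that the given number of address bits can reach."""
    return 2 ** physical_address_bits // GIB

# Assumed widths (see lead-in): first EM64T Xeons vs. Opteron.
print(max_physical_memory_gib(36))  # 64 GiB
print(max_physical_memory_gib(40))  # 1024 GiB, or 1 TiB
```

Each additional address bit doubles the ceiling, so a 4-bit difference means a 16-fold difference in addressable memory.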

The bottom line. The new Intel Xeon EM64T, a server-oriented processor that competes squarely with the AMD Opteron, is the fastest and most capable Xeon processor that Intel has made so far, and it will be an appealing option for companies that have standardized on Intel processors. However, the chip lacks both the Opteron's higher-performance Direct Connection Architecture and its superior HyperTransport interconnect, which limits the Xeon's performance as a native 64-bit processor. Like the Opteron, though, it runs both 32-bit and 64-bit applications at full speed. Intel is marketing its Pentium EM64T chip as a server and high-end workstation CPU, and at the time of this writing, it hasn't yet entered the desktop market. Given a little time, however, it's sure to evolve into a desktop competitor to the Athlon 64.

Time to Go 64-Bit?
At this year's Windows Hardware Engineering Conference (WinHEC) in Seattle, Bill Gates predicted that by the end of 2005, all the x86 server processors sold by AMD and the majority of those sold by Intel would be 64-bit processors. Although migrating the huge x86-32 customer base will take time (as our survey, described in the Web-exclusive sidebar "Are You Using 64-Bit Systems?," InstantDoc ID 44724, confirms), Microsoft clearly sees the x86-64 platform as the mainstream computing platform of the future. The versatility of the x64 platform, with its ability to run both 32-bit and 64-bit applications and its native 64-bit Windows support, will soon push 64-bit computing onto desktops near you. The x64 platform is aimed at the desktop and consumer market, but don't make the mistake of thinking that Intel or Microsoft has abandoned the Itanium. The Itanium's high performance lets it compete head-on with RISC-based UNIX systems and mainframes. For maximum performance and scalability, Itanium is still the CPU of choice, and the emergence of the x64 platform hasn't changed that.