Two rivals take different paths to the next generation of computing

Extending their 32-bit rivalry into the next generation of computing, Intel and AMD plan to release new 64-bit x86-compatible processors in 2001. Like the jump in processing power we saw when the PC platform evolved from the 16-bit 286 to the 32-bit 386 processor, the jump to 64 bits promises to take PC technology to new heights on the enterprise ladder. Although 64-bit processors won't make your Microsoft Excel spreadsheets recalculate faster or speed up most other desktop applications, the new processors will address the perpetual need for more processing power in computing's upper tiers. High-end graphics workstations and large database systems, such as Microsoft SQL Server and Oracle, will benefit most directly from the new processors. Dot-com stores and the increasing number of decision support and data warehousing applications are typically the driving forces behind massive database growth. Such applications will derive less benefit from increases in raw processor speed than they will from increased memory addressability. Database applications in particular are infamous for being RAM-hungry; the more memory those applications have, the better they perform.

Today's crop of 32-bit processors can natively address up to 4GB (2^32 bytes) of data. Windows 2000 Server reserves 2GB of a 32-bit processor's address space for its own use, leaving 2GB for applications. Enterprise Memory Architecture (EMA), which Win2K Advanced Server and Win2K Datacenter Server support, provides two methods of extending the amount of RAM available for applications: 4GB RAM Tuning (4GT) and Physical Address Extension (PAE). 4GT adds the /3GB switch to the Advanced RISC Computing (ARC) path in the boot.ini file to let applications address as much as 3GB of RAM. PAE uses a window to map chunks of physical memory to an application's virtual address space and extend physical memory addressability to 8GB on Win2K AS and to 64GB on Datacenter. (For more information about Datacenter's EMA support, see Greg Todd, "Win2K Datacenter Server," page 49.)
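The 32-bit limits described above come down to simple arithmetic, which the following minimal sketch works through. It assumes that the article's "GB" means 2^30 bytes and that PAE widens physical addresses to 36 bits; the 36-bit figure is not stated in the article, although it matches the 64GB ceiling quoted for Datacenter. The function and variable names are illustrative only.

```python
GB = 2**30  # assumption: the article's "GB" is 2^30 bytes


def addressable_bytes(address_bits):
    """Return how many bytes an address of the given width can reach."""
    return 2**address_bits


virtual_space = addressable_bytes(32)        # 4GB visible to a 32-bit process
default_user_space = virtual_space - 2 * GB  # the OS keeps 2GB by default
tuned_user_space = virtual_space - 1 * GB    # with /3GB the OS keeps only 1GB
pae_physical = addressable_bytes(36)         # assumption: 36-bit physical addresses

for label, size in [
    ("32-bit address space", virtual_space),
    ("application share (default)", default_user_space),
    ("application share (/3GB)", tuned_user_space),
    ("physical memory with PAE", pae_physical),
]:
    print(f"{label}: {size // GB}GB")
```

Note that 4GT only shifts the boundary inside the same 4GB space, whereas PAE enlarges the physical memory behind it; neither changes the width of the addresses an application uses.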

The upcoming 64-bit processors will dramatically extend the amount of addressable physical memory available to high-end systems. Intel's and AMD's 64-bit processors will raise the bar to a staggering 16 exabytes (EB), or roughly 18 billion gigabytes (2^64 bytes), which is more than enough headroom for even the most massive of today's applications.
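The two figures quoted for a 64-bit address use different units, which a short calculation makes visible. The sketch below assumes that "16 exabytes" counts binary exabytes (2^60 bytes each) and that "18 billion gigabytes" counts decimal gigabytes (10^9 bytes each); the variable names are illustrative.

```python
full_64 = 2**64  # bytes reachable with a 64-bit address
full_32 = 2**32  # bytes reachable with a 32-bit address

binary_exabytes = full_64 // 2**60      # exabytes of 2^60 bytes each
decimal_gigabytes = full_64 / 10**9     # gigabytes of 10^9 bytes each
growth_factor = full_64 // full_32      # how many 32-bit spaces fit in one 64-bit space

print(f"{binary_exabytes} EB")
print(f"{decimal_gigabytes / 10**9:.1f} billion GB")
print(f"{growth_factor:,} times the 32-bit address space")
```

Doubling the address width squares the size of the address space, so one 64-bit space holds about 4.3 billion complete 32-bit address spaces.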

Sixty-four-bit processors actually have two important capabilities. In addition to being able to use 64 bits to define a memory address, these processors can manipulate 64 bits of data simultaneously. Because the ability to manipulate 64 bits of data at once is as much a function of the bus structure as the processor, significant advances in system bus technology go hand in hand with the move to 64-bit processing.
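To see what manipulating 64 bits at once saves, consider how a 32-bit processor has to add two 64-bit numbers: it adds the low halves, then the high halves plus a carry. The sketch below imitates that in Python by masking values to a fixed width. The function names are hypothetical, and Python's own integers are unbounded, so the masks stand in for register widths.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_on_32bit(a, b):
    """Add two 64-bit values using only 32-bit-wide pieces (two adds plus a carry)."""
    low = (a & MASK32) + (b & MASK32)
    carry = low >> 32                      # 1 if the low halves overflowed
    high = ((a >> 32) & MASK32) + ((b >> 32) & MASK32) + carry
    return ((high & MASK32) << 32) | (low & MASK32)


def add64_native(a, b):
    """Add two 64-bit values in a single step, as a 64-bit register would."""
    return (a + b) & MASK64


x, y = 0x00000001FFFFFFFF, 0x0000000000000001
print(hex(add64_on_32bit(x, y)), hex(add64_native(x, y)))
```

Both functions give the same answer; the difference is that the first needs several dependent operations where the second needs one.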

At the Crossroads
Although they share a clear goal, AMD and Intel have chosen quite different paths to the destination. In a move that might seem surprising, Intel plans to abandon its flagship x86 architecture in favor of the new and radically different IA-64 architecture that Intel codeveloped with Hewlett-Packard (HP). The IA-64 architecture introduces a different instruction set and is based on a much more sophisticated and complex design whose effectiveness ultimately depends on new compiler technology.

In contrast, AMD plans to extend the x86 architecture into a new design known as x86-64. AMD's new architecture is a logical and simple extension of the current x86-32 instruction set architecture that all x86-based processors use.

These competing approaches will have a tremendous impact on the transition of Win2K and the upcoming Windows.NET platforms to 64-bit technology. Let's look in more detail at the different routes Intel and AMD have chosen.

Intel Takes the High Road
Intel has officially dubbed its 64-bit processor the Itanium (formerly code-named Merced). Intel expects to release the Itanium in the first half of 2001 and will target it as a replacement for the Pentium III Xeon processor in systems that are used primarily as servers and occasionally as very-high-end workstations. Industry observers expect the initial Itanium version to run at 733MHz, which might seem a bit disappointing considering that current Pentium systems already run at speeds well in excess of 1GHz. However, the Itanium's radically different architecture makes traditional speed comparisons a bit like comparing apples and oranges.

Although the Itanium's initial speeds will be more modest, Intel designed the Itanium to be capable of 6GFLOPS (i.e., 6 billion floating-point operations per second). The Itanium will have four integer units and two floating-point units. The processor package, which will be about the size of a 3" x 5" index card, will have 32KB of L1 cache and 96KB of L2 cache on the chip and will be able to access up to 4MB of outboard L3 cache. OEMs also will be able to add L4 cache. The Itanium will have as many as 128 registers to store numbers and instructions, and Intel will use a 0.18-micron process to build the processor. The processor will use a new Slot M motherboard interface, and the front-side bus will run at 266MHz.

The Itanium is a Very Long Instruction Word (VLIW) processor. VLIW processors read instruction strings (aka words) that consist of a combination of multiple instructions. Manufacturers use the VLIW architecture in several specialized single-purpose CPUs, but VLIW has never before been used in a general-purpose microprocessor.

Moving the Itanium away from the x86 architecture eliminates the floating-point weaknesses that plague the x86 family. In its design, the Itanium more closely resembles a high-end RISC processor than it does the x86. However, unlike modern RISC processors, the Itanium uses enhanced parallel-processing techniques. Don't confuse the Itanium's type of parallel processing with the parallel processing that multiprocessor SMP systems such as the Xeon use. The Itanium's type of parallelism refers to the CPU's ability to process more than one instruction at a time, a task most RISC systems do poorly. Intel's name for this ability is Explicitly Parallel Instruction Computing (EPIC).

The EPIC architecture will be able to process in parallel up to six instructions per clock cycle. The ability to execute multiple instructions per cycle makes traditional speed measurements, which are based solely on clock speed, misleading for the Itanium processor. The Itanium will probably herald a new CPU performance measurement based on instructions per cycle. EPIC eliminates the need to implement complex Pentium-style out-of-order processing to optimize speed. Instead, the Itanium hands to the compiler the job of parallelizing machine instructions. The compiler reads in the program source code and creates executable instructions, which the processor performs. The compiler must determine the dependencies of each instruction as well as which instructions the Itanium should run in parallel. This architecture promises to make the new processor simpler without requiring it to have an instruction scheduler or hidden registers. However, EPIC also depends largely on the compiler's ability to optimize code for parallel processing.
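The compiler's job described above, finding dependencies and deciding which instructions can run together, can be illustrated with a toy grouping pass. This is a minimal sketch under stated assumptions, not Intel's algorithm: the Instr type, the register names, and group_instructions are hypothetical, instructions stay in program order, and a new group starts whenever an instruction reads or writes a register that the current group has already touched in a conflicting way, or when the group already holds six instructions.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str     # human-readable form, for printing only
    dest: str     # register written
    srcs: tuple   # registers read


MAX_WIDTH = 6  # the article's figure of six instructions per clock cycle


def group_instructions(program, width=MAX_WIDTH):
    """Split a program into groups whose members have no mutual dependencies."""
    groups, current = [], []
    written, read = set(), set()
    for ins in program:
        conflict = (
            any(src in written for src in ins.srcs)  # reads a value produced in this group
            or ins.dest in written                   # overwrites a value produced in this group
            or ins.dest in read                      # overwrites a value this group still reads
        )
        if current and (conflict or len(current) >= width):
            groups.append(current)
            current, written, read = [], set(), set()
        current.append(ins)
        written.add(ins.dest)
        read.update(ins.srcs)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("r1 = load a", "r1", ()),
    Instr("r2 = load b", "r2", ()),
    Instr("r3 = r1 + r2", "r3", ("r1", "r2")),
    Instr("r4 = load c", "r4", ()),
    Instr("r5 = r3 * r4", "r5", ("r3", "r4")),
]

for cycle, group in enumerate(group_instructions(program)):
    print(cycle, [ins.text for ins in group])
```

Here five instructions fit into three groups, so the achievable parallelism depends on the code's dependency chains as much as on the processor's width. A production compiler would also reorder instructions to fill groups more fully, which this sketch deliberately does not attempt.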

EPIC is closely related to another Itanium feature called predication. Predication is a compiler-driven technique for removing conditional branches from code. The compiler tags the instructions on each side of a branch with a predicate, the processor executes both paths in parallel, and only the results whose predicate turns out to be true are kept. In modern processors such as the Pentium, the processor spends a portion of its time guessing which code branches the program is likely to perform next and pays a penalty whenever it guesses wrong. Predication removes many of those branches altogether, thus reducing wasted work from wrong guesses and letting the processor operate more efficiently.
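To make the branch-handling idea concrete, the sketch below shows if-conversion, the transformation behind IA-64 predication: instead of guessing which arm of a branch will run, the code evaluates both arms and keeps the one whose predicate is true. The function names are hypothetical, and Python arithmetic stands in for predicated machine instructions, so this illustrates the idea rather than how the hardware implements it.

```python
def abs_diff_branchy(a, b):
    """Conventional form: the processor must guess which way the branch goes."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    """If-converted form: both arms are computed, the predicate selects one."""
    p = int(a > b)        # predicate: 1 if the 'then' arm applies, else 0
    then_arm = a - b      # would execute under predicate p
    else_arm = b - a      # would execute under predicate not-p
    return p * then_arm + (1 - p) * else_arm


for a, b in [(9, 4), (4, 9), (7, 7)]:
    print(a, b, abs_diff_branchy(a, b), abs_diff_predicated(a, b))
```

The predicated version does more arithmetic but contains no branch to guess at, which is a good trade when both arms are short and the processor has idle execution units.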

Speculation is another new capability that lets the Itanium load instructions and data into the CPU before they're actually needed, a technique that in effect uses the processor as a cache. By letting the processor load data before it's needed, speculation limits the effects of memory latency. Proactive loading also lets the processor execute instructions as soon as it needs them.
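Loading data early raises an obvious question: what if the early load fails and the data turns out not to be needed? The toy model below sketches one answer, in which a failed speculative load is recorded rather than reported, and the failure surfaces only if the value is actually used. It is loosely modeled on IA-64's speculative-load and check instructions; the names NAT, speculative_load, and lookup are hypothetical, and a Python dictionary stands in for memory.

```python
NAT = object()  # marker meaning "the speculative load did not produce a valid value"


def speculative_load(memory, address):
    """Start a load early; on failure, return the marker instead of raising."""
    return memory.get(address, NAT)


def lookup(memory, address, needed):
    """Load ahead of the decision, then check the result only if it is used."""
    early = speculative_load(memory, address)  # hoisted above the branch
    if not needed:
        return None                            # a failed early load is simply ignored
    if early is NAT:
        return memory[address]                 # recovery: repeat the load normally (may raise)
    return early


print(lookup({16: 42}, 16, needed=True))
print(lookup({}, 16, needed=False))
```

The load latency overlaps with the work of deciding whether the value is needed, and a program that never uses a bad address never sees the error.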

With an eye toward the high-end supercomputing platform, Intel designed Itanium to support up to 512-way SMP servers. The Itanium's system bus, which implements a technology that Intel terms a Multidrop system bus, runs at 2.1GBps to speed interprocessor communication.

Itanium's 32-Bit Penalty Box
To let the Itanium run existing 32-bit applications, Intel will provide the new processor with x86 hardware emulation for full compatibility with existing 32-bit instruction sets. This emulation will let existing 32-bit programs run without changes on Itanium-powered systems.

However, don't assume that your existing 32-bit applications will run faster on the Itanium. On the contrary, the Itanium's emulation imposes significant overhead by converting x86 instructions into equivalent IA-64 instructions. Obviously, the emulation process will also forgo Itanium's EPIC processing capabilities. An Itanium system will almost certainly run 32-bit applications more slowly than will a comparable Pentium or AMD Athlon system. Ultimately, all 32-bit applications will need to be recompiled on a 64-bit Itanium-compatible compiler to be able to take advantage of the Itanium's sophisticated new features. This trade-off clearly shows that Intel has designed the Itanium for the high-end server market, where compatibility with existing desktop software isn't a high priority.

AMD Takes the Low Road
Although Intel's Itanium chip has its sights set firmly on the server market's high end, AMD's upcoming 64-bit, eighth-generation processor (code-named Sledgehammer) targets the low and middle tiers of the PC server market as well as the market for high-powered desktops and workstations. Firmly embracing the existing x86 instruction set, AMD plans to extend that architecture into the 64-bit realm through its x86-64 architecture. Although AMD hasn't announced the clock speeds of the first Sledgehammer systems (which AMD is building on the AMD Athlon core), industry observers expect Sledgehammer to run at speeds comparable to those of the current AMD Athlon and expect the first production systems to run at speeds well over 1GHz. AMD plans to release the first Sledgehammer processors in the second half of 2001.

The Sledgehammer processor embodies an evolution of the current AMD Athlon design. Unlike the Itanium and its completely new VLIW architecture, which shifts most code optimization to the compiler, Sledgehammer remains a CISC-based processor and performs its own code optimization and scheduling. Sledgehammer's unique design will incorporate two processors on a single die. AMD will address the x86 floating-point weaknesses by incorporating new floating-point instructions and 16 floating-point registers. Sledgehammer will also add eight new general-purpose registers (GPRs) to the CPU, doubling the number of GPRs to 16, and will broaden all GPRs to 64 bits.

The Sledgehammer CPU will likely have 128KB of L1 cache and 512KB to 2MB of L2 cache running at full processor speed. Sledgehammer's front-side bus will run at 266MHz. With AMD migrating from a Slot A to a Socket A motherboard connector, Sledgehammer systems probably will use the new Plastic Pin Grid Array (PPGA) motherboard interface. AMD will likely manufacture the Sledgehammer processor in its Dresden, Germany, facility and will incorporate a new 0.13-micron process and copper interconnections in manufacturing. For more information about interconnections and the significance of processor size and speed, see the sidebar "Smaller and Faster."

Sledgehammer Nails 32-Bit Compatibility
To enhance the x86 architecture, Sledgehammer will add two operating modes: Long mode and Legacy mode. Long mode will run with native 64-bit OSs, and Legacy mode will run with existing 16-bit or 32-bit OSs. Both modes provide binary compatibility with existing 16-bit and 32-bit applications. Table 1 compares Sledgehammer's operating modes.

Long mode has two submodes: 64-bit mode and Compatibility mode. Only native 64-bit OSs will be able to use 64-bit mode, which uses native 64-bit addresses and a flat 64-bit address space. Developers will need to recompile applications to take advantage of the 64-bit extensions. Compatibility mode will let legacy 16-bit and 32-bit applications run unchanged under a native 64-bit OS. Unsurprisingly, however, applications will run under Compatibility mode exactly as if they were running on a 32-bit processor and will continue to use 32-bit addresses and registers. Long mode supports only x86 protected mode.

Legacy mode runs 32-bit OSs and is fully binary-compatible with existing 32-bit applications. Legacy mode supports all existing x86 processor modes—including x86 real mode, virtual 8086 mode, and protected mode—and dynamically switches modes as necessary.
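The mode rules just described can be summarized as a small decision function. This is a sketch of the article's description only, assuming that a 64-bit OS always runs the processor in Long mode and a 32-bit OS always runs it in Legacy mode; sledgehammer_mode and its return strings are hypothetical names, not AMD terminology for any API.

```python
def sledgehammer_mode(os_bits, app_bits):
    """Return the operating mode an application would run in, per the article's rules."""
    if os_bits == 64:
        if app_bits == 64:
            return "Long mode, 64-bit mode"         # recompiled application, 64-bit addresses
        if app_bits in (16, 32):
            return "Long mode, Compatibility mode"  # unchanged legacy application
    elif os_bits == 32:
        if app_bits in (16, 32):
            return "Legacy mode"                    # behaves like an existing x86 processor
    raise ValueError(f"unsupported combination: {os_bits}-bit OS, {app_bits}-bit application")


for os_bits, app_bits in [(64, 64), (64, 32), (64, 16), (32, 32), (32, 16)]:
    print(os_bits, app_bits, "->", sledgehammer_mode(os_bits, app_bits))
```

The one combination the function rejects for supported OS widths, a 64-bit application on a 32-bit OS, reflects the point that only a native 64-bit OS can use the 64-bit extensions.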

To accommodate its increased speeds, Sledgehammer will introduce the Lightning Data Transport (LDT) system bus, which AMD developed jointly with Alpha Processor Incorporated (API). The CPU uses the LDT bus for processor interconnects, PCI channels, system I/O, and peripheral interconnects. The LDT bus provides bandwidth of as much as 6.4GBps per connection—more than a twentyfold increase over today's 266MBps system interconnects.
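The "more than twentyfold" comparison is easy to check. The short calculation below assumes that both bandwidth figures use decimal units (10^9 and 10^6 bytes per second); the variable names are illustrative.

```python
ldt_bytes_per_second = 6.4e9      # LDT bandwidth per connection, from the article
current_bytes_per_second = 266e6  # today's system interconnect, from the article

speedup = ldt_bytes_per_second / current_bytes_per_second
print(f"LDT offers about {speedup:.0f} times the bandwidth per connection")
```

The ratio works out to roughly 24, consistent with the article's claim.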

The Race for 64-Bit Gold
The Itanium represents a major change in Intel's direction and a significant move into the high-end-server and workstation markets that RISC-based systems currently rule. The Itanium's advanced design will lay the groundwork for the next generation of high-end computing. However, because the processor requires all-new 64-bit software to realize its advanced capabilities, the Itanium technology is likely to take some time to make its way into the mainstream. In the meantime, AMD's Sledgehammer processor will likely appear to be the best 64-bit processor to use with existing 32-bit computing platforms. Sledgehammer will be not only a fast 64-bit processor but probably also the fastest processor for 32-bit applications. This speed should give Sledgehammer quick entry into the high-end workstation market.

However, the question of 64-bit software compatibility still looms. To realize their potential, both processors need new 64-bit OS and application software. The Itanium in particular depends on new compiler technology that can optimize code for the VLIW processor. Although AMD's Sledgehammer doesn't share that weakness and will run well with existing 32-bit executable programs, moving the processor to Long 64-bit mode will require new 64-bit software. And Sledgehammer's 64-bit x86-64 architecture uses a different binary image than does the Itanium's 64-bit IA-64. Consequently, software vendors will either need to support both platforms or be forced to choose which type of 64-bit platform to support.
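One practical consequence of the two binary formats is that every Windows executable must declare which processor it targets. As a hedged illustration, the sketch below reads the Machine field from a Portable Executable (PE) header. The three machine codes are the values defined in Microsoft's PE/COFF format rather than anything stated in the article, and pe_machine and fake_image are hypothetical helper names; fake_image builds just enough of a header to exercise the reader and is not a runnable program image.

```python
import struct

# Machine codes as defined in the PE/COFF format (not taken from the article)
PE_MACHINES = {
    0x014C: "x86 (32-bit)",
    0x0200: "IA-64 (Itanium)",
    0x8664: "x86-64",
}


def pe_machine(image):
    """Return the target architecture named in a PE image's header."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)  # offset of the PE signature
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    (machine,) = struct.unpack_from("<H", image, pe_offset + 4)
    return PE_MACHINES.get(machine, f"unknown (0x{machine:04X})")


def fake_image(machine):
    """Build a minimal header stub for demonstration: MZ, padding, PE offset, signature, Machine."""
    dos_stub = b"MZ" + bytes(0x3C - 2) + struct.pack("<I", 0x40)
    return dos_stub + b"PE\0\0" + struct.pack("<H", machine)


for code in (0x014C, 0x0200, 0x8664):
    print(pe_machine(fake_image(code)))
```

A loader that finds the wrong machine code refuses to run the file, which is why a vendor that wants to reach both 64-bit platforms has to build and ship two binaries.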

Considering Microsoft's close ties with Intel, it should come as no surprise that Microsoft is squarely in the Itanium camp. Microsoft will develop the 64-bit version of Windows for the IA-64 instruction set and has already delivered an IA-64 software development kit (SDK). So far, Microsoft has made no public statement about support for AMD's Sledgehammer technology. However, Sledgehammer has gained the support of the Linux development community, and a GNU/Linux port is already in development.

Based on the announced release dates, Intel will probably win the first-to-market race, but reliance on improved software technology could delay development of applications that take advantage of the Itanium's features. In addition, rumors suggest that AMD's Sledgehammer is ahead of schedule and might reach production sooner than expected.

More Information
Several Web sites offer more information about these two 64-bit technologies. Visit http://www.intel.com/ebusiness/products/ia64 to learn about Intel's 64-bit plans, and go to http://www.amd.com/products/cpg/64bit for information about AMD's 64-bit implementation plans. Another site that provides insight into today's processor technology is http://www.tomshardware.com.