Apart from offering x86 servers a migration path to 64-bit technology, AMD’s Opteron processor enables glueless eight-way symmetric multiprocessing (SMP). Above four-way SMP, however, the performance of important commercial applications scales poorly because the processors are not fully interconnected. An eight-way SMP system also severely taxes interconnect wiring and packaging. Scaling beyond eight-way SMP requires solving both problems. The Horus application-specific IC, to be released in the third quarter of 2005, offers a solution by expanding Opteron’s SMP capability from eight-way to 32-way, that is, from 8 to 32 sockets, or nodes. As the “Work on Symmetric Multiprocessing Systems” sidebar shows, many SMP implementations exist, but Horus is the only chip that targets the Opteron in an SMP implementation.

In a quad (a four-node Opteron system), Horus acts as a proxy for all remote CPUs, memory controllers, and host bridges to the local Opteron processors. The chip extends local quad transactions to remote quads and enables requests to remote quads. Key to Horus’s performance are its ability to cache remote data in its remote data cache (RDC) and the addition of Directory, a cache-coherence directory that eliminates unnecessary snooping of remote Opteron caches. For enterprise systems, Horus incorporates partitioning; reliability, availability, and serviceability features; and communication with the Newisys service processor as part of monitoring the system’s health.

In performance simulations of Horus running online transaction processing (OLTP), transaction latency improved considerably. The average memory access latency of a transaction in a four-quad (16-node) system with Horus running an OLTP application was less than three times that of an Opteron-only system with four Opterons. Moreover, the improvement became even more significant as the number of CPUs per node increased.
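To illustrate why a directory removes unnecessary remote snoops, here is a minimal Python sketch. It assumes a simple scheme in which the home quad’s directory records, at quad granularity, which quads share a cache line and which single quad (if any) owns it exclusively. The names `Directory`, `DirectoryEntry`, `quads_to_snoop`, `record`, and `broadcast_snoop_targets` are hypothetical, and this is not the actual Horus coherence protocol. It only contrasts directory filtering with broadcasting a snoop to every other quad.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

NUM_QUADS = 4  # assumption: a four-quad (16-node) system


@dataclass
class DirectoryEntry:
    # Hypothetical per-line state: quads holding a copy, and an exclusive owner if any.
    sharers: Set[int] = field(default_factory=set)
    owner: Optional[int] = None


class Directory:
    """Hypothetical home-quad directory tracking which quads cache each line."""

    def __init__(self) -> None:
        self.entries: Dict[int, DirectoryEntry] = {}

    def quads_to_snoop(self, line: int, requester: int, exclusive: bool) -> Set[int]:
        """Return only the quads that must be snooped for this request."""
        entry = self.entries.get(line)
        if entry is None:
            return set()  # no quad caches the line: no remote snoop at all
        if entry.owner is not None and entry.owner != requester:
            return {entry.owner}  # only the owner may hold newer data
        if exclusive:
            return entry.sharers - {requester}  # invalidate the other sharers
        return set()  # shared read of clean data: memory is up to date

    def record(self, line: int, requester: int, exclusive: bool) -> None:
        """Update the directory after the request completes."""
        entry = self.entries.setdefault(line, DirectoryEntry())
        if exclusive:
            entry.sharers = {requester}
            entry.owner = requester
        else:
            if entry.owner is not None:  # downgrade the previous owner to a sharer
                entry.sharers.add(entry.owner)
                entry.owner = None
            entry.sharers.add(requester)


def broadcast_snoop_targets(requester: int) -> Set[int]:
    """Without a directory, every other quad is snooped on every request."""
    return set(range(NUM_QUADS)) - {requester}
```

For example, after quads 1 and 2 read line `0x40`, an exclusive request from quad 3 snoops only `{1, 2}`, and a read of an uncached line snoops no remote quad. Broadcasting would snoop all three other quads in both cases. The saved snoop traffic and latency are the kind of benefit the text attributes to Directory.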