To meet the increasing networking needs of server workloads, servers are starting to offload packet processing to peripheral devices to achieve TCP/IP acceleration. Researchers at Intel Labs have experimented with alternative solutions that improve the server's ability to process TCP/IP packets efficiently and at very high rates.
Since the introduction of the 10GbE standard in 2002, the ability of general purpose processors to efficiently process network traffic with common protocols such as TCP/IP has been revisited and critically evaluated. However, recent commercially available processors such as Intel® Core™2 Duo Processor introduce microarchitectural…
Recent I/O technologies such as PCI-Express and 10Gb Ethernet enable unprecedented levels of I/O bandwidths in mainstream platforms. However, in traditional architectures, memory latency alone can limit processors from matching 10 Gb inbound network I/O traffic. We propose a platform-wide method called Direct Cache Access (DCA) to deliver inbound I/O data…
Adoption of the 10 GbE standard has been impeded by two important performance-oriented considerations: 1) processing requirements of common protocol stacks and 2) end-to-end latency. The overheads of typical software-based protocol stacks on CPU utilization and throughput have been well evaluated in several studies. In this paper, we focus on…
10GbE connectivity is expected to be a standard feature of server platforms in the near future. Among the numerous methods and features proposed to improve network performance of such platforms is Direct Cache Access (DCA) to route incoming I/O to CPU caches directly. While this feature has been shown to be promising, there can be significant challenges…
With the rapid evolution of network speeds from 1 Gbps to 10 Gbps, a wide spectrum of research has sought to improve the efficiency of TCP/IP processing on general purpose processors. However, most of these studies considered only performance and ignored power efficiency. As power has become a major concern in data centers, where servers…
As platforms evolve from employing single-threaded, single-core CPUs to multi-threaded, multi-core CPUs and embedded hardware-assist engines, the simulation infrastructure required for performance analysis of these platforms becomes extremely complex. While investigating hardware/software solutions for server network acceleration (SNA), we encountered…
Scaling TCP/IP receive side processing to 10 Gbps speeds on commercial server platforms has been a major challenge. This led to the development of two key techniques: Large Receive Offload (LRO) and Direct Cache Access (DCA). Only recently, systems supporting these two techniques have become available. So, we want to evaluate these two techniques using 10 Gigabit…