TCP performance re-visited

@article{Foong2003TCPPR,
  title={TCP performance re-visited},
  author={Annie P. Foong and Thomas R. Huff and Herbert H. J. Hum and Jaidev R. Patwardhan and Greg J. Regnier},
  journal={2003 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS 2003.},
  year={2003},
  pages={70--79}
}
Detailed measurements and analyses for the Linux-2.4 TCP stack on current adapters and processors are presented. We describe the impact of CPU scaling and memory bus loading on TCP performance. As CPU speeds outstrip I/O and memory speeds, many generally accepted notions of TCP performance begin to unravel. In-depth examinations and explanations of previously held TCP performance truths are provided, and we expose cases where these assumptions and rules of thumb no longer hold in modern-day…
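The paper measures single-stream TCP throughput on real hardware. As an illustrative sketch (not the authors' methodology), a minimal loopback microbenchmark of the kind such studies start from might look like this; the buffer sizes and port choice are assumptions for the example:

```python
import socket
import threading
import time

def run_throughput_probe(total_bytes=16 << 20, chunk=64 << 10):
    """Rough single-stream TCP throughput probe over loopback.

    Returns achieved throughput in bytes per second. Loopback avoids the
    NIC entirely, so this only illustrates the measurement shape, not the
    adapter effects the paper analyzes."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def sink():
        conn, _ = srv.accept()
        while conn.recv(chunk):      # drain until the sender closes
            pass
        conn.close()

    t = threading.Thread(target=sink)
    t.start()
    cli = socket.create_connection(("127.0.0.1", port))
    buf = b"x" * chunk
    sent = 0
    start = time.perf_counter()
    while sent < total_bytes:
        cli.sendall(buf)
        sent += len(buf)
    cli.close()
    t.join()
    srv.close()
    elapsed = time.perf_counter() - start
    return sent / elapsed

if __name__ == "__main__":
    print(f"{run_throughput_probe() / 1e6:.1f} MB/s over loopback")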
Optimizing TCP Receive Performance
TLDR
Two optimizations, receive aggregation and acknowledgment offload, are presented that improve the receive side TCP performance by reducing the number of packets that need to be processed by the TCP/IP stack.
An in-depth analysis of the impact of processor affinity on network performance
  • A. Foong, Jason M. Fung, D. Newell
  • Computer Science
  • Proceedings. 2004 12th IEEE International Conference on Networks (ICON 2004) (IEEE Cat. No.04EX955)
  • 2004
TLDR
This work presents a full experiment-based analysis of TCP performance under various affinity modes on SMP servers, using mechanisms and interfaces provided by the Red Hat Linux 2.4.20 distribution, to understand the causes behind the gains.
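The process/thread half of the affinity modes this study compares can be sketched with the Linux scheduler interface; interrupt affinity is configured separately by writing a CPU mask to `/proc/irq/<n>/smp_affinity` (root required), which this sketch does not attempt:

```python
import os

def pin_process(cpus):
    """Pin the calling process to the given CPU set (Linux only).

    A minimal sketch of process affinity, not the paper's tooling.
    os.sched_setaffinity wraps the sched_setaffinity(2) syscall;
    pid 0 means the calling process."""
    os.sched_setaffinity(0, set(cpus))
    return os.sched_getaffinity(0)   # kernel's view after the change
```

For example, `pin_process([0])` keeps the process on CPU 0, so it can share a cache with the CPU handling the NIC's interrupts.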
Architectural Characterization of Processor Affinity in Network Processing
TLDR
An experimental study of TCP performance under various affinity modes on IA-based servers showed that interrupt affinity alone provided a throughput gain of up to 25%, and combined thread/process and interrupt affinity can achieve gains of 30%.
Optimizing Latency in Beowulf Clusters
TLDR
This work contributes a systematic approach to optimizing communication latency, with a detailed checklist and procedure, and found that after applying different techniques the default Gigabit Ethernet latency can be reduced from about 50 μs to nearly 20 μs.
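A latency checklist like this one starts from a baseline measurement. As an illustrative sketch (not the paper's benchmark), a 1-byte TCP ping-pong over loopback with Nagle's algorithm disabled via TCP_NODELAY gives a median round-trip time; the sample count is an assumption for the example:

```python
import socket
import statistics
import threading
import time

def measure_rtt(samples=200):
    """Median round-trip time, in microseconds, of a 1-byte TCP ping-pong
    over loopback. TCP_NODELAY disables Nagle's algorithm so each 1-byte
    message is transmitted immediately instead of being coalesced."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def echo():
        conn, _ = srv.accept()
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        data = conn.recv(1)
        while data:                  # echo each byte back until EOF
            conn.sendall(data)
            data = conn.recv(1)
        conn.close()

    t = threading.Thread(target=echo)
    t.start()
    cli = socket.create_connection(("127.0.0.1", port))
    cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    rtts = []
    for _ in range(samples):
        t0 = time.perf_counter()
        cli.sendall(b"x")
        cli.recv(1)
        rtts.append((time.perf_counter() - t0) * 1e6)
    cli.close()
    t.join()
    srv.close()
    return statistics.median(rtts)
```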
Performance characterization of TCP/IP packet processing in commercial server workloads
  • S. Makineni, R. Iyer
  • Computer Science
  • 2003 IEEE International Conference on Communications (Cat. No.03CH37441)
  • 2003
TLDR
This paper analyzes the impact of NIC features such as Large Segment Offload and the use of Jumbo frames on TCP/IP packet processing performance on Intel's state-of-the-art low-power Pentium® M microprocessor running the Microsoft Windows Server 2003 operating system.
Architectural characterization of TCP/IP packet processing on the Pentium® M microprocessor
  • S. Makineni, R. Iyer
  • Computer Science
  • 10th International Symposium on High Performance Computer Architecture (HPCA'04)
  • 2004
TLDR
An in-depth analysis of packet processing performance on Intel's state-of-the-art low-power Pentium® M microprocessor running the Microsoft Windows Server 2003 operating system finds that the mode of TCP/IP operation can significantly affect the performance requirements.
Improving Server Application Performance via Pure TCP ACK Receive Optimization
TLDR
This paper proposes a simple kernel-level optimization which reduces per-packet overhead through fewer memory allocations and a simplified code path, and demonstrates cycle savings of 15% in a Web application and a 33% throughput improvement in reliable multicast.
Architectural Breakdown of End-to-End Latency in a TCP/IP Network
TLDR
It is demonstrated that application-level end-to-end one-way latency with a 10GbE connection can be as low as 10 μs for a single isolated request in a standard Linux network stack.
Evaluating network processing efficiency with processor partitioning and asynchronous I/O
TLDR
Detailed analysis shows that the efficiency advantage of the ETA+AIO prototype, which uses one PPE CPU, comes from avoiding multiprocessing overheads in packet processing, lower overhead of the authors' AIO interface compared to standard sockets, and reduced cache misses due to processor partitioning.
Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck
TLDR
An in-depth evaluation of the various aspects of the TCP/IP protocol suite, including the memory traffic and CPU requirements, compared against RDMA-capable network adapters using 10-Gigabit Ethernet and InfiniBand as example networks, shows that the RDMA interface requires up to four times less memory traffic and has almost zero CPU requirement for the data sink.

References

Showing 1-10 of 17 references
The importance of non-data touching processing overheads in TCP/IP
TLDR
It is asserted that it will be difficult to significantly reduce the cumulative processing time due to non-data-touching overheads when one considers realistic message size distributions, where the majority of messages are small.
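The argument above can be illustrated with a toy cost model that splits processing time into a fixed non-data-touching (per-message) part and a data-touching (per-byte) part; the constants below are made-up assumptions for illustration, not measurements from any of the cited papers:

```python
def goodput(msg_bytes, per_msg_us=5.0, per_byte_us=0.001):
    """Toy model of achievable goodput, in bytes per microsecond (MB/s).

    per_msg_us models fixed non-data-touching overhead (interrupts,
    protocol processing, buffer management); per_byte_us models
    data-touching cost (copies, checksums). Both are illustrative."""
    total_us = per_msg_us + msg_bytes * per_byte_us
    return msg_bytes / total_us
```

With these assumed constants, 64-byte messages are dominated by the fixed per-message cost, while 64 KiB messages approach the per-byte limit, which is why small-message workloads gain little from reducing data-touching costs alone.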
End system optimizations for high-speed TCP
TLDR
This article surveys the most important of a variety of optimizations above and below the TCP protocol stack and illustrates their effects quantitatively with empirical results from an experimental network delivering up to 2 Gb/s of end-to-end TCP bandwidth.
An analysis of TCP processing overhead
TLDR
A detailed study was made of the Transmission Control Protocol (TCP), the transport protocol from the Internet protocol suite, and it was concluded that TCP is in fact not the source of the overhead often observed in packet processing, and that it could support very high speeds if properly implemented.
Linux IP Networking: A Guide to the Implementation and Modification of the Linux Protocol Stack
TLDR
This document is a guide to understanding how the Linux kernel implements networking protocols, focused primarily on the Internet Protocol, and is intended as a complete reference for experimenters, with overviews, walk-throughs, source code explanations, and examples.
Hyper-Threading Technology: Impact on Compute-Intensive Workloads
TLDR
The performance of an Intel Xeon processor enabled with Hyper-Threading Technology is compared to that of a dual Xeon processor without Hyper-Threading Technology on a range of compute-intensive, data-parallel applications threaded with OpenMP.
Linux Device Drivers
TLDR
This book discusses the role of the device driver, the kernel, classes of devices and modules, and how mounting and unmounting work.
The Virtual Interface Architecture
This protected, zero-copy, user-level network interface architecture reduces the system overhead for sending and receiving messages between high-performance CPU/memory subsystems and networks to less…
TCP Offload Engines Finally Arrive
  • Storage Magazine
  • 2002