Reuse distance analysis is a well-established tool for predicting cache performance, driving compiler optimizations, and assisting visualization and manual optimization of programs. Existing reuse distance analysis methods either do not account for the effects of multithreading, or suffer severe performance penalties. This paper presents a sampled, …
Current microprocessors incorporate techniques to exploit instruction-level parallelism (ILP). However, previous work has shown that these ILP techniques are less effective in removing memory stall time than CPU time, making the memory system a greater bottleneck in ILP-based systems than previous-generation systems. These deficiencies arise largely …
This paper studies techniques to improve the performance of memory consistency models for shared-memory multiprocessors with ILP processors. The first part of this paper extends earlier work by studying the impact of current hardware optimizations to memory consistency implementations, hardware-controlled non-binding prefetching and speculative load …
This paper develops and validates an analytical model for evaluating various types of architectural alternatives for shared-memory systems with processors that aggressively exploit instruction-level parallelism. Compared to simulation, the analytical model is many orders of magnitude faster to solve, yielding highly accurate system performance estimates in …
Current microprocessors exploit high levels of instruction-level parallelism (ILP) through techniques such as multiple issue, dynamic scheduling, and non-blocking reads. This paper presents the first detailed analysis of the impact of such processors on shared-memory multiprocessors using a detailed execution-driven simulator. Using this analysis, we also …
This paper explores the hardware and software mechanisms necessary for an efficient programmable 10 Gigabit Ethernet network interface card. Network interface processing requires support for the following characteristics: a large volume of frame data, frequently accessed frame metadata, and high frame rate processing. This paper proposes three mechanisms …
This paper presents two approaches to parallelizing the Snort network intrusion detection system (NIDS). One scheme parallelizes NIDS processing conservatively across independent network flows, while the other optimistically achieves intra-flow parallelism by exploiting the observation that certain intra-flow dependences are uncommon and may be ignored …
This paper presents TransFinder, a compile-time tool that automatically determines which statements of an unsynchronized multithreaded program must be enclosed in atomic regions to enforce conflict-serializability. Unlike previous tools, TransFinder requires no programmer input (beyond the program) and is more efficient in both time and space. …
This paper presents and evaluates Toast, a scalable Video-on-Demand (VoD) streaming system that combines the popular BitTorrent peer-to-peer (P2P) file-transfer technology with a simple dedicated streaming server to decrease server load and increase client transfer speed. Toast includes a modified version of BitTorrent that supports streaming data delivery …