Corpus ID: 1328434

When poll is better than interrupt

@inproceedings{Yang2012WhenPI,
  title={When poll is better than interrupt},
  author={Jisoo Yang and Dave B. Minturn and Frank T. Hady},
}
In a traditional block I/O path, the operating system completes virtually all I/Os asynchronously via interrupts. However, when performing storage I/O with ultra-low latency devices built on next-generation non-volatile memory, polling for completion (and hence wasting clock cycles during the I/O) delivers higher performance than traditional interrupt-driven I/O. This paper thus argues for the synchronous completion of block I/O, first by presenting strong empirical evidence…
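The paper's core trade-off can be illustrated with a toy latency model (the numbers and function names below are illustrative assumptions, not the paper's measurements): interrupt-driven completion pays interrupt-handling and context-switch overhead on top of the device time, while polling pays only the device time but keeps the CPU busy.

```python
def completion_latency_us(device_us, mode,
                          irq_overhead_us=6.0, switch_us=3.0):
    """Rough per-I/O latency under each completion mode (toy model)."""
    if mode == "interrupt":
        # sleep, wake via interrupt, then reschedule the waiting task
        return device_us + irq_overhead_us + switch_us
    elif mode == "poll":
        # spin on the completion status; no wakeup or reschedule path
        return device_us
    raise ValueError(mode)

# With a ~4 us NVM device, the fixed overheads dominate, so polling wins:
assert completion_latency_us(4.0, "poll") < completion_latency_us(4.0, "interrupt")
```

The relative saving shrinks as device time grows, which is why the argument is specific to ultra-low latency devices.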


Effective I/O Processing with Exception-Less System Calls for Low-Latency Devices

  • M. Nakajima, S. Oikawa
  • Computer Science
    2015 Third International Symposium on Computing and Networking (CANDAR)
  • 2015
This work proposes a novel approach that completes I/O requests effectively with exception-less system calls, which do not require software interrupts when issuing system calls.
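The exception-less idea can be sketched as shared queues between application and kernel: the application posts requests without trapping, and a kernel-side worker drains them. This is a minimal user-level sketch with hypothetical names, not the paper's implementation (systems such as FlexSC realize this with shared syscall pages polled by kernel threads):

```python
from collections import deque

submission_q = deque()   # shared request queue (no trap to enqueue)
completion_q = deque()   # shared completion queue

def submit(req):
    # the application appends a request instead of raising a
    # software interrupt to enter the kernel
    submission_q.append(req)

def kernel_worker_drain():
    # stand-in for a kernel thread that polls the shared queue
    while submission_q:
        req = submission_q.popleft()
        completion_q.append(("done", req))

submit("read block 7")
submit("read block 9")
kernel_worker_drain()
```

The point of the structure is that submission and completion become ordinary memory operations, batched and processed without per-call mode switches.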

I/O Is Faster Than the CPU: Let's Partition Resources and Eliminate (Most) OS Abstractions

This work proposes a structure for an OS called parakernel, which eliminates most OS abstractions and provides interfaces for applications to leverage the full potential of the underlying hardware.

Optimizing Storage Performance with Calibrated Interrupts

This work proposes addressing the root cause of the heuristics problem by allowing software to explicitly specify to the device whether submitted requests are latency-sensitive, and shows that it is natural to express these semantics in the kernel and the application, requiring only a modest two-bit change to the device interface.
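The calibration idea can be sketched as a device-side decision that uses the per-request hint to choose between firing a completion interrupt immediately and continuing to coalesce. A minimal sketch, with a boolean standing in for the paper's two-bit field and an assumed batch threshold:

```python
def fire_interrupt(pending_completions, latency_sensitive, batch=8):
    """Device-side decision: interrupt now, or keep coalescing?"""
    if latency_sensitive:
        return True                      # urgent request: no coalescing delay
    return pending_completions >= batch  # throughput path: batch completions

# A latency-sensitive request interrupts immediately even when alone:
assert fire_interrupt(1, latency_sensitive=True)
# Bulk requests wait until a batch has accumulated:
assert not fire_interrupt(3, latency_sensitive=False)
```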

Dynamic Interval Polling and Pipelined Post I/O Processing for Low-Latency Storage Class Memory

This work presents new cooperative software and hardware schemes to address performance issues in deploying storage-class memory technologies as storage devices, including a new polling scheme called dynamic interval polling and a pipelined execution between the storage device and the host OS called pipelined post-I/O processing.
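One way to sketch a dynamic polling interval (the weighting and safety factor below are illustrative assumptions, not the paper's parameters) is to track an exponentially weighted average of observed device service times and delay the first status check until just before the expected completion:

```python
def update_estimate(est_us, observed_us, alpha=0.25):
    """EWMA of device service time; alpha weights the newest sample."""
    return (1 - alpha) * est_us + alpha * observed_us

def first_poll_delay(est_us, safety=0.8):
    """Sleep ~80% of the expected latency, then start polling."""
    return est_us * safety

est = 10.0
for sample in (8.0, 8.0, 8.0):
    est = update_estimate(est, sample)   # estimate drifts toward 8 us
```

The shorter the delay before the first poll, the more CPU is spent spinning; the longer, the more completion latency is added, so the interval must track the device.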

Asynchronous I/O Stack: A Low-latency Kernel I/O Stack for Ultra-Low Latency SSDs

This work prototypes the proposed asynchronous I/O stack on the Linux kernel and evaluates it with various workloads, demonstrating that the application-perceived I/O latency falls into single-digit microseconds for 4 KB random reads on Optane SSD, and that the overall I/O latency is reduced by 15–33% across varying block sizes.

HyPI: Reducing CPU Consumption of the I/O Completion Method in High-Performance Storage Systems

This work proposes an enhanced scheme, called HyPI, which consumes fewer CPU resources while maintaining reasonable performance, and shows that HyPI achieves 87.98% lower CPU consumption than the polling-based I/O completion method with a negligible performance drop.
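The CPU saving of a hybrid sleep-then-poll scheme over pure polling can be shown with toy accounting (numbers are illustrative, not HyPI's measurements): pure polling spins for the whole device time, while sleep-then-poll spins only for the tail after a timed sleep.

```python
def cpu_busy_us(device_us, sleep_us):
    """CPU time (us) spent spinning under sleep-then-poll.

    sleep_us = 0 degenerates to pure polling; a sleep close to the
    device time leaves only a short spin tail."""
    return max(device_us - sleep_us, 0.0)

pure_poll = cpu_busy_us(10.0, sleep_us=0.0)   # spin the whole 10 us
hybrid    = cpu_busy_us(10.0, sleep_us=8.0)   # spin only the last 2 us
```

The latency stays near the polled figure as long as the sleep reliably ends before the device completes.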

OS I/O Path Optimizations for Flash Solid-state Drives

Evaluations with micro-benchmarks showed that the OS I/O path optimizations, aimed at minimizing scheduling delays caused by additional contexts such as interrupt bottom halves and background queue runs, were capable of accommodating up to five AHCI-controller-attached SATA 3.0 SSD devices at 671k IOPS.

Beyond block I/O: implementing a distributed shared log in hardware

This work proposes an interface to networked storage that reduces an existing software implementation of a distributed shared log to hardware and achieves both scalable throughput and strong consistency, while obtaining significant benefits in cost and power over the software implementation.

I/O Speculation for the Microsecond Era

This work surveys how speculation can address the challenges that microsecond-scale devices will bring, and examines applications that stand to benefit from speculation as well as several classes of speculation techniques.
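The basic shape of I/O speculation can be sketched as: return a predicted result immediately so dependent computation can proceed, then verify the prediction when the real I/O completes and flag a rollback on mismatch. All names here are hypothetical; this compresses the asynchrony of a real system into one call:

```python
def speculate(predicted, do_io):
    """Run the real I/O (asynchronously, in a real system) and check
    whether the speculative result the caller already consumed was
    correct; ok == False means dependent work must be rolled back."""
    actual = do_io()
    ok = (actual == predicted)
    return actual, ok

value, ok = speculate(b"cached", lambda: b"cached")   # prediction held
```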

Reading from External Memory

This work presents a detailed overview and evaluation of modern storage reading performance with regard to available Linux synchronous and asynchronous interfaces and measures latency and CPU usage.



An MPI library which uses polling, interrupts and remote copying for the Fujitsu AP1000+

  • David Sitsky, K. Hayashi
  • Computer Science
    Proceedings Second International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN'96)
  • 1996
A complete implementation of MPI for the Fujitsu AP1000+ is presented; it exhibits good performance compared to the native message-passing library and allows the user to decide at runtime which mechanisms to use in order to achieve the best performance on a per-application basis.

Better I/O through byte-addressable, persistent memory

A file system and a hardware architecture that are designed around the properties of persistent, byte-addressable memory, which provide strong reliability guarantees and offer better performance than traditional file systems, even when both are run on top of byte-addressable, persistent memory.

Towards SSD-Ready Enterprise Platforms

It is found that the majority of platform I/O latency still lies in the SSD and not in system software, and that data copies, uncacheable MMIO reads, interrupt processing, and context switches are the primary contributors to I/O processing cost.

SCMFS: A file system for Storage Class Memory

  • XiaoJian Wu, Sheng Qiu, A. Reddy
  • Computer Science
    2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
  • 2011
This paper proposes a new file system, called SCMFS, which is implemented in the virtual address space and utilizes the existing memory management module of the operating system to perform block management and keep the space always contiguous for each file.

Linux device drivers

Moneta: A High-Performance Storage Array Architecture for Next-Generation, Non-volatile Memories

This work describes the architecture of Moneta, a prototype PCIe-attached storage array built from emulated PCM storage, which provides a carefully designed hardware/software interface that makes issuing and completing accesses atomic, and explores trade-offs in Moneta's architecture between performance, power, memory organization, and memory latency.

RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits

This paper proposes several mechanisms to exploit RDMA Read and selective interrupt based asynchronous progress to provide better computation/communication overlap on InfiniBand clusters and indicates that the designs have a strong positive impact on scalability of parallel applications.

Understanding the Linux Kernel

This edition of Understanding the Linux Kernel covers Version 2.6, which has seen significant changes to nearly every kernel subsystem, particularly in the areas of memory management and block devices.

Scalable high performance main memory system using phase-change memory technology

This paper analyzes a PCM-based hybrid main memory system using an architecture-level model of PCM and proposes simple organizational and management solutions for the hybrid memory that reduce the write traffic to PCM, boosting its lifetime from 3 years to 9.7 years.