• Corpus ID: 2020699

An Evaluation of Coarse-Grained Locking for Multicore Microkernels

Kevin Elphinstone, Amirreza Zarrabi, Adrian Danis, Yanyan Shen, Gernot Heiser
The trade-off between coarse- and fine-grained locking is a well-understood issue in operating systems. Coarse-grained locking provides lower overhead under low contention; fine-grained locking provides higher scalability under contention, though at the expense of implementation complexity and reduced best-case performance. We revisit this trade-off in the context of microkernels and tightly-coupled cores with shared caches and low inter-core migration latencies. We evaluate performance on…
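The trade-off described in the abstract can be illustrated with a minimal user-level sketch. It is not the paper's kernel code: the class names `BigLockTable` and `PerBucketLockTable` and the helper `hammer` are hypothetical, and Python threads are used only to show the structure of the two locking styles, not to measure their performance.

```python
import threading


class BigLockTable:
    """Coarse-grained: a single lock guards every bucket."""

    def __init__(self, n_buckets):
        self._lock = threading.Lock()
        self._buckets = [0] * n_buckets

    def add(self, key, amount):
        # One acquire/release per operation, regardless of the bucket touched.
        with self._lock:
            self._buckets[key % len(self._buckets)] += amount

    def total(self):
        with self._lock:
            return sum(self._buckets)


class PerBucketLockTable:
    """Fine-grained: one lock per bucket, so unrelated updates do not serialize."""

    def __init__(self, n_buckets):
        self._locks = [threading.Lock() for _ in range(n_buckets)]
        self._buckets = [0] * n_buckets

    def add(self, key, amount):
        i = key % len(self._buckets)
        with self._locks[i]:
            self._buckets[i] += amount

    def total(self):
        # A consistent snapshot needs every lock; a fixed acquisition
        # order avoids deadlock. This is the added complexity.
        for lock in self._locks:
            lock.acquire()
        try:
            return sum(self._buckets)
        finally:
            for lock in reversed(self._locks):
                lock.release()


def hammer(table, n_threads=4, n_ops=1000):
    """Run n_threads workers that each perform n_ops increments."""

    def worker(tid):
        for k in range(n_ops):
            table.add(tid + k, 1)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.total()
```

Both variants give the same result. The coarse-grained version is shorter and takes one lock per operation, while the fine-grained version lets updates to different buckets proceed independently at the cost of a more involved whole-table operation.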

Komodo: Using verification to disentangle secure-enclave hardware from software
Komodo illustrates an alternative approach to attested, on-demand, user-mode, concurrent isolated execution and aims to achieve security equivalent to or better than SGX while enabling deployment of new enclave features independently of CPU upgrades.


For a Microkernel, a Big Lock Is Fine
It is argued that a big lock may be fine-grained enough for a microkernel designed to run on closely-coupled cores (sharing a cache): with the short system calls typical of a well-designed microkernel, lock contention remains low under realistic loads.
Cache Coherence Protocol and Memory Performance of the Intel Haswell-EP Architecture
This work has developed sophisticated benchmarks that allow for in-depth investigations with full memory location and coherence state control of the Intel Haswell-EP micro-architecture, including important memory latency and bandwidth characteristics as well as the cost of core-to-core transfers.
Speculative lock elision: enabling highly concurrent multithreaded execution
Speculative Lock Elision (SLE) is proposed, a novel micro-architectural technique that removes dynamically unnecessary lock-induced serialization and enables highly concurrent multithreaded execution; it can provide programmers a fast path to writing correct high-performance multithreaded programs.
TxLinux: using and managing hardware transactional memory in an operating system
TxLinux is a variant of Linux that is the first operating system to use hardware transactional memory (HTM) as a synchronization primitive and the first to manage HTM in the scheduler; the integration of transactions with the OS scheduler is also discussed.
Quantifying the Capacity Limitations of Hardware Transactional Memory
This paper provides the first comprehensive empirical study of the “capacity envelope” of HTM in Intel's Haswell and IBM's Power8 architectures, providing what the authors believe is a much needed understanding of the extent to which one can use these systems to replace locks.
Improving interrupt response time in a verifiable protected microkernel
This paper explores how to reduce the worst-case interrupt latency in a (mostly) non-preemptible protected kernel, and still maintain the ability to apply formal methods for analysis.
The multikernel: a new OS architecture for scalable multicore systems
This work investigates a new OS structure, the multikernel, that treats the machine as a network of independent cores, assumes no inter-core sharing at the lowest level, and moves traditional OS functionality to a distributed system of processes that communicate via message-passing.
Non-scalable locks are dangerous
Using Linux on a 48-core machine, this paper shows that non-scalable locks can cause dramatic collapse in the performance of real workloads, even for very short critical sections.
Improving the FreeBSD SMP Implementation
  • Greg Lehey
  • Computer Science
    USENIX Annual Technical Conference, FREENIX Track
  • 2001
This paper describes work done to remove this bottleneck, replacing it with fine-grained locking. The work derives from work done on BSD/OS and has many similarities with the approach taken in SunOS 5.
Building FIFO and Priority-Queuing Spin Locks from Atomic Swap
The main technical contributions are techniques and algorithms that provide tight control over lock grant order, use only the atomic swap instruction, use at most one spin for lock acquisition and no spinning for lock release, and need only O(L + P) space on either a coherent-cache or NUMA machine.