Improving the reliability of commodity operating systems

@article{Swift2005ImprovingTR,
  title={Improving the reliability of commodity operating systems},
  author={Michael M. Swift and Brian N. Bershad and Henry M. Levy},
  journal={ACM Trans. Comput. Syst.},
  year={2005},
  volume={23},
  pages={77--110}
}
Despite decades of research in extensible operating system technology, extensions such as device drivers remain a significant cause of system failures. In Windows XP, for example, drivers account for 85% of recently reported failures. This article describes Nooks, a reliability subsystem that seeks to greatly enhance operating system (OS) reliability by isolating the OS from driver failures. The Nooks approach is practical: rather than guaranteeing complete fault tolerance through a new…

Making operating systems resilient to hardware
TLDR
This thesis proposal seeks to improve the reliability of modern device drivers through three techniques: improving drivers' tolerance of unreliable hardware, making drivers more robust by automatically patching driver code, and proposing guidelines on how drivers can improve by organizing themselves better.
Performance Optimizations for Isolated Driver Domains
TLDR
The key idea is to replace the interrupt-based notification between domains with a spinning-based approach, thus trading CPU capacity for increased throughput, and the results show that the solution matches or outperforms Xen's isolated driver domain in most scenarios the authors considered.
A Lightweight Method for Building Reliable Operating Systems Despite Unreliable Device Drivers (Technical Report IRCS-018, January 2006)
TLDR
A lightweight approach to protecting drivers using kernel wrapping and virtual machines, guided by simplicity, modularity, least authorization, and fault tolerance, is discussed, and its performance and reliability are reported.
Improving Device Driver Reliability through Decoupled Dynamic Binary Analyses
TLDR
The thesis is that checking a driver's execution for correctness violations results in the detection and mitigation of more faults. Guardrail, a flexible and powerful framework that enables instruction-grained dynamic analysis of unmodified kernel-mode driver binaries to safeguard I/O operations and devices from driver faults, is presented.
Recovering device drivers
TLDR
A new mechanism is presented that enables applications to run correctly when device drivers fail: it assumes the role of the failed driver during recovery and imposes minimal performance overhead.
Can we make operating systems reliable and secure?
TLDR
Singularity, the most radical approach, uses a type-safe language, a single address space, and formal contracts to carefully limit what each module can do in the microkernel.
Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers
TLDR
Guardrail is proposed and evaluated, which is a more powerful framework for run-time driver analysis that performs decoupled instruction-grain dynamic correctness checking on arbitrary kernel-mode drivers as they execute, thereby enabling the system to detect and mitigate more challenging correctness bugs that cannot be detected by today's fault isolation techniques.
Design of fault tolerant system based on runtime behavior tracing
  • Sumin Park, Kwangyong Lee
  • Computer Science
    2010 The 12th International Conference on Advanced Communication Technology (ICACT)
  • 2010
TLDR
A system is presented that traces system behavior at runtime and recovers optimally when errors are detected; it was tested on the Linux 2.6.24 kernel running on a GP2X-WIZ mobile game player.
On the construction of reliable device drivers
TLDR
This dissertation presents an approach to reducing the number of defects through an improved device driver architecture and development process, synthesising the implementation of a driver from the combination of three formal specifications: a device-class specification that describes common properties of a class of similar devices, a device specification that describes a concrete representative of the class, and an operating system interface specification that describes the communication protocol between the driver and the operating system.
VirtuOS: an operating system with kernel virtualization
TLDR
A prototype based on the Linux kernel and Xen hypervisor can survive the failure of individual service domains while outperforming alternative approaches such as isolated driver domains and even exceeding the performance of native Linux for some multithreaded workloads.

References

SHOWING 1-10 OF 264 REFERENCES
Unmodified Device Driver Reuse and Improved System Dependability via Virtual Machines
TLDR
By allowing distinct device drivers to reside in separate virtual machines, this technique isolates faults caused by defective or malicious drivers, thus improving a system's dependability, and enables extensive reuse of existing and unmodified drivers.
Dealing with disaster: surviving misbehaved kernel extensions
TLDR
This paper explains how VINO uses software fault isolation as its safety mechanism and a lightweight transaction system to cope with resource-hoarding, and finds that while the overhead of these techniques is high relative to the cost of the extensions themselves, it is low relative to the benefits that extensibility brings.
THE FLUKE DEVICE DRIVER FRAMEWORK
TLDR
A framework whose design is based on running device drivers as user-mode servers is presented; it resolves the fundamental execution environment mismatch and proposes guidelines for improving device drivers' portability across different execution environments.
An empirical study of operating systems errors
TLDR
A study of operating system errors found by automatic, static, compiler analysis applied to the Linux and OpenBSD kernels found that device drivers have error rates up to three to seven times higher than the rest of the kernel.
The Flux OSKit: a substrate for kernel and language research
TLDR
The OSKit demonstrates a technique that allows unmodified code from existing mature operating systems to be incorporated quickly and updated regularly, by wrapping it with a small amount of carefully designed "glue" code to isolate its dependencies and export well-defined interfaces.
Exokernel: an operating system architecture for application-level resource management
TLDR
The prototype exokernel system implemented here is at least five times faster on operations such as exception dispatching and interprocess communication, and allows applications to control machine resources in ways not possible in traditional operating systems.
Xen and the art of virtualization
TLDR
Xen is an x86 virtual machine monitor that allows multiple commodity operating systems to share conventional hardware in a safe and resource-managed fashion without sacrificing either performance or functionality; it considerably outperforms competing commercial and freely available solutions.
The case for run-time replaceable kernel modules
  • R. Draves
  • Computer Science
    Proceedings of IEEE 4th Workshop on Workstation Operating Systems. WWOS-III
  • 1993
TLDR
It is argued that an operating system kernel that allows the run-time replacement of modules is an appropriate solution, especially for consumer-oriented environments, because it allows applications to solve feature-deficiency, performance, and version-skew problems.
The systematic improvement of fault tolerance in the Rio file cache
  • Wee Teck Ng, Peter M. Chen
  • Computer Science
    Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352)
  • 1999
TLDR
A systematic and quantitative approach for using software-implemented fault injection to guide the design and implementation of a fault-tolerant system to improve robustness in the presence of operating system errors is presented.
Linux Device Drivers, 3rd Edition
TLDR
This bestselling guide provides all the information you'll need to write drivers for a wide range of devices and covers all the significant changes to Version 2.6 of the Linux kernel, which simplifies many activities, and contains subtle new features that can make a driver both more efficient and more flexible.