• Publications
  • Influence
An analysis of latent sector errors in disk drives
The reliability measures in today's disk drive-based storage systems focus predominantly on protecting against complete disk failures. Previous disk reliability studies have analyzed empirical data…
  • 301
  • 36
An empirical study on configuration errors in commercial and open source systems
Configuration errors (i.e., misconfigurations) are among the dominant causes of system failures. Their importance has inspired many research efforts on detecting, diagnosing, and fixing…
  • 203
  • 24
An analysis of data corruption in the storage stack
An important threat to reliable storage of data is silent data corruption. In order to develop suitable protection mechanisms against data corruption, it is essential to understand its…
  • 268
  • 21
IRON file systems
Commodity file systems trust disks to either work or fail completely, yet modern disks exhibit more complex failure modes. We suggest a new fail-partial failure model for disks, which incorporates…
  • 250
  • 16
Parity Lost and Parity Regained
RAID storage systems protect data from storage errors, such as data corruption, using a set of one or more integrity techniques, such as checksums. The exact protection offered by certain techniques…
  • 93
  • 11
How do fixes become bugs?
Software bugs affect system reliability. When a bug is exposed in the field, developers need to fix it. Unfortunately, the bug-fixing process can also introduce errors, which leads to buggy patches…
  • 188
  • 10
Warming up storage-level caches with Bonfire
Large caches in storage servers have become essential for meeting service levels required by applications. These caches need to be warmed with data often today due to various scenarios including…
  • 43
  • 3
X-RAY: a non-invasive exclusive caching mechanism for RAIDs
RAID storage arrays often possess gigabytes of RAM for caching disk blocks. Currently, most RAID systems use LRU or LRU-like policies to manage these caches. Since these array caches do not recognize…
  • 66
  • 3
Tolerating File-System Mistakes with EnvyFS
We introduce EnvyFS, an N-version local file system designed to improve reliability in the face of file-system bugs. EnvyFS, implemented as a thin VFS-like layer near the top of the storage stack, …
  • 25
  • 3
Semantically-smart disk systems: past, present, and future
In this paper we describe research that has been ongoing within our group for the past four years on semantically-smart disk systems. A semantically-smart system goes beyond typical block-based…
  • 40
  • 2