A file is not a file: understanding the I/O behavior of Apple desktop applications

@inproceedings{Harter2011AFI,
  title={A file is not a file: understanding the I/O behavior of Apple desktop applications},
  author={Tyler Harter and Chris Dragga and Michael Vaughn and Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau},
  booktitle={Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles},
  year={2011}
}
We analyze the I/O behavior of iBench, a new collection of productivity and multimedia application workloads. Our analysis reveals a number of differences between iBench and typical file-system workload studies, including the complex organization of modern files, the lack of pure sequential access, the influence of underlying frameworks on I/O patterns, the widespread use of file synchronization and atomic operations, and the prevalence of threads. Our results have strong ramifications for the… 
The Composite-file File System: Decoupling the One-to-One Mapping of Files and Metadata for Better Performance
TLDR
A composite-file file system is designed, implemented, and evaluated, which allows many-to-one mappings of files to metadata, and the design space of different mapping strategies is explored.
TABLEFS: Embedding a NoSQL database inside the local file system
TLDR
This paper examines using techniques adopted from NoSQL databases to manage file system metadata and small files to improve the performance of modern local file systems in Linux for workloads dominated by metadata and tiny files.
Caching or Not: Rethinking Virtual File System for Non-Volatile Main Memory
TLDR
ByVFS is presented, an optimization of VFS that directly accesses metadata in PM file systems, bypassing the VFS caching layer; the results show that ByVFS outperforms conventional VFS with a cold cache and provides comparable performance to conventional VFS with a warm cache.
Turn Your Storage Stack into a File System
TLDR
It is argued that a multi-layer filesystem will be simpler to implement and to use than the complex collection of different storage systems that the authors have now, because many storage system optimizations both at the OS and application layers are designed to hide access latencies.
Strata: A Cross Media File System
TLDR
Strata is presented, a cross-media file system that leverages the strengths of one storage media to compensate for weaknesses of another, and has 20-30% better latency and throughput, compared to file systems purpose-built for each layer, while providing synchronous and unified access to the entire storage hierarchy.
Building a Reliable Storage Stack
TLDR
Loris, the redesign of the storage stack along three dimensions: reliability, heterogeneity and flexibility, is presented and several major problems with the traditional stack are highlighted.
Analysis of HDFS under HBase: a facebook messages case study
TLDR
It is examined how layering causes write amplification when HBase is run on top of HDFS, how tighter integration could improve write performance, and whether it makes sense to include an SSD to improve performance while keeping costs in check.
Understanding Data Characteristics and Access Patterns in a Cloud Storage System
TLDR
An analysis of a file system snapshot and a five-month access trace of a campus cloud storage system, deployed on the Tsinghua campus for three years, finds that cache efficiency can be improved fivefold using guidance from the observations.
Arrakis: The Operating System is the Control Plane
TLDR
A new operating system, Arrakis, is designed and implemented that splits the traditional role of the kernel in two, allowing most I/O operations to skip the kernel entirely, while the kernel is re-engineered to provide network and disk protection without kernel mediation of every operation.
Extending the lifetime of flash-based storage through reducing write amplification from file systems
TLDR
An object-based flash translation layer design (OFTL) is presented, in which mechanisms are co-designed with flash memory: it enables lazy persistence of index metadata and eliminates journals while keeping consistency, and its coarse-grained block state maintenance reduces persistent free space management overhead.

References

SHOWING 1-10 OF 33 REFERENCES
Analysis of file I/O traces in commercial computing environments
TLDR
This paper analyzes file I/O traces of several existing production computer systems to understand file access behavior and observes that although only a third of the active files are sequentially shared, they receive a very large proportion of the total operations.
A Comparison of File System Workloads
TLDR
This paper describes the collection and analysis of file system traces from a variety of different environments, including both UNIX and NT systems, clients and servers, and instructional and production systems and develops a new metric for measuring file lifetime that accounts for files that are never deleted.
A trace-driven analysis of the UNIX 4.2 BSD file system
TLDR
The UNIX 4.2BSD file system is analyzed by recording activity in trace files and writing programs to analyze the traces, and a simulator that uses the traces to predict the performance of caches for disk blocks is written.
File system usage in Windows NT 4.0
TLDR
This paper reports on the usage details of the Windows NT file system architecture, through a detailed comparison with the older traces, through details on the operational characteristics and through a usage analysis of the file system and cache manager.
A study of file sizes and functional lifetimes
TLDR
The collection, analysis and interpretation of data pertaining to files in the computing environment of the Computer Science Department at Carnegie-Mellon University (CMU-CSD) is discussed.
Measurements of a distributed file system
TLDR
This work analyzed the user-level file access patterns and caching behavior of the Sprite distributed file system and found that client cache consistency is needed to prevent stale data errors, but that it is not invoked often enough to degrade overall system performance.
Scale and performance in a distributed file system
TLDR
Observations of a prototype implementation are presented, changes in the areas of cache validation, server process structure, name translation, and low-level storage representation are motivated, and Andrew's ability to scale gracefully is quantitatively demonstrated.
Analysis and Evolution of Journaling File Systems
We develop and apply two new methods for analyzing file system behavior and evaluating file system changes. First, semantic block-level analysis (SBA) combines knowledge of on-disk data structures
The Google file system
TLDR
This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.
A large-scale study of file-system contents
TLDR
It is found that file and directory sizes are fairly consistent across file systems, but file lifetimes vary widely and are significantly affected by the job function of the user.