The Hadoop distributed filesystem: Balancing portability and performance

@article{Shafer2010TheHD,
  title={The Hadoop distributed filesystem: Balancing portability and performance},
  author={Jeffrey Shafer and Scott Rixner and Alan L. Cox},
  journal={2010 IEEE International Symposium on Performance Analysis of Systems \& Software (ISPASS)},
  year={2010},
  pages={122--133}
}
Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem. This filesystem, HDFS, is written in Java and designed for portability across heterogeneous hardware and software platforms. This paper analyzes the performance of HDFS and uncovers several performance issues. First, architectural bottlenecks exist in the Hadoop implementation that result in inefficient…
Highly Influential
This paper has highly influenced 24 other papers.
Highly Cited
This paper has 274 citations.

Statistics

[Figure: Citations per Year, 2010 to 2018; vertical axis 0 to 60]

Semantic Scholar estimates that this publication has 274 citations based on the available data.
