The optimization of HDFS based on small files

@inproceedings{jiang2010hdfs,
  title={The optimization of HDFS based on small files},
  author={Liu Jiang and Bing Li and Meina Song},
  booktitle={2010 3rd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT)},
  year={2010}
}
HDFS is a distributed file system that can process large amounts of data effectively across large clusters; the Hadoop framework, which is built on it, has been widely used to construct large-scale, high-performance systems. However, HDFS is designed to handle large files and suffers a performance penalty when dealing with large numbers of small files. Many companies today focus on the cloud storage area, such as Amazon's S3, which provides data hosting. With the rapid…
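The small-file penalty the abstract refers to comes largely from the NameNode, which keeps every file and block object in memory. A rough back-of-envelope sketch, using the commonly cited figure of roughly 150 bytes of NameNode heap per metadata object (an illustrative rule of thumb, not a number from this paper), shows why millions of small files are costly compared with the same data stored as fewer large files:

```python
# Back-of-envelope estimate of NameNode heap pressure from small files.
# Assumption (rule of thumb, not from the paper): each file inode or
# block object costs roughly 150 bytes of NameNode memory.
OBJECT_BYTES = 150

def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Approximate NameNode memory needed for the metadata of num_files files."""
    objects = num_files * (1 + blocks_per_file)  # one inode plus its blocks
    return objects * OBJECT_BYTES

# 10 million one-block small files vs. the same data packed into
# 10,000 larger files of 8 blocks each (sizes are hypothetical).
small = namenode_heap_bytes(10_000_000)
large = namenode_heap_bytes(10_000, blocks_per_file=8)
print(f"small files: {small / 1e9:.1f} GB, packed files: {large / 1e6:.1f} MB")
```

Under these assumptions the small-file layout needs about 3.0 GB of NameNode heap while the packed layout needs about 13.5 MB, which is the kind of gap that motivates small-file optimizations such as file merging.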
This paper has 42 citations.




