An Efficient Approach for Storing and Accessing Small Files with Big Data Technology

@article{Gupta2016AnEA,
  title={An Efficient Approach for Storing and Accessing Small Files with Big Data Technology},
  author={Bharti Gupta and R. Nath and G. Gopal and Kartik},
  journal={International Journal of Computer Applications},
  year={2016},
  volume={146},
  pages={36-39}
}
Hadoop is an open-source Apache project and a software framework for distributed processing of large datasets across large clusters of commodity hardware. Large datasets means terabytes or petabytes of data, whereas large clusters means hundreds or thousands of nodes. Hadoop uses a master-slave architecture, with one master node and up to thousands of slave nodes. The NameNode acts as the master node, storing all file metadata, while the various DataNodes are the slave nodes…
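Because the NameNode keeps an in-memory entry per file, a common mitigation, and one used by several of the works below, is to pack many small files into a single container file so the NameNode tracks one file instead of thousands. A minimal sketch using Hadoop's standard SequenceFile writer API (the class name SmallFilePacker, the output path packed.seq, and the input-directory argument are illustrative, not from the paper):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

import java.io.File;
import java.nio.file.Files;

public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path out = new Path("packed.seq"); // illustrative output path
        // Each small file becomes one record: key = file name, value = raw bytes.
        // The NameNode then holds metadata for one container file instead of
        // one entry per small file.
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(out),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class));
        try {
            // args[0] is assumed to be a local directory of small files.
            for (File f : new File(args[0]).listFiles()) {
                byte[] data = Files.readAllBytes(f.toPath());
                writer.append(new Text(f.getName()), new BytesWritable(data));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
```

Citations
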
A Review of Various Optimization Schemes of Small Files Storage on Hadoop
TLDR: The basic architecture of the Hadoop system is introduced; the problems generated when Hadoop handles a large number of small files are analyzed and summarized, and the necessity of an optimization scheme for small-file storage on Hadoop is shown.
Performance Analysis of Small Files in HDFS using Clustering Small Files based on Centroid Algorithm
  • R. Rathidevi, R. Parameswari
  • Computer Science
  • 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC)
  • 2020
TLDR: The Clustering Small Files based on Centroid (CSFC) approach is used to place related files in a cluster so that Hadoop can process them as a single large file.
Available techniques in hadoop small file issue
TLDR: Highlights one of Hadoop's limitations, called "big data in small files", which occurs when a massive number of small files pushed into a Hadoop cluster drives the cluster to shut down entirely.
A Review on Small Files in HADOOP
  • 2017
Hadoop is an open source data management system designed for storing and processing large volumes of data, with a default block size of 64 MB. Storing and processing of small files smaller than this minimum…
SFSAN Approach for Solving the Problem of Small Files in Hadoop
TLDR: This paper proposes an enhancement of the sequence-file approach, called the Small Files Search and Aggregation Node (SFSAN) approach, which improves Hadoop performance by overcoming some of the limitations of sequence files while keeping their advantages.
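Plain sequence files support only a sequential scan when a single packed file must be retrieved, which is one of the limitations a search-and-aggregation layer like SFSAN targets. A minimal sketch of such a scan-based lookup, assuming the standard Hadoop SequenceFile.Reader API (the class name SmallFileLookup and the method find are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFileLookup {
    // Scans a packed sequence file record by record until the requested
    // small file is found; returns its contents, or null if absent.
    public static byte[] find(Configuration conf, Path packed, String name)
            throws Exception {
        SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(packed));
        try {
            Text key = new Text();
            BytesWritable value = new BytesWritable();
            while (reader.next(key, value)) {
                if (key.toString().equals(name)) {
                    return value.copyBytes();
                }
            }
            return null;
        } finally {
            IOUtils.closeStream(reader);
        }
    }
}
```

The linear scan is precisely the access cost that motivates adding a search or aggregation index on top of plain sequence files.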
CSFC: A New Centroid Based Clustering Method to Improve the Efficiency of Storing and Accessing Small Files in Hadoop
In day-to-day life the computer plays a major role; due to this advancement of technology, the collection of data from various fields is increasing. A large amount of data is produced by various fields…
Small Files Consolidation Technique in Hadoop Cluster
TLDR: The proposed Small File Consolidation (SFC) technique aims to overcome some of the current challenges in Hadoop cluster performance and improves query execution time by generating result sets quickly, resulting in more effective management of cluster usage.
An Approach for Effectively Handling Small-Size Image Files in Hadoop
TLDR: The approach used in this paper is shown to be more efficient than the solution provided by HIPI (Hadoop Image Processing Interface), and small-size image files form a perfect application domain for evaluating solutions to the small-file handling problem in Hadoop.
Application of Computer Big Data in Internet Learning
TLDR: It is believed that Internet learning is the major trend in future teaching reform and development, and more attention should be paid to students' experience and to the optimization and upgrading of related technologies in the process of reform and development.
Resolving data interoperability in ubiquitous health profile using semi-structured storage and processing
TLDR: Presents the Ubiquitous Health Profile (UHPr), which enables a semantic solution to the data interoperability problem in the domain of healthcare.

References

Showing 1-10 of 19 references
The optimization of HDFS based on small files
  • Liu Jiang, B. Li, Meina Song
  • Computer Science
  • 2010 3rd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT)
  • 2010
TLDR: This article optimizes the HDFS I/O feature for small files; the basic idea is to let one block save many small files and to let the DataNode save some metadata of the small files in its memory.
An optimized approach for storing and accessing small files on cloud storage
TLDR: Experimental results demonstrate that the proposed schemes effectively improve the storage and access efficiencies of small files, compared with native HDFS and a Hadoop file archiving facility.
A Novel Approach to Improving the Efficiency of Storing and Accessing Small Files on Hadoop: A Case Study by PowerPoint Files
TLDR: The experimental results indicate that the proposed approach is able to effectively mitigate the load on the NameNode and to improve the efficiency of storing and accessing massive numbers of small files on HDFS.
Improving metadata management for small files in HDFS
TLDR: This work proposes a mechanism to store small files in HDFS efficiently and improve the space utilization for metadata, and provides new job functionality to allow in-job archival of directories and files so that running MapReduce programs may complete without being killed by the JobTracker due to quota policies.
Hadoop: The Definitive Guide
TLDR: This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Comparing Hadoop and Fat-Btree Based Access Method for Small File I/O Applications
TLDR: This paper compares the Fat-Btree based data access method, which excludes a center node in clusters, with Hadoop, and shows their different performance in different file I/O applications.
The Hadoop Distributed File System
TLDR: The architecture of HDFS is described, and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported.
Implementing WebGIS on Hadoop: A case study of improving small file I/O performance on HDFS
TLDR: This paper proposes an approach to optimize the I/O performance of small files on HDFS by combining small files into large ones to reduce the file count and building an index for each file.
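The combine-and-index idea summarized in this reference can be sketched independently of HDFS: concatenate small files into one large file while recording each file's offset and length, so any small file can later be located inside the merged file. A minimal illustration (the names MergeWithIndex, merged.bin, and merged.idx are hypothetical, not from the paper):

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class MergeWithIndex {
    public static void main(String[] args) throws IOException {
        // Maps each original file name to its (offset, length) inside the
        // merged file, so reads need only one large file plus a small index.
        Map<String, long[]> index = new LinkedHashMap<>();
        long offset = 0;
        try (OutputStream merged = Files.newOutputStream(Paths.get("merged.bin"))) {
            // args[0] is assumed to be a directory of small files.
            for (File f : new File(args[0]).listFiles()) {
                byte[] data = Files.readAllBytes(f.toPath());
                merged.write(data);
                index.put(f.getName(), new long[]{offset, data.length});
                offset += data.length;
            }
        }
        // Persist the index alongside the merged file: name, offset, length.
        try (PrintWriter w = new PrintWriter("merged.idx")) {
            index.forEach((name, loc) ->
                    w.println(name + "\t" + loc[0] + "\t" + loc[1]));
        }
    }
}
```

On HDFS the same layout keeps the per-file bookkeeping in a compact index that can be cached in memory, rather than in one NameNode entry per small file.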
Performance analysis of Hadoop for handling small files in single node
TLDR: Through experiments with some typical file sets on a single node, Hadoop's performance on small files under different FileInputFormat settings is compared, and the performance differences are explained by Hadoop's own execution principles.
A digital library architecture supporting massive small files and efficient replica maintenance
TLDR: A service infrastructure based on a distributed file system for massive storage in digital libraries is presented, and a novel dynamic replica-number adjustment scheme is proposed to ensure maximal availability and reliability in a limited storage space.