Corpus ID: 18451205

Architecture for Hadoop Distributed File Systems

S. Usha Devi, Kavin Kamaraj
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. This paper focuses on the backend architecture and working of the parts of the…
Hadoop high availability through multiple active name nodes
This paper proposes a solution to reduce the load on the primary name node by compressing the entire metadata on the primary name node and sending it to all remaining standby name nodes.
Trusted Heartbeat Framework for Cloud Computing
This work creates a collaborative network between worker nodes and the master node with the help of a trusted heartbeat framework (THF), and proposes procedures to register a node and to alter a node's status based on the reputation provided by its co-worker nodes.


The Hadoop Distributed File System
The architecture of HDFS is described, and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported.
Scalable Performance of the Panasas Parallel File System
Performance measures of I/O, metadata, and recovery operations are presented for storage clusters that range in size from 10 to 120 storage nodes and 1 to 12 metadata nodes, with file system client counts ranging from 1 to 100 compute nodes.
HDFS Scalability: The Limits to Growth
An analysis of how the amount of RAM of a single namespace server correlates with the storage capacity of Hadoop clusters is provided, the advantages of the single-node namespace server architecture for linear performance scaling are outlined, and practical limits of growth are established.
Ceph: a scalable, high-performance distributed file system
Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.
Data-intensive File Systems for Internet Services: A Rose by Any Other Name... (CMU-PDL-08-114)
This Technical Report is brought to you for free and open access by the Research Centers and Institutes at Research Showcase. It has been accepted for inclusion in Parallel Data Laboratory by an…
The limits to growth," ;login: