Highly Available Hadoop NameNode Architecture

@article{Khan2012HighlyAH,
  title={Highly Available Hadoop NameNode Architecture},
  author={M. A. Khan and Zulfiqar Ali Memon and S. Khan},
  journal={2012 International Conference on Advanced Computer Science Applications and Technologies (ACSAT)},
  year={2012},
  pages={167-172}
}
  • M. A. Khan, Z. Memon, S. Khan
  • Published 26 November 2012
  • Computer Science
  • 2012 International Conference on Advanced Computer Science Applications and Technologies (ACSAT)
In the past few years, the Hadoop Distributed File System (HDFS) has been used by many organizations with gigantic data sets and streams of operations on them. HDFS provides distinctive features such as high fault tolerance and scalability. However, the NameNode machine is a single point of failure (SPOF) for an HDFS cluster: if it fails, the system must be restarted manually, which makes the system less available. This paper proposes a highly available architecture and its working principle for… 
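One way to make "less available" concrete is the standard steady-state measure A = MTTF / (MTTF + MTTR). The sketch below compares a NameNode that an operator restarts by hand with one that fails over automatically. The MTTF and MTTR figures are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures (not from the paper): one NameNode failure per 1000 hours.
MTTF = 1000.0
manual = availability(MTTF, mttr_hours=2.0)      # operator restarts the NameNode
automatic = availability(MTTF, mttr_hours=0.05)  # roughly 3 minutes of automatic failover

print(f"manual restart:     {manual:.5f}")
print(f"automatic failover: {automatic:.5f}")
```

Because MTTF is the same in both cases, every gain comes from shrinking the repair time, which is what an automatic failover architecture targets.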

Highly Available Hadoop Name Node Architecture-Using Replicas of Name Node with Time Synchronization among Replicas

This paper proposes a highly available architecture, and its working principle, for protecting the HDFS NameNode against its SPOF, using the well-known Two-Phase Commit (2PC) protocol and election by the bully algorithm together with a time synchronization mechanism.
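The two protocols named above can be sketched in-process as follows, assuming each NameNode replica is a simple object (`Replica`, `two_phase_commit` and `bully_election` are hypothetical names). The coordinator commits a metadata update only when every replica votes yes. When the primary crashes, the bully rule hands leadership to the highest-ID live replica. Networking, timeouts and the paper's time synchronization mechanism are deliberately omitted, so this is not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical NameNode replica holding a copy of the namespace metadata."""
    node_id: int
    alive: bool = True
    will_vote_yes: bool = True
    staged: dict = field(default_factory=dict)    # txid -> update awaiting phase 2
    metadata: dict = field(default_factory=dict)  # committed namespace state

    def prepare(self, txid: int, update: dict) -> bool:
        # Phase 1: stage the update and vote; a crashed replica cannot vote yes.
        if not self.alive or not self.will_vote_yes:
            return False
        self.staged[txid] = update
        return True

    def commit(self, txid: int) -> None:
        # Phase 2 (commit): apply the staged update to the metadata.
        self.metadata.update(self.staged.pop(txid))

    def abort(self, txid: int) -> None:
        # Phase 2 (abort): discard anything staged for this transaction.
        self.staged.pop(txid, None)


def two_phase_commit(replicas: list[Replica], txid: int, update: dict) -> bool:
    """Coordinator: commit on every replica only if all of them vote yes."""
    votes = [r.prepare(txid, update) for r in replicas]
    if all(votes):
        for r in replicas:
            r.commit(txid)
        return True
    for r in replicas:
        r.abort(txid)
    return False


def bully_election(replicas: list[Replica], initiator_id: int) -> int:
    """Bully algorithm: the highest-ID live replica ends up as coordinator."""
    by_id = {r.node_id: r for r in replicas}
    candidate = initiator_id
    while True:
        # ELECTION goes to every higher ID; only live replicas answer OK.
        answered = [i for i, r in by_id.items() if i > candidate and r.alive]
        if not answered:
            return candidate  # nobody higher answered: announce COORDINATOR
        # A replica that answered takes over and runs its own election.
        candidate = min(answered)
```

For example, with replicas 1, 2 and 3, marking replica 3 as crashed and calling `bully_election(replicas, initiator_id=1)` yields 2 as the new primary.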

The Data Recovery File System for Hadoop Cluster-Review Paper

A scheme is proposed that replicates the NameNode on another DataNode, which increases the availability of the metadata and reduces both data loss and delay.
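The summary does not say how the replica is kept current. A minimal sketch, assuming edit-log shipping, looks like this: the primary applies each namespace operation locally and forwards the same edit to a metadata copy hosted on a DataNode, which replays it. `NamespaceStore`, `PrimaryNameNode` and the operation tuples are hypothetical.

```python
class NamespaceStore:
    """Minimal stand-in for NameNode metadata: path -> list of block IDs."""

    def __init__(self):
        self.files = {}
        self.edit_log = []

    def apply(self, op: tuple) -> None:
        kind, path, *rest = op
        if kind == "create":
            self.files[path] = []
        elif kind == "add_block":
            self.files[path].append(rest[0])
        elif kind == "delete":
            self.files.pop(path, None)
        self.edit_log.append(op)  # record the edit for later replay


class PrimaryNameNode:
    def __init__(self, backup: NamespaceStore):
        self.store = NamespaceStore()
        self.backup = backup  # hypothetical metadata replica hosted on a DataNode

    def execute(self, op: tuple) -> None:
        # Apply locally, then ship the same edit so the replica stays in step.
        self.store.apply(op)
        self.backup.apply(op)
```

If the primary fails, the replica already holds the same file-to-block mapping and edit log, so it can take over without a manual restart.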

The Recovery System for Hadoop Cluster

A scheme is described that replicates the NameNode on another DataNode, increasing the availability of the metadata and thereby reducing data loss as well as delay.

Enhancing NameNode fault tolerance in Hadoop over cloud environment

Hadoop achieves fault tolerance by using an Observer Tool that continuously monitors the NameNode (NN) and proactively calculates the chances of a crash; in the event of a crash, the system runs using the secondary NN.
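The monitoring-and-switchover loop can be sketched as below. The paper's actual crash-prediction method is not described here. As a crude stand-in, this sketch treats the fraction of missed heartbeats in a sliding window as the "chance of a crash" and switches to the secondary NN once it reaches a threshold. `Observer`, `window` and `threshold` are hypothetical.

```python
from collections import deque


class Observer:
    """Hypothetical monitor: watches recent NameNode heartbeats and switches
    to the secondary NN when the recent miss ratio crosses a threshold."""

    def __init__(self, window: int = 10, threshold: float = 0.5):
        self.history = deque(maxlen=window)  # True = heartbeat received
        self.threshold = threshold
        self.active = "primary"

    def crash_risk(self) -> float:
        # Crude stand-in for a crash-likelihood estimate: recent miss ratio.
        if not self.history:
            return 0.0
        return self.history.count(False) / len(self.history)

    def record(self, heartbeat_received: bool) -> str:
        self.history.append(heartbeat_received)
        if self.active == "primary" and self.crash_risk() >= self.threshold:
            self.active = "secondary"  # fail over before the primary is fully gone
        return self.active
```

Acting on a risk estimate rather than on a single missed heartbeat is what makes the switchover proactive. A real system would also need to fence the old primary.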

Improved Time Complexity and Load Balance for DFS in Multiple NameNode

This paper implements a system that balances load across multiple NameNodes, addresses the NameNode bottleneck, and reduces the average time required for reads and writes.
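A common way to spread namespace load over several NameNodes is to partition paths by a stable hash. The sketch below routes each path by hashing its top-level directory, so a directory's files stay on one NameNode. This partitioning policy and the `namenode_for` helper are assumptions for illustration and are not necessarily the paper's scheme.

```python
from __future__ import annotations

import hashlib


def namenode_for(path: str, namenodes: list[str]) -> str:
    """Route a path to a NameNode by hashing its top-level directory.

    MD5 is used only as a stable, process-independent hash (Python's
    built-in hash() of str is randomized per process)."""
    top = path.strip("/").split("/")[0]
    digest = hashlib.md5(top.encode("utf-8")).digest()
    return namenodes[int.from_bytes(digest[:8], "big") % len(namenodes)]
```

Hashing the full path would balance load more evenly but would scatter one directory's entries across NameNodes, making operations such as listing a directory span several servers.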

A job-oriented load-distribution scheme for cost-effective NameNode service in HDFS

The primary innovation is the joint consideration of MapReduce jobs and the HDFS operations they generate: a SubNameNode is dynamically allocated for each job on one of the existing TaskTrackers to provide the NameNode service.
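The per-job allocation step can be sketched as a greedy choice of the least-loaded TaskTracker, using a min-heap. Measuring "load" as the number of SubNameNodes a TaskTracker already hosts is an assumption, as is the helper name `allocate_subnamenodes`. The paper's actual placement criteria may differ.

```python
from __future__ import annotations

import heapq


def allocate_subnamenodes(jobs: list[str], tasktracker_load: dict[str, int]) -> dict[str, str]:
    """Place one SubNameNode per job on the currently least-loaded TaskTracker."""
    heap = [(load, tt) for tt, load in tasktracker_load.items()]
    heapq.heapify(heap)
    assignment = {}
    for job in jobs:
        load, tt = heapq.heappop(heap)      # least-loaded tracker (ties: by name)
        assignment[job] = tt
        heapq.heappush(heap, (load + 1, tt))  # it now hosts one more SubNameNode
    return assignment
```

For example, with loads `{"tt1": 2, "tt2": 0, "tt3": 1}`, jobs `j1`, `j2`, `j3` land on `tt2`, `tt2`, `tt3`.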

Hadoop high availability through multiple active name nodes

This paper proposes reducing the load on the primary NameNode by transferring its metadata to the remaining standby NameNodes: the entire metadata is compressed on the primary NameNode and sent to all remaining standby NameNodes.
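The compress-once, send-to-all step can be sketched with the standard library. The primary serializes its metadata (JSON here, an assumption), compresses it with `zlib`, and every standby receives the same compressed snapshot and decompresses its own copy. The function names are hypothetical, and the in-memory dictionaries stand in for real network transfers.

```python
from __future__ import annotations

import json
import zlib


def pack_metadata(metadata: dict) -> bytes:
    """Serialize and compress the primary's namespace metadata once."""
    return zlib.compress(json.dumps(metadata, sort_keys=True).encode("utf-8"))


def unpack_metadata(blob: bytes) -> dict:
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def replicate_to_standbys(metadata: dict, standbys: dict[str, dict]) -> int:
    """Send one compressed snapshot to every standby; return bytes per transfer."""
    blob = pack_metadata(metadata)
    for name in standbys:
        standbys[name] = unpack_metadata(blob)  # each standby decompresses its copy
    return len(blob)
```

Compressing once on the primary and reusing the same blob for every standby keeps the primary's CPU cost independent of the number of standbys, while reducing the bytes on the wire.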

Elastic HDFS: interconnected distributed architecture for availability–scalability enhancement of large-scale cloud storages

An interconnected distributed architecture for storing data and metadata in large-scale cloud storage systems is presented, together with a coordination protocol for communication among file servers that maintains user transparency across different file system actions and reactions.

Daemons of Hadoop: An Overview

The daemons of Hadoop's key components, HDFS and MapReduce, are discussed, along with how these two components are used to store and to process Big Data, respectively.

References

The Hadoop Distributed File System

The architecture of HDFS is described and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported on.

Hadoop high availability through metadata replication

A metadata-replication-based solution enables Hadoop high availability by removing the single point of failure in Hadoop, and offers several distinctive features, such as a runtime-configurable synchronization mode.
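A runtime-configurable synchronization mode can be sketched as a replicator with two paths. In synchronous mode each metadata edit reaches the standby before the call returns. In asynchronous mode edits are queued and shipped later. The class name, the mode strings and the in-memory "standby log" are hypothetical stand-ins for the real mechanism.

```python
from collections import deque


class MetadataReplicator:
    """Hypothetical primary-side replicator whose mode can change at runtime."""

    def __init__(self, mode: str = "sync"):
        self.mode = mode
        self.pending = deque()  # edits not yet applied on the standby (async mode)
        self.standby_log = []   # edits the standby has applied

    def set_mode(self, mode: str) -> None:
        if mode not in ("sync", "async"):
            raise ValueError(f"unknown mode: {mode}")
        if mode == "sync":
            self.flush()  # drain the backlog before promising synchronous semantics
        self.mode = mode

    def log_edit(self, edit: str) -> None:
        if self.mode == "sync":
            self.standby_log.append(edit)  # stand-in for "send and wait for ack"
        else:
            self.pending.append(edit)      # return immediately; ship later

    def flush(self) -> None:
        while self.pending:
            self.standby_log.append(self.pending.popleft())
```

The trade-off is the usual one. Synchronous mode loses no acknowledged edits on failover but adds latency to every write. Asynchronous mode is faster but may lose the queued tail.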

AVAILABILITY OF JOB TRACKER MACHINE IN HADOOP/MAPREDUCE ZOOKEEPER COORDINATED CLUSTERS

A mathematical model for the availability of the JobTracker in Hadoop/MapReduce using ZooKeeper's Leader Election Service is presented; ZooKeeper makes coordination and synchronization easy, reduces the effect of Byzantine faults, and provides fault tolerance for distributed systems.
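ZooKeeper's documented leader-election recipe works as follows. Each candidate creates a sequential ephemeral znode under an election parent, and the candidate holding the lowest sequence number is the leader. When the leader's session ends, its ephemeral znode disappears and the next-lowest candidate takes over. The sketch below simulates that rule in memory, without a ZooKeeper client. Whether the paper's model uses exactly this recipe is an assumption, and `ElectionZnodes` and the candidate names are hypothetical.

```python
import itertools


class ElectionZnodes:
    """In-memory stand-in for a ZooKeeper election parent znode.

    Sequential ephemeral znodes are modeled as sequence -> candidate entries
    that vanish when the candidate's session ends."""

    def __init__(self):
        self._seq = itertools.count()
        self.znodes = {}  # sequence number -> candidate name

    def join(self, candidate: str) -> int:
        # Like creating a sequential znode: each gets a larger suffix.
        seq = next(self._seq)
        self.znodes[seq] = candidate
        return seq

    def session_expired(self, candidate: str) -> None:
        # Ephemeral znodes are deleted when their owner's session ends.
        self.znodes = {s: c for s, c in self.znodes.items() if c != candidate}

    def leader(self):
        # Recipe rule: the lowest sequence number is the leader.
        return self.znodes[min(self.znodes)] if self.znodes else None
```

In the real recipe each non-leader watches only the znode just below its own, so a leader change wakes a single follower instead of all of them.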

Hadoop: The Definitive Guide

This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details on analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.

MapReduce: simplified data processing on large clusters

This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
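The programming model behind that runtime can be illustrated with the classic word-count example, run sequentially in one process. The map function emits (word, 1) pairs, a shuffle step groups values by key as the runtime would between phases, and the reduce function sums each group. Everything the paper is actually about (parallel execution, failure handling, scheduling) is left out of this sketch.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    # map: emit (word, 1) for every word in one input split
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # group intermediate values by key, as the runtime does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # reduce: combine all values for one key
    return key, sum(values)


def word_count(documents):
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
```

Because each `map_phase` call depends only on its own split and each `reduce_phase` call only on one key's values, a runtime is free to run them on different machines and re-execute any that fail.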

The Hadoop Distributed File System: Architecture and Design

  • Apache Software Foundation, http://hadoop.apache.org/common/docs/r0.18.0/hdfs_design.pdf.

Hadoop: The Definitive Guide

  • O'Reilly Media, Inc., 1005 Gravenstein Highway North
  • 2009

The Hadoop Distributed File System: Architecture and Design

  • Apache Software Foundation

The Google File System

Distributed Systems: Principles and Paradigms