Big Data Analytics with Apache Hadoop MapReduce Framework

@article{Greeshma2016BigDA,
  title={Big Data Analytics with Apache Hadoop MapReduce Framework},
  author={L. Greeshma and G. Pradeepini},
  journal={Indian Journal of Science and Technology},
  year={2016},
  volume={9}
}
Huge amounts of data cannot be handled by conventional database management systems; storing, processing, and accessing such massive volumes of data becomes possible with the help of Big Data technologies. In this paper we discuss the Hadoop Distributed File System (HDFS) and the MapReduce architecture for storing and retrieving information from massive datasets. We also propose a WordCount application using the MapReduce object-oriented programming paradigm. It divides the input file into splits or tokens that is…
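To illustrate the WordCount application described in the abstract, the sketch below uses the standard Hadoop MapReduce Java API (org.apache.hadoop.mapreduce): a mapper tokenizes each input split and emits (word, 1) pairs, and a reducer sums the counts per word. The class names and command-line paths are illustrative assumptions, not code published in the paper.

// Minimal WordCount sketch for Hadoop MapReduce (assumed, not the paper's published code).
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Compiled into a JAR, such a job would typically be submitted with the hadoop jar command, with the input and output arguments pointing to directories in HDFS.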


Robust and Resilient Migration of Data Processing Systems to Public Hadoop Grid
  • D. Vasthimal
  • Computer Science
    2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion)
  • 2018
TLDR
The motivation, challenges faced, solutions employed, and best practices are discussed, with a focus on efforts taken to ease the migration of systems with the least possible impact on customers.
Development of real-time big data analysis system for RHIPE-based marketing in the automobile industry
TLDR
A real-time big data analysis system that can analyze the orders, reservations, and maintenance history contained in big data using the RHIPE method is developed.
Harnessing supremacy of big data using hadoop for healthy human survival making use of bioinformatics
TLDR
The analysis of big data is performed to achieve critical objectives for revolutionizing healthcare and to mine the bioinformatics facet of a particular age group affected by a particular disease; the scripts and queries yield sorted attributes from the database created, and these attributes provide norms that justify the stated objectives.
Earlier stage for straggler detection and handling using combined CPU test and LATE methodology
TLDR
This research proposes a hybrid MapReduce framework, referred to as the combinatory late-machine (CLM) framework, that facilitates early and timely detection and identification of stragglers, thereby enabling prompt, appropriate, and effective actions.
Diagnosing Diabetic Dataset using Hadoop and K-means Clustering Techniques.
TLDR
This paper focuses on how a clustering algorithm, namely K-means, can be used on a parallel processing platform, namely an Apache Hadoop cluster (MapReduce paradigm), in order to analyze huge volumes of data faster.
Design and Application of a Containerized Hybrid Transaction Processing and Data Analysis Framework
TLDR
The authors design a hybrid development framework to offer greater scalability and flexibility of data analysis and reporting, while keeping maximum compatibility and links to the legacy platforms on which transaction business logic runs.
Scalable Data Reporting Platform for A/B Tests
  • D. Vasthimal, Pavan Kumar Srirama, Arun Kumar Akkinapalli
  • Computer Science
    2019 IEEE 5th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS)
  • 2019
The A/B Test data platform enables users, such as a Product Manager or Product Owner, to answer important questions such as "How will this new product feature benefit?" or "Does this new page layout
Implementation of Digital Signature Algorithm using Big Data Sensing Environment
TLDR
A Big Data retrieval unit in WBAN is proposed using Elliptic Curve Cryptography to transmit the data through MapReduce and retrieve the data safely using the ECCDS algorithm.
Near Real-Time Tracking at Scale
TLDR
This work describes the process of creating a highly available data pipeline and computational model for user sessions at scale, which is critical for business analytics as they represent true user behavior.
Computerized grading of brain tumors supplemented by artificial intelligence
TLDR
The identification and classification of tumors from the MRI results are combined and the said approach is believed to deliver promising results in terms of accuracy, which has also been verified experimentally.
