• Corpus ID: 16703930

Big Data Processing using Apache Hadoop in Cloud System

@inproceedings{Pingle2012BigDP,
  title={Big Data Processing using Apache Hadoop in Cloud System},
  author={Yogesh Pingle and Vaibhav Kohli and Shruti Kamat and Nimesh Poladia},
  year={2012}
}
Ever-growing technology has created the need to store and process excessively large amounts of data in the cloud. The current volume of data is enormous and is expected to multiply more than 650-fold by 2014, of which 85% would be unstructured. This is known as the ‘Big Data’ problem. The techniques of Hadoop, an efficient resource scheduling method and probabilistic redundant scheduling, are presented for the system to efficiently organize "free" computer storage…
Efficient Utilization of Profiles to Reduce Time in Very Large Data Set
TLDR
An interface is proposed that reduces the time taken to match sampled MapReduce jobs (Js) with previously created profiles and acts as a mediator between the profile store and the worker nodes.
Data Analysis using Mapper and Reducer with Optimal Configuration in Hadoop
TLDR
The proposed Mapper Reducer function analyzes the data set and improves job execution performance by using an optimal configuration of mappers and reducers based on the size of the data set; it also lets users view job status and localize errors in scheduled jobs.
Analyzing web application log files to find hit count through the utilization of Hadoop MapReduce in cloud computing environment
TLDR
The Hadoop MapReduce programming model is applied to analyze web log files and obtain the hit count of a specific web application; the results are evaluated using Map and Reduce functions.
HMR Log Analyzer: Analyze Web Application Logs Over Hadoop MapReduce
TLDR
This paper proposes a log analysis system using Hadoop MapReduce that provides accurate results with minimal response time and reduces the load on the end system.
Trend Analysis of E-Commerce Data using Hadoop Ecosystem
TLDR
This paper presents experimental work on the trend analysis problem for big data and its optimal solution using the Hadoop ecosystem: MapReduce programming provides a parallel framework for processing large data sets, and Apache Hive, a data warehouse infrastructure built on top of Hadoop, provides data summarization, querying, and analysis.
Performance Tuning and Scheduling of Large Data Set Analysis in Map Reduce Paradigm by Optimal Configuration using Hadoop
TLDR
The proposed Mapper Reducer function, which uses a mean shift clustering based algorithm, improves job execution performance by using an optimal configuration of mappers and reducers based on the size of the data set; it also lets users view job status and localize errors in scheduled jobs.
Metecloud: A Private Cloud Platform For Meteorological Data Storage Using Hadoop
TLDR
This paper proposes building the MeteCloud platform for meteorological departments using Hadoop; the platform proves to be efficient and suitable for the storage of meteorological data.
A Novel Storage Architecture for Facilitating Efficient Analytics of Health Informatics Big Data in Cloud
TLDR
A new big data storage architecture, consisting of an application cluster and a storage cluster, is proposed to speed up read/write/update operations and to optimize data.
Addressing Name Node Scalability Issue in Hadoop Distributed File System Using Cache Approach
TLDR
Cache memory is used to address the Name Node scalability issue; the highlighted approach enhances the current architecture so that the Name Node does not reach its threshold value as quickly.

References

Toward a cost-effective cloud storage service
TLDR
This paper presents a cost-effective cloud storage service model that is built with old PCs yet performs well; its experimental evaluation shows 46% better performance on the Postmark benchmark.
GridBatch: Cloud Computing for Large-Scale Data-Intensive Batch Applications
  • Huan Liu, D. Orban
  • Computer Science
    2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID)
  • 2008
TLDR
The GridBatch system is proposed, which aims at solving large-scale data-intensive batch problems under Cloud infrastructure constraints and gives users complete control over how data are partitioned and how computation is distributed, so that applications can achieve the highest possible performance.
Autonomic Cloud Storage: Challenges at Stake
  • G. Antoniu
  • Computer Science
    2010 International Conference on Complex, Intelligent and Software Intensive Systems
  • 2010
TLDR
This talk will discuss how open issues raised by the need for efficient, secure and reliable storage service for data intensive distributed applications running in cloud environments may be addressed by enabling an autonomic behavior for the cloud storage infrastructure.
Secure, Dependable, and High Performance Cloud Storage
TLDR
This paper analyzes the requirements of access protocols for storage systems based on data partitioning schemes in widely distributed cloud environments and develops an access protocol that considers the regular semantics instead of atomic semantics to improve access efficiency.
Compute and storage clouds using wide area high performance networks
Automated control for SLA-aware elastic clouds
TLDR
The SLAaaS model (SLA aware service) is introduced; it enriches the general paradigm of Cloud Computing and enables a systematic and transparent integration of service levels and SLAs into the cloud.
Parallel PSO using MapReduce
TLDR
This work describes MapReduce and shows how PSO can be naturally expressed in this model, without explicitly addressing any of the details of parallelization, and demonstrates that MRPSO scales to 256 processors on moderately difficult problems and tolerates node failures.
The Definitive Guide
  • 1st ed. USA: O'Reilly Media
  • 2009
Autonomic Cloud Storage: Challenges at Stake, International Conference on Complex, Intelligent and Software Intensive Systems, 2010: 481-481
  • 2010