Corpus ID: 212507209

Improvisation of Incremental Computing in Hadoop Architecture with File Caching

@inproceedings{Alsi2015ImprovisationOI,
  title={Improvisation of Incremental Computing in Hadoop Architecture with File Caching},
  author={Mr. Alhad V. Alsi},
  year={2015}
}
Incremental data processing is a difficult problem, as it requires the continuous development of well-defined algorithms and a runtime system to support ongoing computation. Many online data sets are elastic in nature: new entries are added as the application progresses. Hadoop is dedicated to the processing of distributed data and is used to manipulate large amounts of distributed data. This manipulation involves not only storage but also computation and…
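The core idea of incremental computing with file caching can be sketched outside Hadoop: hash each input chunk, cache the per-chunk map result, and recompute only chunks that are new or changed. This is an illustrative sketch, not code from the paper; the function names and the word-count workload are hypothetical.

```python
import hashlib
from collections import Counter

def map_chunk(chunk):
    """Map step: count words in one input chunk."""
    return Counter(chunk.split())

def incremental_wordcount(chunks, cache):
    """Recompute only chunks whose content hash is not already cached."""
    total = Counter()
    for chunk in chunks:
        key = hashlib.sha256(chunk.encode()).hexdigest()
        if key not in cache:
            cache[key] = map_chunk(chunk)   # cache miss: compute and store
        total += cache[key]                 # reduce step: merge cached partials
    return total

cache = {}
r1 = incremental_wordcount(["a b a", "c a"], cache)
# Appending a new chunk reuses the cached results for the old chunks.
r2 = incremental_wordcount(["a b a", "c a", "b b"], cache)
```

When the input grows by one chunk, only that chunk is mapped again; the earlier partial results are served from the cache, which is the behavior Incoop-style systems aim for at cluster scale.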

References

Showing 1–10 of 29 references
IncMR: Incremental Data Processing Based on MapReduce
Experiments show that non-iterative algorithms running in the MapReduce framework can be migrated to IncMR directly to obtain efficient incremental and continuous processing without any modification.
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the prototype built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
Incoop: MapReduce for incremental computations
This paper describes the architecture, implementation, and evaluation of Incoop, a generic MapReduce framework for incremental computations that detects changes to the input and automatically updates the output by employing an efficient, fine-grained result reuse mechanism.
iMapReduce: A Distributed Computing Framework for Iterative Computation
iMapReduce significantly improves the performance of iterative implementations by reducing the overhead of repeatedly creating new MapReduce jobs, eliminating the shuffling of static data, and allowing asynchronous execution of map tasks.
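The iterative pattern iMapReduce targets can be illustrated with a tiny in-memory PageRank loop: the static data (the link graph) is loaded once and kept resident, while only the dynamic state (the ranks) is updated per iteration. This is a hedged sketch of the general idea, not iMapReduce's actual API; the graph and parameter names are illustrative.

```python
def pagerank(links, iterations=10, d=0.85):
    """Iterate PageRank over a closed link graph.
    `links` (static data) is read every iteration but never reloaded;
    `ranks` (dynamic state) is the only thing that changes."""
    n = len(links)
    ranks = {node: 1.0 / n for node in links}
    for _ in range(iterations):
        contrib = {node: 0.0 for node in links}
        for node, outs in links.items():
            share = ranks[node] / len(outs)      # spread rank over out-links
            for dest in outs:
                contrib[dest] += share
        ranks = {node: (1 - d) / n + d * contrib[node] for node in links}
    return ranks

ranks = pagerank({"a": ["b"], "b": ["a"]})
```

In a plain MapReduce setting, each iteration would be a fresh job that reloads and reshuffles the static graph; keeping it cached across iterations is the overhead reduction the summary above describes.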
Clydesdale: structured data processing on MapReduce
Clydesdale, a novel system for structured data processing on Hadoop (a popular implementation of MapReduce), is introduced; it provides more than an order of magnitude in performance improvement over existing approaches without requiring any changes to the underlying platform.
Composable Incremental and Iterative Data-Parallel Computation with Naiad
This paper evaluates a prototype of Naiad, a set of declarative data-parallel language extensions and an associated runtime supporting efficient and composable incremental and iterative computation, that uses shared memory on a single multi-core computer.
Hourglass: A library for incremental processing on Hadoop
Hadoop enables processing of large data sets through its relatively easy-to-use semantics. However, jobs are often written inefficiently for tasks that could be computed incrementally due to the…
Parallel and distributed methods for incremental frequent itemset mining
This paper presents an efficient algorithm which dynamically maintains the required information even in the presence of data updates without examining the entire dataset, and proposes a distributed asynchronous algorithm, which imposes minimal communication overhead for mining distributed dynamic datasets.
ERMS: An Elastic Replication Management System for HDFS
ERMS provides an active/standby storage model for HDFS that utilizes a complex event processing engine to distinguish real-time data types; it dynamically adds extra replicas for hot data, cleans up these extra replicas when the data cool down, and uses erasure codes for cold data.
Hadoop Architecture and Its Issues
  • Anam Alam, Jamil Ahmed
  • Computer Science
  • 2014 International Conference on Computational Science and Computational Intelligence
  • 2014
The shortcomings of Hadoop are described, showing how the distributed paradigm is used to manipulate large amounts of data to perform operations such as data analysis, result analysis, and data analytics.