epiC: an Extensible and Scalable System for Processing Big Data

@article{Jiang2014epiCAE,
  title={epiC: an Extensible and Scalable System for Processing Big Data},
  author={Dawei Jiang and Gang Chen and Beng Chin Ooi and Kian-Lee Tan and Sai Wu},
  journal={Proc. VLDB Endow.},
  year={2014},
  volume={7},
  pages={541-552}
}
The Big Data problem is characterized by the so-called 3V features: Volume (a huge amount of data), Velocity (a high data ingestion rate), and Variety (a mix of structured, semi-structured, and unstructured data). The state-of-the-art solutions to the Big Data problem are largely based on the MapReduce framework (and its open-source implementation, Hadoop). Although Hadoop handles the data volume challenge successfully, it does not deal with the data variety well, since the programming…
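To make concrete the MapReduce programming model that the abstract refers to, below is a minimal word-count sketch against the standard Hadoop MapReduce Java API. It is an illustrative example assumed here for context, not code from the epiC paper: every computation has to be cast as a map function that emits key/value pairs and a reduce function that aggregates them, which is the fixed structure the abstract points to when arguing that data variety is handled poorly.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Illustrative Hadoop word-count job (not from the epiC paper): the whole
    // computation must fit the fixed map -> shuffle -> reduce pipeline.
    public class WordCount {

      // Map step: emit (word, 1) for every token in an input line.
      public static class TokenizerMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
          for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
              word.set(token);
              context.write(word, ONE);
            }
          }
        }
      }

      // Reduce step: sum the counts collected for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }
    }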
Husky: Towards a More Efficient and Expressive Distributed Computing Framework
TLDR
Husky is developed mainly for in-memory large-scale data mining and also serves as a general research platform for designing efficient distributed algorithms; it is shown that many existing frameworks can be easily implemented and bridged together inside Husky, and that Husky achieves similar or even better performance compared with domain-specific systems.
MapReduce for Big Data Analysis: Benefits, Limitations and Extensions
TLDR
The benefits and limitations of the MapReduce programming paradigm are discussed, along with extensions that take MapReduce beyond those limitations.
Big Data in Massive Parallel Processing: A Multi-Core Processors Perspective
With the advent of novel wireless technologies and Cloud Computing, large volumes of data are being produced by various heterogeneous devices such as mobile phones, credit cards, and computers…
Role of Big Data in Internet of Things Networks
TLDR
The authors discuss the role of big data and related challenges in IoT networks and the various data analytics platforms used for the IoT domain, and present and discuss the architectural model of big data in IoT along with various future research challenges.
Power and Performance Evaluation of Memory-Intensive Applications
TLDR
Evaluating the power consumption and performance of memory-intensive applications on different generations of real rack servers shows that workload intensity and the number of concurrent execution threads affect server power consumption, but that a fully utilized memory system does not necessarily yield good energy-efficiency indicators.
The Research on Core Development Technology of Security Big Data Application Platform
TLDR
This paper uses Oracle data warehouse technology, combined with big data technology, to organize all kinds of business data into a big data layer after extraction, conversion, and filtering, in order to study and analyze the core development technology of a security big data application platform.
Power Characterization of Memory Intensive Applications: Analysis and Implications
TLDR
This paper investigates the power characteristics of memory-intensive applications on real rack servers of different generations and finds that hardware configuration, workload type, and the number of concurrently running threads have a significant impact on a server's energy efficiency when running memory-intensive applications.

References

Showing 1-10 of 35 references
HaLoop: Efficient Iterative Data Processing on Large Clusters
TLDR
HaLoop, a modified version of the Hadoop MapReduce framework designed to serve iterative applications, is presented; it dramatically improves their efficiency by making the task scheduler loop-aware and by adding various caching mechanisms.
Oracle in-database hadoop: when mapreduce meets RDBMS
TLDR
A prototype of Oracle In-Database Hadoop is presented that supports running native Hadoop applications written in Java, and it is demonstrated how MapReduce functionality is seamlessly integrated within SQL queries.
Distributed data-parallel computing using a high-level programming language
TLDR
The programming model is described, a high-level overview of the design and implementation of the Dryad and DryadLINQ systems is provided, and the tradeoffs and connections to parallel and distributed databases are discussed.
MAP-JOIN-REDUCE: Toward Scalable and Efficient Data Analysis on Large Clusters
TLDR
This paper introduces Map-Join-Reduce, a system that extends and improves the MapReduce runtime framework to efficiently process complex data analysis tasks on large clusters, and presents a new data processing strategy that performs filtering-join-aggregation tasks in two successive MapReduce jobs.
A comparison of approaches to large-scale data analysis
TLDR
A benchmark consisting of a collection of tasks that are run on an open source version of MR as well as on two parallel DBMSs shows a dramatic performance difference between the two paradigms.
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
TLDR
This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the prototype built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
S4: Distributed Stream Computing Platform
TLDR
The architecture resembles the Actors model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers.
The performance of MapReduce
TLDR
By carefully tuning several identified performance factors, the overall performance of Hadoop can be improved by a factor of 2.5 to 3.5, making it more comparable to that of parallel database systems.
Map-reduce-merge: simplified relational data processing on large clusters
TLDR
A Merge phase is added to Map-Reduce that can efficiently merge data already partitioned and sorted by the map and reduce modules, and it is demonstrated that this new model can express relational algebra operators as well as implement several join algorithms.
Dryad: distributed data-parallel programs from sequential building blocks
TLDR
The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.