A Survey on Data-Centric and Data-Aware Techniques for Large Scale Infrastructures
@article{CanoLores2016ASO, title={A Survey on Data-Centric and Data-Aware Techniques for Large Scale Infrastructures}, author={Silvina Ca{\'i}no-Lores and Jes{\'u}s Carretero}, journal={World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering}, year={2016}, volume={10}, pages={517-523} }
Large scale computing infrastructures have been widely developed with the core objective of providing a suitable platform for high-performance and high-throughput computing. These systems are designed to support resource-intensive and complex applications, which can be found in many scientific and industrial areas. Currently, large scale data-intensive applications are hindered by the high latencies that result from the access to vastly distributed data. Recent works have suggested that…
6 Citations
On the effects of allocation strategies for exascale computing systems with distributed storage and unified interconnects
- Computer Science
- Concurr. Comput. Pract. Exp.
- 2019
This paper investigates alternatives for the storage subsystem of a novel exascale‐capable system with special emphasis on how allocation strategies would affect the overall performance, and suggests that scheduling policies exposing data‐locality information can be essential for the appropriate utilization of future large‐scale systems.
On the Effects of Data-Aware Allocation on Fully Distributed Storage Systems for Exascale
- Computer Science
- Euro-Par Workshops
- 2017
This paper shows the need to enhance system schedulers so that they differentiate between compute- and data-oriented applications, minimising interference between storage and application traffic.
Data Locality in High Performance Computing, Big Data, and Converged Systems: An Analysis of the Cutting Edge and a Future System Architecture
- Computer Science
- Electronics
- 2022
An extensive review of cutting-edge research on data locality in HPC, big data, and converged environments is provided and a system architecture for future HPC and big data converged systems is proposed.
JHTD: An Efficient Joint Scheduling Framework Based on Hypergraph for Task Placement and Data Transfer Across Geographically Distributed Data Centers
- Computer Science
- IEEE Access
- 2022
This work proposes an efficient joint scheduling framework based on hypergraph for task placement and data transfer across geographically distributed data centers, and its results demonstrate that JHTD can reduce the makespan by up to 20.6%.
Hypergraph+: An Improved Hypergraph-Based Task-Scheduling Algorithm for Massive Spatial Data Processing on Master-Slave Platforms
- Computer Science
- ISPRS Int. J. Geo Inf.
- 2016
An extended hypergraph-based task-scheduling algorithm, named Hypergraph+, is proposed for massive spatial data processing. It improves upon current hypergraph scheduling algorithms in two ways: it takes platform heterogeneity into consideration, offering a metric function to evaluate the partitioning quality in order to derive the best task/file schedule.
References
SHOWING 1-10 OF 42 REFERENCES
New Worker-Centric Scheduling Strategies for Data-Intensive Grid Applications
- Business
- Middleware
- 2007
This paper proposes a series of worker-centric scheduling strategies for data-intensive applications and evaluates how each strategy performs compared to a task-centric one, showing that worker-centric strategies improve performance in terms of makespan and bandwidth usage.
VIDAS: object-based virtualized data sharing for high performance storage I/O
- Computer Science
- Science Cloud '13
- 2013
With scientific computing in the cloud gaining popularity and using ever larger data sets, high performance storage I/O in virtualized environments is substantially increasing in importance.…
Exploiting Replication and Data Reuse to Efficiently Schedule Data-Intensive Applications on Grids
- Computer Science
- JSSPP
- 2004
Storage Affinity exploits a data reuse pattern, common in many data-intensive applications, that allows it to take data transfer delays into account and reduce the makespan of the application. It also uses a replication strategy that yields efficient schedules without relying upon dynamic information that is difficult to obtain.
Parallel Programming Paradigms and Frameworks in Big Data Era
- Computer Science
- International Journal of Parallel Programming
- 2013
This paper discusses and analyzes opportunities and challenges for efficient parallel data processing, reviews various parallel and distributed programming paradigms, analyzes how they fit into the Big Data era, and presents modern emerging paradigms and frameworks.
HaLoop: Efficient Iterative Data Processing on Large Clusters
- Computer Science
- Proc. VLDB Endow.
- 2010
HaLoop is presented, a modified version of the Hadoop MapReduce framework that is designed to serve iterative applications and dramatically improves their efficiency by making the task scheduler loop-aware and by adding various caching mechanisms.
A Data Locality Aware Online Scheduling Approach for I/O-Intensive Jobs with File Sharing
- Computer Science
- JSSPP
- 2006
A hypergraph based dynamic scheduling heuristic for a stream of independent I/O intensive jobs with file sharing behavior based on an event-driven, run-time hypergraph modeling of the file sharing characteristics among jobs is proposed.
Nephele: efficient parallel data processing in the cloud
- Computer Science
- MTAGS '09
- 2009
Nephele, an ongoing research project, is presented as the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's compute clouds for both task scheduling and execution.
An Efficient Data Locality Driven Task Scheduling Algorithm for Cloud Computing
- Computer Science
- 2012
This work proposes a heuristic task scheduling algorithm in which an initial task allocation is produced first, and the job completion time is then reduced gradually by tuning that initial assignment.
A new paradigm: Data-aware scheduling in grid computing
- Computer Science
- Future Gener. Comput. Syst.
- 2009
BAR: An Efficient Data Locality Driven Task Scheduling Algorithm for Cloud Computing
- Computer Science
- 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
- 2011
A heuristic task scheduling algorithm called Balance-Reduce (BAR) is proposed, in which an initial task allocation is produced first and the job completion time is then reduced gradually by tuning that initial allocation while taking a global view.