Data-aware task scheduling for all-to-all comparison problems in heterogeneous distributed systems
@article{Zhang2016DataawareTS, title={Data-aware task scheduling for all-to-all comparison problems in heterogeneous distributed systems}, author={YiFan Zhang and Yu-Chu Tian and Colin J. Fidge and Wayne Kelly}, journal={J. Parallel Distributed Comput.}, year={2016}, volume={93-94}, pages={87-101} }
14 Citations
Optimal Data File Allocation for All-to-All Comparison in Distributed System: A Case Study on Genetic Sequence Comparison
- Computer Science
- 2019
The results show that the proposed file allocation strategy can achieve the basic load balance of each node in the distributed system without exceeding the storage capacity of any node, and completely localize the data file.
Hypergraph+: An Improved Hypergraph-Based Task-Scheduling Algorithm for Massive Spatial Data Processing on Master-Slave Platforms
- Computer Science
- ISPRS Int. J. Geo Inf.
- 2016
An extended hypergraph-based task-scheduling algorithm, named Hypergraph+, is proposed for massive spatial data processing. It improves upon current hypergraph scheduling algorithms in two ways: it takes platform heterogeneity into consideration, offering a metric function to evaluate the partitioning quality in order to derive the best task/file schedule.
Algorithm for deadline based task scheduling in heterogeneous grid environment
- Computer Science
- 2016 2nd International Conference on Next Generation Computing Technologies (NGCT)
- 2016
The computational results of the proposed IDSA for non-delayed tasks are higher than those of EDF and PDSA at 4000 tasks, showing that IDSA is a more suitable scheduling algorithm for grid computing.
A comparative analysis of resource allocation schemes for real-time services in high-performance computing systems
- Computer Science
- Int. J. Distributed Sens. Networks
- 2020
This work comprehensively discusses, integrates, analyzes, and categorizes all resource allocation schemes for real-time services into five high-performance computing classes: grid, cloud, edge, fog, and multicore computing systems.
A Data-aware MultiWorkflow Scheduler for Clusters on WorkflowSim
- Computer Science
- COMPLEXIS
- 2017
A multiworkflow store-aware scheduler policy is presented as an extension of WorkflowSim, enabling its combination with other WorkflowSim scheduling policies and the evaluation of a wide range of storage and file allocation possibilities.
Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms
- Computer Science
- SC20: International Conference for High Performance Computing, Networking, Storage and Analysis
- 2020
This work presents a solution that relies on hierarchical multi-level software-based caches to maximize data reuse at each level in the distributed memory hierarchy, combined with a divide-and-conquer approach to exploit data locality, hierarchical work-stealing to dynamically balance the workload, and asynchronous processing to maximize resource utilization.
Hard Real-Time Task Scheduling in Cloud Computing Using an Adaptive Genetic Algorithm
- Computer Science
- Comput.
- 2017
A greedy algorithm and a genetic algorithm with adaptive selection of suitable crossover and mutation operations (named AGA) are proposed to allocate and schedule real-time tasks with precedence constraints on heterogeneous virtual machines.
Opposition-based learning inspired particle swarm optimization (OPSO) scheme for task scheduling problem in cloud computing
- Computer Science
- J. Ambient Intell. Humaniz. Comput.
- 2021
The proposed task scheduling mechanism is based on particle swarm optimization (PSO), with an opposition-based learning technique used to avoid premature convergence and to accelerate the convergence of standard PSO. It is compared with well-established task scheduling strategies based on PSO, mPSO (modified PSO), genetic algorithm (GA), max–min, minimum completion time, and minimum execution time.
Prediction-based Resource Allocation Model for Real-time Tasks
- Computer Science
- 2018 IEEE 5th International Conference on Engineering Technologies and Applied Sciences (ICETAS)
- 2018
A prediction-based model that analyzes task feasibility before scheduling on HPC resources, when tasks have data-intensive constraints, is proposed to save time by refraining from further analysis of non-schedulable tasks.
References
Distributed computing of all-to-all comparison problems in heterogeneous systems
- Computer Science
- IECON 2015 - 41st Annual Conference of the IEEE Industrial Electronics Society
- 2015
This paper presents a scalable and efficient data and task distribution strategy for processing large-scale all-to-all comparison (ATAC) problems in heterogeneous systems; the strategy not only saves storage space but also achieves load balancing and good data locality for all comparison tasks.
A distributed computing framework for All-to-All comparison problems
- Computer Science
- IECON 2014 - 40th Annual Conference of the IEEE Industrial Electronics Society
- 2014
A distributed computing framework is presented for high performance computing of All-to-All Comparison Problems and a data distribution strategy is embedded in the framework for reduced storage space and balanced computing load.
Scheduling Precedence Constrained Stochastic Tasks on Heterogeneous Cluster Systems
- Computer Science, Business
- IEEE Transactions on Computers
- 2015
It is proved that the expected makespan of scheduling stochastic tasks is greater than or equal to the makespan of scheduling deterministic tasks, where all processing times and communication times are replaced by their expected values.
Data Replication Approach with Consistency Guarantee for Data Grid
- Computer Science
- IEEE Transactions on Computers
- 2014
This paper proposes a new quorum-based data replication protocol with the objectives of minimizing the data update cost and providing high availability and data consistency, and compares the proposed approach with two existing approaches in terms of response time, data consistency, data availability, and communication costs.
From the Cloud to the Atmosphere: Running MapReduce across Data Centers
- Computer Science
- IEEE Transactions on Computers
- 2014
G-MR, a system for executing sequences of MapReduce jobs on geo-distributed data sets, is introduced; it implements the optimization framework, and evaluations show that using G-MR significantly improves processing time and cost for geo-distributed data sets.
Preemptive Hadoop Jobs Scheduling under a Deadline
- Computer Science, Business
- 2012 Eighth International Conference on Semantics, Knowledge and Grids
- 2012
To the authors' knowledge, the first real preemptive job scheduler to meet deadlines on Hadoop is implemented, and the experimental results indicate that the preemptive scheduling approach is promising: it is more efficient than the non-preemptive one for executing jobs under a certain deadline.
MRGIS: A MapReduce-Enabled High Performance Workflow System for GIS
- Computer Science
- 2008 IEEE Fourth International Conference on eScience
- 2008
A high-performance workflow system, MRGIS, is proposed: a parallel and distributed computing platform based on MapReduce clusters that executes GIS applications efficiently and can significantly improve the performance of GIS workflow execution.
Cloud Technologies for Bioinformatics Applications
- Computer Science
- IEEE Transactions on Parallel and Distributed Systems
- 2011
This paper presents experience in applying two cloud technologies, Apache Hadoop and Microsoft DryadLINQ, to two bioinformatics applications with the above characteristics, along with a comparison of the performance of the cloud technologies on virtual and non-virtual hardware platforms.
All-Pairs: An Abstraction for Data-Intensive Computing on Campus Grids
- Computer Science
- IEEE Transactions on Parallel and Distributed Systems
- 2010
This work argues that campus grids should provide end users with high-level abstractions that allow for the easy expression and efficient execution of data-intensive workloads, and presents one example of an abstraction, All-Pairs, that fits the needs of several applications in biometrics, bioinformatics, and data mining.
Load Scheduling Strategies for Parallel DNA Sequencing Applications
- Business
- 2009 11th IEEE International Conference on High Performance Computing and Communications
- 2009
Through simulation and numerical analysis, this study demonstrates that, for a constant sequence length, as the number of processors in the network increases, the processing time for the job decreases and the minimum overall processing time is achieved.