Data-aware task scheduling for all-to-all comparison problems in heterogeneous distributed systems

@article{Zhang2016DataawareTS,
  title={Data-aware task scheduling for all-to-all comparison problems in heterogeneous distributed systems},
  author={YiFan Zhang and Yu-Chu Tian and Colin J. Fidge and Wayne Kelly},
  journal={J. Parallel Distributed Comput.},
  year={2016},
  volume={93--94},
  pages={87--101}
}
Optimal Data File Allocation for All-to-All Comparison in Distributed System : A Case Study on Genetic Sequence Comparison 201
TLDR
The results show that the proposed file allocation strategy can achieve the basic load balance of each node in the distributed system without exceeding the storage capacity of any node, and completely localize the data file.
Hypergraph+: An Improved Hypergraph-Based Task-Scheduling Algorithm for Massive Spatial Data Processing on Master-Slave Platforms
TLDR
An extended hypergraph-based task-scheduling algorithm, named Hypergraph+, is proposed for massive spatial data processing. It improves upon current hypergraph scheduling algorithms: it takes platform heterogeneity into consideration, offering a metric function to evaluate the partitioning quality in order to derive the best task/file schedule.
Algorithm for deadline based task scheduling in heterogeneous grid environment
TLDR
The computational results of the proposed IDSA for non-delayed tasks are higher than those of EDF and PDSA at 4000 tasks, showing that IDSA is a more suitable scheduling algorithm for grid computing.
A comparative analysis of resource allocation schemes for real-time services in high-performance computing systems
TLDR
This work comprehensively discusses, integrates, analyzes, and categorizes all resource allocation schemes for real-time services into five high-performance computing classes: grid, cloud, edge, fog, and multicore computing systems.
A Data-aware MultiWorkflow Scheduler for Clusters on WorkflowSim
TLDR
A multiworkflow store-aware scheduler policy is presented as an extension of WorkflowSim, enabling its combination with other WorkflowSim scheduling policies and the evaluation of a wide range of storage and file allocation possibilities.
Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms
TLDR
This work presents a solution that relies on hierarchical multi-level software-based caches to maximize data reuse at each level in the distributed memory hierarchy, combined with a divide-and-conquer approach to exploit data locality, hierarchical work-stealing to dynamically balance the workload, and asynchronous processing to maximize resource utilization.
Hard Real-Time Task Scheduling in Cloud Computing Using an Adaptive Genetic Algorithm
TLDR
A greedy algorithm and a genetic algorithm with an adaptive selection of suitable crossover and mutation operations (named AGA) are proposed to allocate and schedule real-time tasks with precedence constraints on heterogeneous virtual machines.
Opposition-based learning inspired particle swarm optimization (OPSO) scheme for task scheduling problem in cloud computing
TLDR
The proposed task scheduling mechanism, based on particle swarm optimization (PSO) with an opposition-based learning technique used to avoid premature convergence and to accelerate the convergence of standard PSO, is compared with well-established task scheduling strategies based on PSO, mPSO (modified PSO), genetic algorithm (GA), max–min, minimum completion time, and minimum execution time.
Prediction-based Resource Allocation Model for Real-time Tasks
TLDR
A prediction-based model that analyzes task feasibility before scheduling on HPC resources, when tasks have data-intensive constraints, is proposed to save time by refraining from further analysis of non-schedulable tasks.

References

Distributed computing of all-to-all comparison problems in heterogeneous systems
TLDR
A scalable and efficient data and task distribution strategy is presented in this paper for processing large-scale ATAC problems in heterogeneous systems that not only saves storage space but also achieves load balancing and good data locality for all comparison tasks.
A distributed computing framework for All-to-All comparison problems
TLDR
A distributed computing framework is presented for high performance computing of All-to-All Comparison Problems and a data distribution strategy is embedded in the framework for reduced storage space and balanced computing load.
Scheduling Precedence Constrained Stochastic Tasks on Heterogeneous Cluster Systems
TLDR
It is proved that the expected makespan of scheduling stochastic tasks is greater than or equal to the makespan of scheduling deterministic tasks, where all processing times and communication times are replaced by their expected values.
Data Replication Approach with Consistency Guarantee for Data Grid
TLDR
This paper proposes a new quorum-based data replication protocol with the objectives of minimizing the data update cost and providing high availability and data consistency, and compares the proposed approach with two existing approaches using response time, data consistency, data availability, and communication costs.
From the Cloud to the Atmosphere: Running MapReduce across Data Centers
TLDR
G-MR is introduced, a system for executing sequences of MapReduce jobs on geo-distributed data sets that implements the optimization framework; evaluations show that using G-MR significantly improves processing time and cost for geo-distributed data sets.
Preemptive Hadoop Jobs Scheduling under a Deadline
  • Li Liu, Yuan Zhou, Qianru Wang
  • Computer Science, Business
    2012 Eighth International Conference on Semantics, Knowledge and Grids
  • 2012
TLDR
To the authors' knowledge, this is the first real preemptive job scheduler implemented to meet deadlines on Hadoop; the experimental results indicate that the preemptive scheduling approach is promising and more efficient than the non-preemptive one for executing jobs under a certain deadline.
MRGIS: A MapReduce-Enabled High Performance Workflow System for GIS
TLDR
A high performance workflow system MRGIS is proposed, a parallel and distributed computing platform based on MapReduce clusters, to execute GIS applications efficiently and can significantly improve the performance of GIS workflow execution.
Cloud Technologies for Bioinformatics Applications
TLDR
This paper presents experience in applying two cloud technologies, Apache Hadoop and Microsoft DryadLINQ, to two bioinformatics applications with the above characteristics, along with a comparison of the performance of these cloud technologies on virtual and non-virtual hardware platforms.
All-Pairs: An Abstraction for Data-Intensive Computing on Campus Grids
TLDR
This work argues that campus grids should provide end users with high-level abstractions that allow for the easy expression and efficient execution of data-intensive workloads, and presents one example of an abstraction, All-Pairs, that fits the needs of several applications in biometrics, bioinformatics, and data mining.
Load Scheduling Strategies for Parallel DNA Sequencing Applications
TLDR
Through simulation and numerical analysis, this study demonstrates that, for a constant sequence length, as the number of processors in the network increases, the processing time for the job decreases and the minimum overall processing time is achieved.