Estimation Accuracy on Execution Time of Run-Time Tasks in a Heterogeneous Distributed Environment

@article{Liu2016EstimationAO,
  title={Estimation Accuracy on Execution Time of Run-Time Tasks in a Heterogeneous Distributed Environment},
  author={Qi Liu and Weidong Cai and Dandan Jin and Jian Shen and Zhangjie Fu and Xiaodong Liu and Nigel Linge},
  journal={Sensors (Basel, Switzerland)},
  year={2016},
  volume={16}
}
Distributed computing has developed tremendously since cloud computing was proposed in 2006, and has played a vital role in promoting the rapid growth of data collection and analysis models, e.g., Internet of Things, Cyber-Physical Systems, Big Data Analytics, etc. Hadoop has become a data convergence platform for sensor networks. As one of the core components, MapReduce facilitates allocating, processing and mining of collected large-scale data, where speculative execution strategies help solve…
Near-Data Prediction Based Speculative Optimization in a Distribution Environment
TLDR
An optimized SE strategy that uses near-data prediction, effectively improves the accuracy of alternative tasks, and performs better in heterogeneous Hadoop environments in various situations, which benefits both consumers and the cloud platform.
Designing a MapReduce performance model in distributed heterogeneous platforms based on benchmarking approach
TLDR
A model based on MapReduce phases for predicting the execution time of jobs in a heterogeneous cluster is presented, and a novel heuristic method is designed, which significantly reduces the makespan of the jobs.
An Adaptively Speculative Execution Strategy Based on Real-Time Resource Awareness in a Multi-Job Heterogeneous Environment
TLDR
An adaptive SE strategy (ASE) is presented for Hadoop-2.6.0; using ASE largely improves the performance of MRV2 in terms of job execution time and resource consumption, including in a multi-job environment.
Estimating runtime of a job in Hadoop MapReduce
TLDR
A new method is proposed to estimate the runtime of a job by considering the essential parameters that have a higher impact on runtime; the results show an average error rate of less than 12% when estimating the runtime of a first run and less than 8.5% when a profile or history of the job exists.
A Hadoop Yarn Scheduling Based on Node Computing Capability and Data Locality in Heterogeneous Environments
TLDR
A resource allocation algorithm based on node computing capability and data locality is proposed in this paper, which can effectively reduce the completion time and improve resource utilization in Hadoop.
ANN based execution time prediction model and assessment of input parameters through ISM
TLDR
An Artificial Neural Network (ANN) based prediction model is proposed to predict the execution time of tasks and provides 21.72% reduction in mean relative error compared to other state-of-the-art methods.
Toward Approximating Job Completion Time in Vehicular Clouds
TLDR
The main contribution of this paper is to offer easy-to-compute approximations of job completion time when estimates of the first or the first two moments of the intervening random variables are available.
A Distributed Parallel Algorithm Based on Low-Rank and Sparse Representation for Anomaly Detection in Hyperspectral Images
TLDR
This paper proposes a novel distributed parallel algorithm (DPA) by redesigning key operators of LRASR in terms of the MapReduce model to accelerate LRASR on cloud computing architectures, and demonstrates that the newly developed DPA achieves very high speedups when accelerating LRASR while maintaining similar accuracies.
MLP-ANN-Based Execution Time Prediction Model and Assessment of Input Parameters Through Structural Modeling
TLDR
A multilayer perceptron–artificial neural network (MLP-ANN)-based prediction model is proposed to predict the execution time of tasks in cloud environment and provides 21.7% reduction in mean relative error compared to other state-of-the-art methods.

References

A Heuristic Speculative Execution Strategy in Heterogeneous Distributed Environments
TLDR
This paper proposes a novel speculative execution strategy for heterogeneous environments, ERUL, to improve the estimation of tasks' remaining time, and indicates that the Hadoop-ERUL strategy not only estimates the remaining execution time of running tasks more accurately, but also reduces job running time by 26% compared to Hadoop-LATE.
Improving MapReduce Performance Using Smart Speculative Execution Strategy
TLDR
A new strategy, maximum cost performance (MCP), is developed which improves the effectiveness of speculative execution significantly and can run jobs up to 39 percent faster and improve the cluster throughput by up to 44 percent compared to Hadoop-0.21.
Design adaptive task allocation scheduler to improve MapReduce performance in heterogeneous clouds
Maestro: Replica-Aware Map Scheduling for MapReduce
TLDR
This work proposes a novel scheduling algorithm for map tasks, named Maestro, to improve the overall performance of the MapReduce computation and achieves around 95% local map executions, reduces speculative map tasks by 80% and results in an improvement of up to 34% in the execution time.
Improving MapReduce Performance with Partial Speculative Execution
TLDR
This paper proposes the Partial Speculative Execution (PSE) strategy, a strategy to make speculative tasks start from the checkpoint of original tasks, which can eliminate the costs of re-reading, re-copying, and re-computing the processed data.
A Smart Strategy for Speculative Execution Based on Hardware Resource in a Heterogeneous Distributed Environment
TLDR
Some pitfalls in the proposed strategy are corrected and computer hardware is taken into consideration (HWC-Speculation) in Hadoop-2.6; results show that the method can correctly find a slow task and that the performance of MRV2 is improved.
Improving MapReduce Performance in Heterogeneous Environments
TLDR
A new scheduling algorithm, Longest Approximate Time to End (LATE), that is highly robust to heterogeneity and can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.
Energy-Aware Scheduling of MapReduce Jobs for Big Data Applications
TLDR
This paper proposes two heuristic algorithms, called energy-aware MapReduce scheduling algorithms (EMRSA-I and EMRSA-II), that find the assignments of map and reduce tasks to the machine slots in order to minimize the energy consumed when executing the application.
A New Speculative Execution Algorithm Based on C4.5 Decision Tree for Hadoop
TLDR
In this paper, a new speculative execution algorithm for Hadoop based on the C4.5 decision tree, SECDT, is designed; it can predict execution time more accurately than other speculative execution methods and hence reduce job completion time.