A study on using uncertain time series matching algorithms for MapReduce applications

@article{Rizvandi2013ASO,
  title={A study on using uncertain time series matching algorithms for MapReduce applications},
  author={Nikzad Babaii Rizvandi and Javid Taheri and Reza Moraveji and Albert Y. Zomaya},
  journal={Concurrency and Computation: Practice and Experience},
  year={2013},
  volume={25}
}
In this paper, we study CPU utilization time patterns of several MapReduce applications. After extracting running patterns of several applications, the patterns along with their statistical information are saved in a reference database to be later used to tweak system parameters to efficiently execute future unknown applications. To achieve this goal, CPU utilization patterns of new applications along with their statistical information are compared with the already known ones in the reference…
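The matching step described in the abstract (comparing a new application's CPU utilization pattern against stored patterns) can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in plain dynamic time warping (DTW) as the distance measure and ignores uncertainty and the statistical information entirely. The names `dtw_distance`, `closest_reference`, and `reference_db`, as well as all sample values, are hypothetical and chosen only for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matched b[j-1]'s predecessor path
                cost[i][j - 1],      # b[j-1] repeated against a[i-1]
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


def closest_reference(new_pattern, reference_db):
    """Return (name, distance) of the stored pattern closest to new_pattern."""
    return min(
        ((name, dtw_distance(new_pattern, pattern))
         for name, pattern in reference_db.items()),
        key=lambda item: item[1],
    )


# Hypothetical CPU utilization samples (percent), invented for illustration.
reference_db = {
    "cpu_heavy": [90, 95, 92, 88, 91, 94],
    "io_heavy": [20, 25, 22, 18, 21, 24],
}
new_pattern = [85, 93, 90, 89, 92]

name, distance = closest_reference(new_pattern, reference_db)
print(name, distance)
```

In a setting like the one the abstract outlines, the configuration recorded for the closest stored pattern would then be a candidate starting point for the new application; how the paper itself scores similarity under uncertainty is not recoverable from the truncated abstract.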

On Modeling CPU Utilization of MapReduce Applications

This approach aims to eliminate error-prone manual processes and presents a fully automated solution to predict the total CPU utilization, in terms of CPU clock ticks, of applications running on the MapReduce framework.

On Modelling and Prediction of Total CPU Usage for Applications in MapReduce Environments

An approach to provision the total CPU usage, in clock cycles, of jobs in a MapReduce environment is presented, and the accuracy of the models used is validated using three realistic applications (WordCount, Exim MainLog parsing, and TeraSort).

Statistical Regression to Predict Total Cumulative CPU Usage of MapReduce Jobs

This paper presents an approach to provision the total CPU usage, in clock cycles, of jobs in a MapReduce environment, and validates the accuracy of the models using three realistic applications (WordCount, Exim MainLog parsing, and TeraSort).

Network Load Analysis and Provisioning of MapReduce Applications

This paper studies the dependency between MapReduce configuration parameters and the network load of fixed-size MapReduce jobs during the shuffle phase, then proposes an analytical method that models this dependency using multivariate linear regression.

Process Mining Monitoring for Map Reduce Applications in the Cloud

A distributed architecture is introduced in which a logic-based monitor is able to detect possible delays and trigger recovery actions, such as the dynamic provisioning of further resources where they are needed to meet the deadlines.

A Hybrid Approach for Clustering Uncertain Time Series

The experimental results show that, compared with the traditional UK-means clustering algorithm, the hybrid clustering approach achieves a clearly higher Adjusted Rand Index and significantly better time efficiency.

Scaling MapReduce Applications Across Hybrid Clouds to Meet Soft Deadlines

This work proposes a policy for dynamic provisioning of Cloud resources to speed up execution of deadline-constrained MapReduce applications, by enabling concurrent execution of tasks, in order to meet a deadline for completion of the Map phase of the application.

MapReduce over the Hybrid Cloud: A Novel Infrastructure Management Policy

This work proposes HyMR, a policy to enable autonomic cloud bursting for clusters of virtual machines operating MapReduce jobs over a hybrid cloud, and shows that the HyMR policy allows the user to significantly reduce the data-processing time.

A Multi-step-ahead CPU Load Prediction Approach in Distributed System

This paper uses multiple fixed-length, immediately preceding history sequences to obtain the change pattern prediction and shows that this approach is more accurate than repeating one-step-ahead prediction to make the multi-step-ahead prediction, which is widely adopted in industry.

References

On Using Pattern Matching Algorithms in MapReduce Applications

This paper studies CPU utilization time patterns of several MapReduce applications to evaluate a hypothesis about tweaking system parameters when executing similar applications; the results show the effectiveness of the approach on pseudo-distributed MapReduce platforms.

Preliminary Results on Using Matching Algorithms in Map-Reduce Applications

This paper studies CPU utilization time patterns of several Map-Reduce applications and proposes a hypothesis to classify applications under similar CPU utilization patterns; the results show the effectiveness of the approach on pseudo-distributed Map-Reduce platforms.

Preliminary Results on Modeling CPU Utilization of MapReduce Programs

An automated model generation procedure can effectively characterise the CPU resource of applications when they are running on MapReduce with average prediction error of 3.5% and 2.75%, respectively.

On Modeling CPU Utilization of MapReduce Applications

This approach aims to eliminate error-prone manual processes and presents a fully automated solution to predict the total CPU utilization, in terms of CPU clock ticks, of applications running on the MapReduce framework.

On Modelling and Prediction of Total CPU Usage for Applications in MapReduce Environments

An approach to provision the total CPU usage, in clock cycles, of jobs in a MapReduce environment is presented, and the accuracy of the models used is validated using three realistic applications (WordCount, Exim MainLog parsing, and TeraSort).

MapReduce Implementation of Prestack Kirchhoff Time Migration (PKTM) on Seismic Data

This paper gives an overview of forward/inverse Prestack Kirchhoff Time Migration algorithm, as one of the well-known seismic imaging algorithms, and proposes an approach to fit this algorithm for running on Google's MapReduce framework.

On Modeling Dependency between MapReduce Configuration Parameters and Total Execution Time

An analytical method to model the dependency between configuration parameters and total execution time of Map-Reduce applications by multivariate linear regression is proposed.

Network Load Analysis and Provisioning of MapReduce Applications

This paper studies the dependency between MapReduce configuration parameters and the network load of fixed-size MapReduce jobs during the shuffle phase, then proposes an analytical method that models this dependency using multivariate linear regression.

Using realistic simulation for performance analysis of MapReduce setups

The design of an accurate MapReduce simulator, MRPerf, is presented, which can serve as a design tool for MapReduce infrastructure and as a planning tool that makes MapReduce deployment far easier by reducing the number of parameters that currently have to be hand-tuned using rules of thumb.

A simulation approach to evaluating design decisions in MapReduce setups

The resulting simulator, MRPerf, captures such aspects of MapReduce setups as node, rack and network configurations, disk parameters and performance, data layout and application I/O characteristics, among others. It uses this information to predict expected application performance and can serve as a tool for optimizing existing MapReduce setups as well as designing new ones.
...