MOON: MapReduce On Opportunistic eNvironments

@inproceedings{moon-hpdc10,
  title={MOON: MapReduce On Opportunistic eNvironments},
  author={Heshan Lin and Xiaosong Ma and Jeremy S. Archuleta and Wu-chun Feng and Mark K. Gardner and Zhe Zhang},
  booktitle={HPDC '10},
  year={2010}
}
MapReduce offers an easy-to-use programming paradigm for processing large data sets, making it an attractive model for distributed volunteer computing systems. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, in which volunteer computing systems are supplemented by a small set of dedicated nodes. Our tests on an emulated volunteer computing system…
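The MapReduce paradigm the abstract refers to can be illustrated with a minimal word-count sketch. This is an illustrative model only, not MOON's or Hadoop's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit an intermediate (word, 1) pair for every word in the split.
    for word in document.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts for one word.
    return (key, sum(values))

def run_job(splits):
    # Drive the three phases over all input splits.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

counts = run_job(["the quick fox", "the lazy dog"])
```

The appeal for volunteer computing is that the runtime, not the programmer, handles distribution and failure: map and reduce are pure functions over key-value pairs, so a failed task can simply be re-executed elsewhere.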

Scaling Hadoop clusters with virtualized volunteer computing environment

  • E. Kijsipongse, S. U-ruekolan
  • Computer Science
    2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE)
  • 2014
A Hadoop cluster that can be scaled out into a virtualized volunteer computing environment, consisting of a small fixed set of dedicated nodes plus a variable number of volatile volunteer nodes that contribute additional computing power to the cluster.

Towards MapReduce for Desktop Grid Computing

This paper presents the architecture of a prototype of the MapReduce programming model based on BitDew, a middleware for large-scale data management on Desktop Grids, and describes the set of features that makes this approach suitable for large, loosely connected Internet Desktop Grids.

Adoop: MapReduce for ad-hoc cloud computing

This paper investigates how Hadoop -- the most widely used open-source implementation of MapReduce -- can be optimized to run efficiently in ad-hoc cloud environments, despite the challenges these environments impose, and presents Adoop: a history-based scheduling approach to MapReduce, where the availability history of each node affects Hadoop's scheduling decisions.

P2P-MapReduce: Parallel data processing in dynamic Cloud environments

Internet-scale support for map-reduce processing

VMR is presented, a VC system able to run MapReduce applications on top of volunteer resources, spread throughout the Internet, that obtains a performance increase of over 60% in application turnaround time, while reducing server bandwidth use by two orders of magnitude and showing no discernible overhead.

Addressing Data-Intensive Computing Problems with the Use of MapReduce on Heterogeneous Environments as Desktop Grid on Slow Links

The motivation of this work is to apply the MapReduce model to data-intensive computing in heterogeneous environments such as desktop grids over slow links; the MR-A++ model creates a training task to gather information prior to the distribution of data.

A case for MapReduce over the internet

This paper investigates real-world scenarios in which the MapReduce programming model, and specifically the Hadoop framework, could be used for processing large-scale, geographically scattered datasets, and proposes and evaluates extensions to Hadoop's MapReduce framework to improve its performance in such environments.

Assessing MapReduce for Internet Computing: A Comparison of Hadoop and BitDew-MapReduce

A new experimental framework which emulates key fundamental aspects of Internet Desktop Grids is proposed; BitDew-MR outperforms Hadoop on several aspects: scalability, fairness, resilience to node failures, and network disconnections.

Improving performance of hadoop clusters

This dissertation addresses the problem of how to place data across nodes so that each node has a balanced data processing load, and proposes a preshuffling algorithm to preprocess intermediate data between the map and reduce stages, thereby increasing the throughput of Hadoop clusters.

Improving MapReduce Performance in Heterogeneous Environments

A new scheduling algorithm, Longest Approximate Time to End (LATE), is proposed that is highly robust to heterogeneity and can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.
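LATE's core heuristic is to extrapolate each running task's progress rate and speculatively re-execute the tasks with the longest estimated time to completion. A minimal sketch of that idea follows; the function names and the task representation are my own simplification, not the paper's implementation:

```python
def estimated_time_left(progress_score, elapsed_seconds):
    # Progress rate = fraction of the task done per second;
    # time left extrapolates that rate over the remaining fraction.
    rate = progress_score / elapsed_seconds
    return (1.0 - progress_score) / rate

def pick_speculation_candidates(tasks, cap=1):
    # tasks: list of (task_id, progress_score, elapsed_seconds).
    # Rank by longest estimated time to end; speculate the top `cap`.
    ranked = sorted(
        tasks,
        key=lambda t: estimated_time_left(t[1], t[2]),
        reverse=True,
    )
    return [task_id for task_id, _, _ in ranked[:cap]]

# t2 has done only 20% in 80s (320s left at its current rate),
# so it is the straggler chosen for speculative re-execution.
tasks = [("t1", 0.9, 90), ("t2", 0.2, 80), ("t3", 0.5, 50)]
candidates = pick_speculation_candidates(tasks)
```

Ranking by estimated time left, rather than by raw progress, is what makes the heuristic robust on heterogeneous nodes: a slow task that is nearly finished is not worth re-running, while a slow task far from done is.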


Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)

This paper proposes a new type of system named Hadoop++: it boosts task performance without changing the Hadoop framework at all (Hadoop does not even 'notice it'), and shows the superiority of Hadoop++ over both Hadoop and HadoopDB for tasks related to indexing and join processing.

MapReduce: simplified data processing on large clusters

This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications

The proposed approach uses the MapReduce paradigm to parallelize tools and manage their execution, machine virtualization to encapsulate their execution environments and commonly used data sets into flexibly deployable virtual machines, and network virtualization to connect resources behind firewalls/NATs while preserving the necessary performance and the communication environment.

Introducing map-reduce to high end computing

This work provides an example of how the halo finding application, when applied to large astrophysics datasets, benefits from the model of the Hadoop architecture.

BitDew: A programmable environment for large-scale data management and distribution

  • G. Fedak, Haiwu He, F. Cappello
  • Computer Science
    2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2008
The BitDew framework is proposed, a programmable environment for automatic and transparent data management on computational Desktop Grids; the performance evaluation demonstrates that the high level of abstraction and transparency is obtained with a reasonable overhead, while offering the benefits of scalability, performance, and fault tolerance with little programming cost.

FreeLoader: Scavenging Desktop Storage Resources for Scientific Data

The experiments show that FreeLoader is an appealing low-cost solution to storing massive datasets, by delivering higher data access rates than traditional storage facilities, and novel data striping techniques that allow FreeLoader to efficiently aggregate a workstation’s network communication bandwidth and local I/O bandwidth are presented.

On Availability of Intermediate Data in Cloud Computations

This paper takes a renewed look at the problem of managing intermediate data that is generated during dataflow computations (e.g., MapReduce, Pig, Dryad) within clouds, and presents design ideas for a new intermediate data storage system.

BOINC: a system for public-resource computing and storage

  • D. Anderson
  • Computer Science
    Fifth IEEE/ACM International Workshop on Grid Computing
  • 2004
The goals of BOINC, the design issues that were confronted, and the solutions to these problems are described.

Replication degree customization for high availability

This paper shows that the optimal replication degree of an object should be linear in the logarithm of its popularity-to-size ratio, and proposes an object replication degree customization scheme that maximizes the expected service availability under given object request probabilities, object sizes, and space constraints.
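The "linear in the logarithm of the popularity-to-size ratio" result can be sketched numerically. The coefficients `a` and `b` and the clamping bounds below are hypothetical placeholders; in the paper they fall out of the availability optimization under the space constraint:

```python
import math

def replication_degree(popularity, size, a=6.0, b=0.5, r_min=1, r_max=8):
    # Optimal degree is linear in log(popularity / size), per the paper's
    # result; a and b are illustrative constants, clamped to [r_min, r_max].
    r = a + b * math.log(popularity / size)
    return max(r_min, min(r_max, round(r)))

# A small, frequently requested object earns more replicas than a
# large, rarely requested one.
hot = replication_degree(popularity=0.5, size=1.0)
cold = replication_degree(popularity=0.001, size=100.0)
```

The qualitative behavior is the point: because the degree depends only on the ratio of popularity to size, doubling an object's size costs it exactly as much replication as halving its request probability would.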