MOON: MapReduce On Opportunistic eNvironments

@inproceedings{lin2010moon,
  title={MOON: MapReduce On Opportunistic eNvironments},
  author={Heshan Lin and Xiaosong Ma and Jeremy S. Archuleta and Wu-chun Feng and Mark K. Gardner and Zhe Zhang},
  booktitle={HPDC '10},
  year={2010}
}
MapReduce offers an ease-of-use programming paradigm for processing large data sets, making it an attractive model for distributed volunteer computing systems. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. Our tests on an emulated volunteer computing system…
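The hybrid architecture described above can be illustrated with a minimal replica-placement sketch: one copy of each data block is anchored on a dedicated node for reliability, and extra copies are scattered across volatile volunteer nodes for throughput. The function name, node lists, and replication count below are illustrative assumptions, not MOON's actual policy.

```python
import random

def place_block(block_id, dedicated, volunteers, volunteer_copies=2):
    """Hedged sketch of hybrid replica placement: anchor one copy on a
    dedicated node (chosen deterministically from the block id), then
    scatter extra copies across volunteer nodes. Illustrative only."""
    anchor = dedicated[hash(block_id) % len(dedicated)]
    extras = random.sample(volunteers, min(volunteer_copies, len(volunteers)))
    return [anchor] + extras
```

Because the anchor copy lives on a dedicated node, a block survives even if every volunteer hosting it departs, which is the reliability argument the abstract makes for supplementing volunteers with a small dedicated set.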


Scaling Hadoop clusters with virtualized volunteer computing environment

  • E. Kijsipongse, S. U-ruekolan
  • Computer Science
    2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE)
  • 2014
A Hadoop cluster is presented that can be scaled into a virtualized Volunteer Computing environment, consisting of a small fixed set of dedicated nodes plus a variable number of volatile volunteer nodes that give additional computing power to the cluster.

Towards MapReduce for Desktop Grid Computing

This paper presents the architecture of a prototype of the MapReduce programming model based on BitDew, a middleware for large-scale data management on Desktop Grids, and describes the set of features that makes this approach suitable for large and loosely connected Internet Desktop Grids.

Adoop: MapReduce for ad-hoc cloud computing

This paper investigates how Hadoop -- the most widely used open-source implementation of MapReduce -- can be optimized to run efficiently in ad-hoc cloud environments, despite the challenges these environments impose, and presents Adoop: a history-based scheduling approach to MapReduce, where the availability history of each node affects Hadoop's scheduling decisions.

P2P-MapReduce: Parallel data processing in dynamic Cloud environments

Internet-scale support for map-reduce processing

VMR is presented, a VC system able to run MapReduce applications on top of volunteer resources, spread throughout the Internet, that obtains a performance increase of over 60% in application turnaround time, while reducing server bandwidth use by two orders of magnitude and showing no discernible overhead.

Addressing Data-Intensive Computing Problems with the Use of MapReduce on Heterogeneous Environments as Desktop Grid on Slow Links

The motivation of this work is to apply data-intensive computing to heterogeneous environments, such as desktop grids on slow links, using the MapReduce model; the MR-A++ model creates a training task to gather information prior to the distribution of data.


Volunteer Cloud Computing: MapReduce over the Internet

A BOINC prototype is created that can run MapReduce jobs (BOINC-MR), using a pull model in which communication is always initiated by the client; MapReduce is a programming paradigm that has become highly popular and is used by several systems on the cloud.

Assessing MapReduce for Internet Computing: A Comparison of Hadoop and BitDew-MapReduce

A new experimental framework is proposed which emulates key fundamental aspects of Internet Desktop Grids; BitDew-MR outperforms Hadoop on several aspects: scalability, fairness, and resilience to node failures and network disconnections.

HybridMR: A Hierarchical MapReduce Scheduler for Hybrid Data Centers

It is demonstrated in this work that it is feasible to provide the best of these two computing paradigms in a hybrid platform, and a 2-phase hierarchical scheduler, called HybridMR, is proposed for the effective resource management of interactive and batch workloads.



Improving MapReduce Performance in Heterogeneous Environments

A new scheduling algorithm, Longest Approximate Time to End (LATE), is presented that is highly robust to heterogeneity and can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.
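The LATE heuristic summarized above can be sketched in a few lines: estimate each running task's remaining time from its observed progress rate, then speculatively re-execute the task with the longest estimated time to end. The task fields and function names here are illustrative, not Hadoop's actual scheduler API.

```python
from dataclasses import dataclass

@dataclass
class RunningTask:
    task_id: str
    progress: float   # fraction complete, 0.0..1.0
    elapsed: float    # seconds since the task started

def time_to_end(task: RunningTask) -> float:
    """Estimate remaining time as (1 - progress) / progress_rate,
    where progress_rate = progress / elapsed."""
    if task.elapsed <= 0 or task.progress <= 0:
        return float("inf")  # no signal yet: pessimistic estimate
    rate = task.progress / task.elapsed
    return (1.0 - task.progress) / rate

def pick_speculation_candidate(tasks):
    """Return the task with the longest estimated time to end,
    i.e. the LATE choice for speculative execution."""
    if not tasks:
        return None
    return max(tasks, key=time_to_end)
```

The key design point, as the abstract suggests, is ranking by estimated time to end rather than by raw progress, which is what makes the heuristic robust when nodes run at very different speeds.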


Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)

This paper proposes a new type of system named Hadoop++: it boosts task performance without changing the Hadoop framework at all (Hadoop does not even 'notice it'), and shows the superiority of Hadoop++ over both Hadoop and HadoopDB for tasks related to indexing and join processing.

MapReduce: simplified data processing on large clusters

This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications

The proposed approach uses the MapReduce paradigm to parallelize tools and manage their execution, machine virtualization to encapsulate their execution environments and commonly used data sets into flexibly deployable virtual machines, and network virtualization to connect resources behind firewalls/NATs while preserving the necessary performance and the communication environment.

BitDew: A programmable environment for large-scale data management and distribution

  • G. Fedak, Haiwu He, F. Cappello
  • Computer Science
    2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2008
The BitDew framework is proposed, a programmable environment for automatic and transparent data management on computational Desktop Grids; the performance evaluation demonstrates that the high level of abstraction and transparency is obtained with a reasonable overhead, while offering the benefits of scalability, performance, and fault tolerance with little programming cost.

FreeLoader: Scavenging Desktop Storage Resources for Scientific Data

The experiments show that FreeLoader is an appealing low-cost solution to storing massive datasets, by delivering higher data access rates than traditional storage facilities, and novel data striping techniques that allow FreeLoader to efficiently aggregate a workstation’s network communication bandwidth and local I/O bandwidth are presented.

Entropia: architecture and performance of an enterprise desktop grid system

On Availability of Intermediate Data in Cloud Computations

A renewed look at the problem of managing intermediate data generated during dataflow computations (e.g., MapReduce, Pig, Dryad) within clouds, presenting design ideas for a new intermediate data storage system.

BOINC: a system for public-resource computing and storage

  • D. Anderson
  • Computer Science
    Fifth IEEE/ACM International Workshop on Grid Computing
  • 2004
The goals of BOINC, the design issues that were confronted, and the solutions to these problems are described.

Replication degree customization for high availability

This paper discovers that the optimal replication degree of an object should be linear in the logarithm of its popularity-to-size ratio, and proposes an object replication degree customization scheme that maximizes the expected service availability under given object request probabilities, object sizes, and space constraints.
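The stated rule, a replication degree linear in the logarithm of the popularity-to-size ratio, can be sketched numerically. The constants `a` and `b` and the greedy space-fitting step below are assumptions for illustration, not the paper's actual optimization scheme.

```python
import math

def replication_degrees(objects, a=1.0, b=1.0, space_budget=None):
    """Illustrative sketch: each object's replication degree is
    a + b * log(popularity / size), rounded and floored at 1.
    objects: list of (popularity, size) pairs. The optional greedy
    budget-fitting pass is an assumption, not from the paper."""
    degrees = [max(1, round(a + b * math.log(p / s))) for p, s in objects]
    if space_budget is not None:
        # Greedily shed replicas from the least popular objects until
        # the total replicated size fits the space budget.
        order = sorted(range(len(objects)), key=lambda i: objects[i][0])
        used = sum(d * s for d, (_, s) in zip(degrees, objects))
        for i in order:
            while degrees[i] > 1 and used > space_budget:
                degrees[i] -= 1
                used -= objects[i][1]
            if used <= space_budget:
                break
    return degrees
```

For example, an object 100x more popular than another of the same size gets roughly log(100) ≈ 4.6 extra replicas under these illustrative constants, matching the logarithmic shape the abstract describes.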