Towards MapReduce for Desktop Grid Computing

  • Bing Tang, Mircea Moca, Stephane Chevalier, Haiwu He, Gilles Fedak
  • Published 4 November 2010
  • Computer Science
  • 2010 International Conference on P2P, Parallel, Grid, Cloud and Internet Computing
MapReduce is an emerging programming model for data-intensive applications proposed by Google, which has attracted a lot of attention recently. MapReduce borrows from functional programming: the programmer defines Map and Reduce tasks that are executed on large sets of distributed data. In this paper we propose an implementation of the MapReduce programming model. We present the architecture of the prototype based on BitDew, a middleware for large-scale data management on Desktop Grids. We describe the… 
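The Map/Reduce functional model described above can be illustrated with a minimal sequential word-count sketch. This is only an illustration of the programming model, not the BitDew-MapReduce API; all function names here are hypothetical, and the driver loop stands in for the distributed runtime that would schedule tasks across Desktop Grid nodes.

```python
from collections import defaultdict

def map_task(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in document.split()]

def reduce_task(key, values):
    """Reduce: sum all counts emitted for one key."""
    return key, sum(values)

def run_mapreduce(documents):
    """Sequential driver standing in for the distributed runtime."""
    # Shuffle phase: group intermediate pairs by key.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_task(doc):
            groups[key].append(value)
    # Reduce phase: one reduce_task call per distinct key.
    return dict(reduce_task(k, vs) for k, vs in groups.items())

print(run_mapreduce(["to be or not to be"]))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In a Desktop Grid setting, the map and reduce invocations would be independent tasks distributed to volunteer nodes, with the middleware handling data placement and fault tolerance.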

Figures and Tables from this paper

Assessing MapReduce for Internet Computing: A Comparison of Hadoop and BitDew-MapReduce

A new experimental framework that emulates key fundamental aspects of Internet Desktop Grids is proposed, and BitDew-MR outperforms Hadoop on several aspects: scalability, fairness, resilience to node failures, and network disconnections.

Distributed Results Checking for MapReduce in Volunteer Computing

  • M. Moca, G. Silaghi, G. Fedak
  • Computer Science
    2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
  • 2011
A distributed result checker based on the Majority Voting method is presented and the efficiency of this approach is evaluated using a model for characterizing errors and sabotage in the MapReduce paradigm.
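Majority voting over replicated task results, the method this paper evaluates, can be sketched as follows. This is a simplified, hypothetical illustration: the actual checker operates on digests of MapReduce task outputs within the volunteer-computing middleware.

```python
from collections import Counter

def majority_vote(replica_results, quorum=2):
    """Accept a result only if at least `quorum` replicas agree.

    In a volunteer-computing setting the same Map or Reduce task is
    replicated on several untrusted workers; agreement among replicas
    filters out erroneous or sabotaged results.
    """
    counts = Counter(replica_results)
    result, votes = counts.most_common(1)[0]
    # No quorum reached: the scheduler would re-replicate the task.
    return result if votes >= quorum else None

print(majority_vote(["0xab12", "0xab12", "0xdead"]))  # '0xab12'
print(majority_vote(["0xab12", "0xdead", "0xbeef"]))  # None
```

The trade-off is that each unit of useful work costs at least `quorum` executions, which is why error models like the one in this paper are needed to choose the replication factor.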

SCADAMAR: scalable and data-efficient internet MapReduce

This work presents a computing platform called SCADAMAR that runs MapReduce jobs over the Internet and provides two main contributions: it improves data distribution by using the BitTorrent protocol to distribute all data, and it improves intermediate data availability.

Accelerating MapReduce on Commodity Clusters: An SSD-Empowered Approach

MapReduce, as a programming model and implementation for processing large data sets on clusters with hundreds or thousands of nodes, has gained wide adoption. In spite of this fact, we found that… 

Internet-scale support for map-reduce processing

VMR is presented, a VC system able to run MapReduce applications on top of volunteer resources, spread throughout the Internet, that obtains a performance increase of over 60% in application turnaround time, while reducing server bandwidth use by two orders of magnitude and showing no discernible overhead.

P2P-MapReduce: Parallel data processing in dynamic Cloud environments

Improving performance of Hadoop clusters

This dissertation addresses the problem of how to place data across nodes in a way that each node has a balanced data processing load, and proposes a preshuffling algorithm to preprocess intermediate data between the map and reduce stages, thereby increasing the throughput of Hadoop clusters.

D3-MapReduce: Towards MapReduce for Distributed and Dynamic Data Sets

The ambition of D3-MapReduce is to extend the MapReduce programming model and propose an efficient implementation of this model to cope with distributed data sets that span multiple distributed infrastructures or are stored on networks of loosely connected devices.

Mapreduce Challenges on Pervasive Grids

This study presents advances in designing and implementing scalable techniques to support the development and execution of MapReduce applications on pervasive distributed computing infrastructures, in the context of the PER-MARE project, and proposes two complementary approaches: improving the Apache Hadoop middleware by including context-awareness and fault-tolerance features, and providing an alternative pervasive grid implementation, fully adapted to dynamic environments.

MOON: MapReduce On Opportunistic eNvironments

Tests on an emulated volunteer computing system, which uses a 60-node cluster where each node possesses a hardware configuration similar to a typical computer in a student lab, demonstrate that MOON can deliver a three-fold performance improvement over Hadoop in volatile volunteer computing environments.

Evaluating MapReduce for Multi-core and Multiprocessor Systems

It is established that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code.

MapReduce: simplified data processing on large clusters

This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

Mars: A MapReduce Framework on graphics processors

Mars hides the programming complexity of the GPU behind the simple and familiar MapReduce interface, and is up to 16 times faster than its CPU-based counterpart for six common web applications on a quad-core machine.

Towards Efficient MapReduce Using MPI

This implementation combines redistribution and reduce and moves them into the network, which helps applications with a limited number of output keys in the map phase, while aiming to more efficiently support all MapReduce applications.

Mars: Accelerating MapReduce with Graphics Processors

The experimental results show that the GPU-CPU co-processing of Mars on an NVIDIA GTX 280 GPU and an Intel quad-core CPU outperformed Phoenix, the state-of-the-art MapReduce framework for multicore CPUs, with a speedup of up to 72 times, and 24 times on average, depending on the application.

Enabling High Data Throughput in Desktop Grids through Decentralized Data and Metadata Management: The BlobSeer Approach

This work proposes a generic yet efficient data storage service that enables the use of Desktop Grids for applications with high output data requirements, where the access grain and the access patterns may be random.

Towards efficient data distribution on computational desktop grids with BitTorrent

Distributed Data Mining using a Public Resource Computing Framework

This work describes how a Java prototype of the framework was used to tackle the problem of mining frequent itemsets from a transactional dataset, and shows some preliminary yet interesting performance results that prove the efficiency improvements that can derive from the presented architecture.

Optimizing Data Distribution in Desktop Grid Platforms

Two approaches are proposed for the Berkeley Open Infrastructure for Network Computing (BOINC): one based on the popular BitTorrent protocol and one based on a Content Delivery Network, with preliminary results indicating that the BitTorrent client had a negligible influence on the BOINC client's computation time.