Tiled-MapReduce: Optimizing resource usages of data-parallel applications on multicore with tiling

  • Rong-Xin Chen, Haibo Chen, Binyu Zang
  • Published 11 September 2010
  • Computer Science
  • 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)
The prevalence of chip multiprocessors opens opportunities for running data-parallel applications that originally ran on clusters on a single machine with many cores. MapReduce, a simple and elegant programming model for large-scale clusters, has recently been shown to be a promising alternative for harnessing the multicore platform.

Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling

This article argues that, on shared-memory multicore platforms, it is more efficient for MapReduce to iteratively process small chunks of data in turn than to process one large chunk at a time, and it extends the general MapReduce model with a “tiling strategy”, called Tiled-MapReduce (TMR).
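
As a rough illustration of the tiling idea summarized above, the following minimal Python sketch splits the input into small tiles, runs a map and reduce step on each tile in turn, and folds each partial result into a running total. It is only a sketch under stated assumptions: the function names (tiles, map_reduce_tile, tiled_word_count), the word-count workload, and the tile_size parameter are hypothetical and are not taken from TMR, and the sketch is single-threaded, so it shows only the iteration structure and not the parallel execution.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def tiles(records: List[str], tile_size: int) -> Iterator[List[str]]:
    """Yield consecutive fixed-size chunks ("tiles") of the input."""
    for start in range(0, len(records), tile_size):
        yield records[start:start + tile_size]


def map_reduce_tile(tile: Iterable[str]) -> Counter:
    """Map (split lines into words) and reduce (count per word) on one tile."""
    partial = Counter()
    for line in tile:
        partial.update(line.split())
    return partial


def tiled_word_count(records: List[str], tile_size: int = 2) -> Counter:
    """Process one tile at a time and fold each partial result into the total.

    Hypothetical helper: only one tile's intermediate data is alive at a time,
    which is the property the tiling strategy relies on.
    """
    total = Counter()
    for tile in tiles(records, tile_size):
        total.update(map_reduce_tile(tile))  # Counter.update adds counts
    return total
```

For example, tiled_word_count(["a b a", "b c", "a", "c c d"], tile_size=2) processes two tiles of two lines each, and the final counts do not depend on the tile size chosen.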

On the Power of Combiner Optimizations in MapReduce Over MPI Workflows

  • Tao Gao, Yanfei Guo, M. Taufer
  • Computer Science
    2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS)
  • 2018
The results with real datasets on the Tianhe-2 supercomputer show that the proposed pipeline combiner workflow can reduce memory usage by up to 51% and improve overall performance by up to 61%.

Glasswing: accelerating mapreduce on multi-core and many-core clusters

This paper presents Glasswing, a scalable MapReduce framework that employs a configurable mixture of coarse- and fine-grained parallelism to achieve high performance on multi-core CPUs and GPUs.

Applications and Evaluation of In-memory MapReduce

It is argued that in-memory storage can increase the flexibility of the MapReduce parallel programming model without requiring additional communication facilities to propagate data updates.

A MapReduce framework implementation for Network-on-Chip platforms

The proposed framework, which supports bare-metal systems, provides a scalable solution for data processing in a many-core system, while fully utilizing the platform's characteristics and achieving application speedup.

An In-Memory Framework for Extended MapReduce

This paper describes the design and implementation of EMR, an in-memory framework for extended MapReduce, illustrates the usage and performance of the framework, and presents measurements of typical MapReduce applications.

Local and Global Optimization of MapReduce Program Model

An adaptive load distribution scheme is developed to balance the load on each node and thereby reduce the cross-node communication cost incurred in the Reduce function, and it is exploited to further reduce the communication cost with multi-core programming.

Peacock: a customizable MapReduce for multicore platform

This paper refines the workload characterization from Phoenix++ according to the attributes of key-value pairs, and demonstrates that the refined workload characterization model covers all classes of MapReduce workloads.

Decoupled MapReduce for Shared-Memory Multi-Core Architectures

This paper enhances the traditional MapReduce architecture by decoupling the map and combine phases in order to boost parallel execution, and demonstrates that the proposed solution achieves execution speedups of up to 2.46x compared to a state-of-the-art, shared-memory MapReduce library.

MapReduce Architecture for a Single Computing Node of Multiprocessors

A new MapReduce framework called Hybrid-core based big Data (Real-time) Analysis (HYDRA) is presented that regards a single node equipped with both multi-core CPUs and many-core GPUs as a cluster of nodes, where a single processor plays the role of a single node.



Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system

This work optimizes Phoenix, a MapReduce runtime for shared-memory multi-cores and multiprocessors, on a quad-chip, 32-core, 256-thread UltraSPARC T2+ system with NUMA characteristics and shows how a multi-layered approach leads to significant speedup improvements with 256 threads.

MapReduce: simplified data processing on large clusters

This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

Evaluating MapReduce for Multi-core and Multiprocessor Systems

It is established that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code.

Optimizing MapReduce for Multicore Architectures

A new MapReduce library is introduced, Metis, with a compromise data structure designed to perform well for most workloads, and experiments with the Phoenix benchmarks show that Metis’ data structure performs better than simpler alternatives, including Phoenix.

DryadInc: Reusing Work in Large-scale Computations

This work presents two incremental computation frameworks to reuse prior work in these circumstances: reusing identical computations already performed on data partitions, and computing just on the newly appended data and merging the new and previous results.

Map-reduce-merge: simplified relational data processing on large clusters

A Merge phase is added to Map-Reduce that can efficiently merge data already partitioned and sorted by map and reduce modules, and it is demonstrated that this new model can express relational algebra operators as well as implement several join algorithms.

Improving MapReduce Performance in Heterogeneous Environments

A new scheduling algorithm, Longest Approximate Time to End (LATE), is proposed that is highly robust to heterogeneity and can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.

Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors

This paper describes the design and implementation of a scheme that schedules threads based on sharing patterns detected online using features of the standard performance monitoring units (PMUs) available in today's processing units, and demonstrates reductions in cross-chip cache accesses.

MapReduce Online

A modified version of the Hadoop MapReduce framework is presented that supports online aggregation, which allows users to see "early returns" from a job as it is being computed, and that can also reduce completion times and improve system utilization for batch jobs.

Mars: A MapReduce Framework on graphics processors

Mars hides the programming complexity of the GPU behind the simple and familiar MapReduce interface, and is up to 16 times faster than its CPU-based counterpart for six common web applications on a quad-core machine.