# Designing good MapReduce algorithms

@article{Ullman2012DesigningGM, title={Designing good MapReduce algorithms}, author={Jeffrey D. Ullman}, journal={XRDS}, year={2012}, volume={19}, pages={30-34} }

An introduction to designing algorithms for the MapReduce framework for parallel processing of big data.

## 39 Citations

### Optimizing a MapReduce module of preprocessing high-throughput DNA sequencing data

- Computer Science
- 2013 IEEE International Conference on Big Data
- 2013

This study focuses on performance optimization of a MapReduce application, CloudRS, which tackles the problem of detecting and removing errors in next-generation sequencing de novo genomic data.

### MapReduce Algorithm for Single Source Shortest Path Problem

- Computer Science
- International Journal of Computer Network and Information Security
- 2020

This paper proposes MR-DSMR, a MapReduce version of the Dijkstra Strip-mined Relaxation (DSMR) algorithm, and the MR3-BFS algorithm, and compares the performance of both algorithms with BFS.

### A Survey on Geographically Distributed Big-Data Processing Using MapReduce

- Computer Science
- IEEE Transactions on Big Data
- 2019

Batch processing, stream processing, MapReduce-based systems, and SQL-style processing geo-distributed frameworks, models, and algorithms are classified and studied, along with their overhead issues.

### RuleMR: Classification rule discovery with MapReduce

- Computer Science
- 2014 IEEE International Conference on Big Data (Big Data)
- 2014

Experimental evaluations indicate that the proposed algorithm, RuleMR, not only scales well with the size of the training dataset but also, in many cases, yields a model comparable in accuracy to many well-known algorithms.

### Massive-scale processing of record-oriented and graph data

- Computer Science
- 2015

A theoretical framework for the MapReduce system is presented for analyzing the cost of distribution for different problem domains and for evaluating the "goodness" of different algorithms, and a fundamental tradeoff between the parallelism and communication costs of algorithms is identified.

### Scheduling MapReduce Jobs and Data Shuffle on Unrelated Processors

- Computer Science
- SEA
- 2015

A constant approximation algorithm for generalizations of the Flexible Flow Shop (FFS) problem which form a realistic model for non-preemptive scheduling in MapReduce systems and improves substantially on the model proposed by Moseley et al.

### A Study of Hadoop: Structure and Performance Issues

- Computer Science
- 2015

The structure of Hadoop is studied, along with how its different components contribute to its performance and some performance issues affecting Hadoop.

### Logical Aspects of Massively Parallel and Distributed Systems

- Computer Science
- PODS
- 2016

The first part of the paper concerns massively parallel systems where computation proceeds in a number of synchronized rounds and the focus is on evaluation algorithms for conjunctive queries as well as on reasoning about correctness and optimization of such algorithms.

### Big Data Management Challenges, Approaches, Tools and their limitations

- Computer Science
- 2016

This chapter examines the main challenges involved in the three V's of Big Data, and provides a classification of different functions offered by NewSQL systems and discusses their benefits and limitations for processing Big Data.

## References

### MapReduce: simplified data processing on large clusters

- Computer Science
- CACM
- 2008

This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

### Hadoop: The Definitive Guide

- Computer Science
- 2009

This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.

### Vision Paper: Towards an Understanding of the Limits of Map-Reduce Computation

- Computer Science
- ArXiv
- 2012

This is a vision paper that attempts to answer the questions described above about the ease of "map-reducability" - whether the problem can be partitioned into independent pieces, which are distributed across mappers/reducers.

### Fuzzy Joins Using MapReduce

- Computer Science
- 2012 IEEE 28th International Conference on Data Engineering
- 2012

It is found that there are many different approaches to the similarity-join problem using MapReduce, and none dominates the others when both communication and reducer costs are considered.

### Map-reduce extensions and recursive queries

- Computer Science
- EDBT/ICDT '11
- 2011

This work proposes several algorithmic ideas for efficient implementation of recursions in the map-reduce environment and discusses several alternatives for supporting recovery from failures without restarting the entire job.

### Counting triangles and the curse of the last reducer

- Computer Science
- WWW
- 2011

This work describes a sequential triangle counting algorithm, shows how to adapt it to the MapReduce setting, and presents a new algorithm designed specifically for the MapReduce framework that achieves a factor of 10-100 speedup over the naive approach.

### SkewTune: mitigating skew in mapreduce applications

- Computer Science
- SIGMOD Conference
- 2012

The results show that SkewTune can significantly reduce job runtime in the presence of skew and adds little to no overhead in the absence of skew.

### Mining of Massive Datasets

- Computer Science
- 2014

Determining relevant data is key to delivering value from massive amounts of data, and big data is defined less by volume, which is a constantly moving target, than by its ever-increasing variety, velocity, variability and complexity.

### Enumerating subgraph instances using map-reduce

- Computer Science
- 2013 IEEE 29th International Conference on Data Engineering (ICDE)
- 2013

This paper exploits the techniques of [1] for computing multiway joins (evaluating conjunctive queries) in a single map-reduce round for the simplest sample graph, the triangle, and addresses the matter of optimizing computation cost.

### On the Number of Subgraphs of Prescribed Type of Graphs with a Given Number of Edges

- Mathematics
- 2007

All graphs considered are finite, undirected, with no loops, no multiple edges and no isolated vertices. For a graph $H=(V(H),E(H))$ and for $S \subset V(H)$ define $N(S) = \{x \in V(H) : xy \in E(H) \text{ for some } y \in S\}$.…