# On the Computational Complexity of MapReduce

@inproceedings{Fish2015OnTC,
title={On the Computational Complexity of MapReduce},
author={Benjamin Fish and Jeremy Kun and {\'A}d{\'a}m D{\'a}niel Lelkes and L. Reyzin and Gy{\"o}rgy Tur{\'a}n},
booktitle={DISC},
year={2015}
}
In this paper we study the MapReduce Class MRC defined by Karloffi?źeti?źal., which is a formal complexity-theoretic model of MapReduce. We show that constant-round MRC computations can decide regular languages and simulate sublogarithmic space-bounded Turing machines. In addition, we prove hierarchy theorems for MRC under certain complexity-theoretic assumptions. These theorems show that sufficiently increasing the number of rounds or the amount of time per processor strictly increases the… Expand
20 Citations

#### Topics from this paper

Algorithms and Complexity Results for Learning and Big Data
The complexity-theoretic properties of MapReduce, one of the most ubiquitous distributed computing frameworks for big data, are explored, new algorithms are given and computational hardness results for a model of clustering are given, and the issue of fairness in machine learning is addressed. Expand
Efficient Circuit Simulation in MapReduce
• Mathematics, Computer Science
• ISAAC
• 2019
This work closely relates the standard, uniform NC hierarchy modeling parallel computations to the deterministic MapReduce hierarchy DMRC by proving that NC^(i+1) is contained in DMRC^i for all natural i, including 0. Expand
Shuffles and Circuits (On Lower Bounds for Modern Parallel Computation)
• Computer Science
• J. ACM
• 2018
An abstract model of massively parallel computation, where essentially the only restrictions are that the “fan-in” of each machine is limited to s bits, where s is smaller than the input size n, and that computation proceeds in synchronized rounds. Expand
Shuffles and Circuits: (On Lower Bounds for Modern Parallel Computation)
• Computer Science
• SPAA
• 2016
An abstract model of massively parallel computation, where essentially the only restrictions are that the "fan-in" of each machine is limited to s bits, and that computation proceeds in synchronized rounds, which is proved to be the best one could hope for. Expand
A Conditional Lower Bound on Graph Connectivity in MapReduce
• Mathematics, Computer Science
• ArXiv
• 2019
Some of the first lower bounds for a natural graph problem, albeit for a restricted class of algorithms, are introduced for the basic question of determining if a graph is connected or not. Expand
On the Hardness of Massively Parallel Computation
• Computer Science, Mathematics
• SPAA
• 2020
The existence of hard functions that are essentially not parallelizable in the MPC model is shown by adapting the so-called compression argument from the data structure lower bounds and cryptography literature to the context of massively parallel computation. Expand
DFA Minimization Algorithms in Map-Reduce
Map-Reduce has been a highly popular parallel-distributed programming model. In this thesis, we study the problem of minimizing Deterministic Finite State Automata (DFA). We focus our attention onExpand
A scalable and composable map-reduce system
• Computer Science
• 2016 IEEE International Conference on Big Data (Big Data)
• 2016
This paper presents a novel map-reduce runtime system that is designed for scalability and for composition with other parallel software, built over the Cilk programming language and outperforms Phoenix++, by 1.5x–4x for 5 out of 7 map- reduce benchmarks on 48 threads. Expand
Security and privacy aspects in MapReduce on clouds: A survey
• Computer Science
• Comput. Sci. Rev.
• 2016
This paper investigates and discusses security and privacy challenges and requirements, considering a variety of adversarial capabilities, and characteristics in the scope of MapReduce, and provides a review of existingSecurity and privacy protocols for MapReduced and discusses their overhead issues. Expand
Massively Parallel Algorithms and Hardness for Single-Linkage Clustering Under $\ell_p$-Distances
• Mathematics, Computer Science
• ICML
• 2018
Efficiency of implementation of the algorithms in Apache Spark is demonstrated and constant-factor inapproximability results for $o(\log n)$-round algorithms under standard MPC hardness assumptions are shown. Expand

#### References

SHOWING 1-10 OF 32 REFERENCES
A model of computation for MapReduce
• Computer Science
• SODA '10
• 2010
A simulation lemma is proved showing that a large class of PRAM algorithms can be efficiently simulated via MapReduce, and it is demonstrated how algorithms can take advantage of this fact to compute an MST of a dense graph in only two rounds. Expand
Sorting, Searching, and Simulation in the MapReduce Framework
• Computer Science
• ISAAC
• 2011
This work designs optimal simulations of the the well-established PRAM and BSP models in MapReduce, immediately resulting in optimal solutions to the problems of computing fixed-dimensional linear programming and 2-D and 3-D convex hulls. Expand
Upper and Lower Bounds on the Cost of a Map-Reduce Computation
• Computer Science
• Proc. VLDB Endow.
• 2013
A model of problems that can be solved in a single round of map-reduce computation is introduced and enables a generic recipe for discovering lower bounds on communication cost as a function of the maximum number of inputs that could be assigned to one reducer. Expand
• L. Fortnow
• Computer Science, Mathematics
• J. Comput. Syst. Sci.
• 2000
It is shown that SAT cannot be solved in n1+o(1) time and n1?? space for any general random-access nondeterministic Turing machines, and lower bounds for log-space uniform NC1 circuits and branching programs are given. Expand
BSP vs MapReduce
This work links MapReduce to the BSP model of computation, underlining the relevance of BSP to modern parallel algorithm design and deﬁning a subclass of B SP algorithms that can be efficiently implemented in Map Reduce. Expand
MapReduce: Simplified Data Processing on Large Clusters
• Computer Science
• OSDI
• 2004
This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable. Expand
On distributing symmetric streaming computations
• Computer Science
• SODA '08
• 2008
A simple algorithmic model for massive, unordered, distributed (mud) computation, as implemented by Google's MapReduce and Apache's Hadoop, and it is shown that in principle, mud algorithms are equivalent in power to symmetric streaming algorithms. Expand
Which problems have strongly exponential complexity?
• Computer Science, Mathematics
• Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280)
• 1998
For several NP-complete problems, there have been a progression of better but still exponential algorithms. In this paper we address the relative likelihood of sub-exponential algorithms for theseExpand
Turing Machines with Sublogarithmic Space
Lower bounds for accepting non-regular languages and space constructible functions are lowered and deterministic versus nondeterministic Turing machines are considered. Expand
Time-Space Tradeoffs for Counting NP Solutions Modulo Integers
• Richard Ryan Williams
• Computer Science, Mathematics
• Twenty-Second Annual IEEE Conference on Computational Complexity (CCC'07)
• 2007
It is proved that for all primes p except for possibly one of them, and for all c < 2 cos(π/7) ≈ 1.801, there is a d > 0 such that MODp-Sat is not solvable in nc time and nd space by general algorithms, which is the first time-space tradeoffs for counting the number of solutions to an NP problem modulo small integers. Expand