Query optimization for massively parallel data processing

Sai Wu, Feng Li, Sharad Mehrotra, Beng Chin Ooi
MapReduce has been widely recognized as an efficient tool for large-scale data analysis. It achieves high performance by exploiting parallelism among processing nodes while providing a simple interface for upper-layer applications. Some vendors have enhanced their data warehouse systems by integrating MapReduce into them. However, existing MapReduce-based query processing systems, such as Hive, fall short of conventional database systems in query optimization and overall competency. Given an…
This paper has highly influenced 11 other papers.
This paper has 283 citations.


Publications citing this paper.

Scalable Query Optimization for Efficient Data Processing Using MapReduce

BigData Congress 2015 • 2015

JOMR: Multi-join optimizer technique to enhance map-reduce job

2014 9th International Conference on Informatics and Systems • 2014

Multi-Query Optimization in MapReduce Framework




