Shark: SQL and rich analytics at scale

Reynold Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, and Ion Stoica
SIGMOD Conference
Shark is a new data analysis system that marries query processing with complex analytics on large clusters. It leverages a novel distributed memory abstraction to provide a unified engine that can run SQL queries and sophisticated analytics functions (e.g., iterative machine learning) at scale, and efficiently recovers from failures mid-query. This allows Shark to run SQL queries up to 100x faster than Apache Hive, and machine learning programs more than 100x faster than Hadoop. Unlike previous…
Semantic Scholar estimates that this paper has 407 citations and has highly influenced 40 other papers.




