• Publications
Pig latin: a not-so-foreign language for data processing
TLDR
A new language called Pig Latin is described, designed to fit in a sweet spot between the declarative style of SQL and the low-level, procedural style of map-reduce. Pig, the system that implements it, is an open-source Apache-incubator project available for general use.
Building a High-Level Dataflow System on top of MapReduce: The Pig Experience
TLDR
Pig is a high-level dataflow system that aims at a sweet spot between SQL and Map-Reduce. Performance comparisons between Pig execution and raw Map-Reduce execution are reported.
Distributed top-k monitoring
TLDR
This work shows that transmitting entire data streams is unnecessary to support top-k monitoring queries and presents an alternative approach that reduces communication significantly. It also shows empirically, through extensive simulation on real-world data, that this approach reduces overall communication cost by an order of magnitude.
Adaptive filters for continuous queries over distributed data streams
TLDR
This work considers an environment where distributed data sources continuously stream updates to a centralized processor that monitors continuous queries over the distributed data, and proposes a new technique for reducing the overhead.
Query Processing, Approximation, and Resource Management in a Data Stream Management System
This paper describes our ongoing work developing the Stanford Stream Data Manager (STREAM), a system for executing continuous queries over multiple continuous data streams. The STREAM system supports ...
TensorFlow-Serving: Flexible, High-Performance ML Serving
TLDR
TensorFlow-Serving is described: a system for serving machine learning models inside Google that is also available in the cloud and as open source. Ways to integrate it with systems that convey new models and updated versions from training to serving are also described.
What's new on the web?: the evolution of the web from a search engine perspective
TLDR
The authors' findings indicate a rapid turnover rate of Web pages, i.e., high rates of birth and death, coupled with an even higher rate of turnover in the hyperlinks that connect them, which is likely to remain consistent over time.
Adaptive precision setting for cached approximate values
TLDR
A parameterized algorithm is presented for adjusting the precision of cached approximations adaptively to achieve the best performance as data values, precision requirements, or workload vary; it easily outperforms previous algorithms for exact caching.
Query Processing, Resource Management, and Approximation in a Data Stream Management System
This paper describes our ongoing work developing the Stanford Stream Data Manager (STREAM), a system for executing continuous queries over multiple continuous data streams. The STREAM system supports ...
Finding (recently) frequent items in distributed data streams
TLDR
This work considers the problem of maintaining frequency counts for items occurring frequently in the union of multiple distributed data streams, and introduces the concept of a precision gradient for managing precision when nodes are arranged in a hierarchical communication structure.