Publications
MapReduce Online
TLDR
We present a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed.
  • 844
  • 61
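As a rough illustration of "early returns", the Python sketch below keeps a running mean over records as they arrive and reports an estimate with an approximate 95% confidence interval after every batch. It is not Hadoop code: `online_mean`, the batch size, Welford's running-variance update, and the normal-approximation interval are all illustrative assumptions, and the interval is only meaningful if records arrive in random order.

```python
import math
import random
from typing import Iterable, Iterator, Tuple


def online_mean(values: Iterable[float], batch_size: int) -> Iterator[Tuple[int, float, float]]:
    """Yield (records_seen, running_mean, ci_half_width) after each batch.

    Hypothetical helper: Welford's algorithm keeps a numerically stable running
    variance, and z = 1.96 gives a normal-approximation ~95% interval.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # sum of squared deviations from the running mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % batch_size == 0:
            var = m2 / (n - 1) if n > 1 else 0.0
            yield n, mean, 1.96 * math.sqrt(var / n)
    # Report the final estimate if the last batch was partial.
    if n % batch_size != 0:
        var = m2 / (n - 1) if n > 1 else 0.0
        yield n, mean, 1.96 * math.sqrt(var / n)


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(100.0, 15.0) for _ in range(10_000)]
    random.shuffle(data)  # the estimate assumes a random arrival order
    for seen, est, hw in online_mean(data, batch_size=2_000):
        print(f"after {seen:>6} records: mean ~ {est:.2f} +/- {hw:.2f}")
```

Each printed line is an early estimate; the interval narrows as more records are processed, which is what lets a user stop a job once the answer is precise enough.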
LSH forest: self-tuning indexes for similarity search
TLDR
We consider the problem of indexing high-dimensional data for answering (approximate) similarity-search queries.
  • 348
  • 40
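The Python sketch below illustrates the general LSH forest idea under simplifying assumptions: each tree gives a point a label of up to `max_depth` hash bits, and a query starts from the longest matching prefix and shortens it in all trees together until enough candidates are found, so the label length does not have to be fixed in advance. The random-hyperplane hash family (for cosine similarity), the sorted-list stand-in for a prefix tree, and the names `LSHForestSketch` and `num_candidates` are hypothetical choices, not the paper's implementation.

```python
import bisect
import random
from typing import List, Sequence, Set, Tuple

Label = Tuple[int, ...]


class LSHForestSketch:
    """Simplified LSH forest: `num_trees` indexes over labels of up to `max_depth` bits.

    Assumptions: random-hyperplane sign bits as the LSH family, and a sorted
    list per tree in place of a prefix tree, since all labels that share a
    prefix occupy one contiguous run of the sorted list.
    """

    def __init__(self, dim: int, num_trees: int = 4, max_depth: int = 16, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.max_depth = max_depth
        # planes[t][i] is the hyperplane whose sign gives bit i of a label in tree t.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(max_depth)]
            for _ in range(num_trees)
        ]
        self.trees: List[List[Tuple[Label, int]]] = [[] for _ in range(num_trees)]

    def _label(self, t: int, x: Sequence[float]) -> Label:
        return tuple(
            1 if sum(h_j * x_j for h_j, x_j in zip(h, x)) >= 0.0 else 0
            for h in self.planes[t]
        )

    def insert(self, point_id: int, x: Sequence[float]) -> None:
        for t, tree in enumerate(self.trees):
            bisect.insort(tree, (self._label(t, x), point_id))

    def _run(self, t: int, prefix: Label) -> List[int]:
        """Ids in tree t whose label starts with `prefix`."""
        entries = self.trees[t]
        lo = bisect.bisect_left(entries, (prefix,))
        # Bits are 0/1, so prefix + (2,) sorts after every extension of prefix.
        hi = bisect.bisect_left(entries, (prefix + (2,),))
        return [pid for _, pid in entries[lo:hi]]

    def query(self, q: Sequence[float], num_candidates: int) -> Set[int]:
        labels = [self._label(t, q) for t in range(len(self.trees))]
        candidates: Set[int] = set()
        # Descend synchronously: longest prefix in every tree first, then shorter ones.
        for depth in range(self.max_depth, -1, -1):
            for t, label in enumerate(labels):
                candidates.update(self._run(t, label[:depth]))
            if len(candidates) >= num_candidates:
                break
        return candidates


if __name__ == "__main__":
    rng = random.Random(1)
    points = {i: [rng.gauss(0.0, 1.0) for _ in range(8)] for i in range(200)}
    forest = LSHForestSketch(dim=8)
    for pid, vec in points.items():
        forest.insert(pid, vec)
    print(sorted(forest.query(points[42], num_candidates=5)))
```

Candidates would then be ranked by their true distance to the query; that step is omitted here.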
ROFL: routing on flat labels
TLDR
In this paper we take an initial stab at the challenge of routing on flat labels, proposing and analyzing our ROFL routing algorithm.
  • 263
  • 36
Implementing declarative overlays
TLDR
Overlay networks are used today in a variety of distributed systems ranging from file-sharing and storage systems to communication infrastructures.
  • 366
  • 25
Declarative networking: language, execution and optimization
TLDR
Declarative networking uses a distributed recursive query engine as a powerful vehicle for accelerating innovation in network architectures.
  • 303
  • 22
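To make "a distributed recursive query engine" more concrete, the sketch below evaluates a small shortest-path program of the kind often used to illustrate declarative routing, with the Datalog-style rules written in the docstring. Everything runs centrally in Python; the rule syntax is approximate, and `best_path_costs` and its min-cost pruning are illustrative assumptions rather than the system's execution strategy. In a deployed system, roughly, each node would evaluate the rules over its own link table and exchange derived paths with its neighbors.

```python
from typing import Dict, Tuple

Link = Tuple[str, str]


def best_path_costs(links: Dict[Link, float]) -> Dict[Link, float]:
    """Evaluate to a fixpoint the recursive rules
         path(S, D, C) :- link(S, D, C).
         path(S, D, C) :- link(S, Z, C1), path(Z, D, C2), C = C1 + C2.
         best(S, D, min<C>) :- path(S, D, C).
    keeping only the cheapest known cost per (S, D) so that cycles terminate.
    Assumes non-negative link costs.
    """
    best: Dict[Link, float] = dict(links)  # base rule: every link is a path
    changed = True
    while changed:
        changed = False
        # Recursive rule: join link(S, Z) with path(Z, D) on Z.
        for (s, z), c1 in links.items():
            for (z2, d), c2 in list(best.items()):
                if z2 != z or d == s:  # no join match, or a route back to the source
                    continue
                cost = c1 + c2
                if cost < best.get((s, d), float("inf")):
                    best[(s, d)] = cost
                    changed = True
    return best


if __name__ == "__main__":
    links = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5, ("c", "a"): 1}
    for (s, d), cost in sorted(best_path_costs(links).items()):
        print(f"{s} -> {d}: {cost}")
```

The aggregate `min` is what keeps the recursion finite here: a longer path is only kept if it improves on the best cost already derived.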
Titian: Data Provenance Support in Spark
TLDR
We introduce Titian, a library that enables data provenance (tracking data through transformations) in Apache Spark.
  • 89
  • 16
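A minimal way to picture provenance tracking is to carry, with every record, the set of input-record ids it was derived from, and to merge those sets whenever records are combined. The Python sketch below does this for a toy word count; `source`, `flat_map`, `map_`, and `reduce_by_key` are hypothetical stand-ins for Spark transformations, and the code does not reflect Titian's actual API or how it stores lineage.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, FrozenSet, Hashable, Iterable, List, Tuple

# A record paired with the ids of the input records it was derived from.
Tagged = Tuple[Any, FrozenSet[int]]


def source(records: List[Any]) -> List[Tagged]:
    """Tag each input record with its own position as its provenance id."""
    return [(r, frozenset([i])) for i, r in enumerate(records)]


def flat_map(data: List[Tagged], f: Callable[[Any], Iterable[Any]]) -> List[Tagged]:
    # Every output inherits the provenance of the record it came from.
    return [(out, prov) for r, prov in data for out in f(r)]


def map_(data: List[Tagged], f: Callable[[Any], Any]) -> List[Tagged]:
    # One-to-one transformation: provenance is unchanged.
    return [(f(r), prov) for r, prov in data]


def reduce_by_key(data: List[Tagged], op: Callable[[Any, Any], Any]) -> List[Tagged]:
    """Combine (key, value) records; each output's provenance is the union of its inputs'."""
    acc: Dict[Hashable, Any] = {}
    prov: Dict[Hashable, FrozenSet[int]] = defaultdict(frozenset)
    for (k, v), p in data:
        acc[k] = op(acc[k], v) if k in acc else v
        prov[k] = prov[k] | p
    return [((k, acc[k]), prov[k]) for k in acc]


if __name__ == "__main__":
    lines = ["error disk", "ok", "error net", "warn disk"]
    counts = reduce_by_key(
        map_(flat_map(source(lines), str.split), lambda w: (w, 1)),
        lambda a, b: a + b,
    )
    for (word, n), ids in counts:
        # Backward trace: which input lines contributed to this count?
        print(word, n, [lines[i] for i in sorted(ids)])
```

Tagging every record is the simplest design but the most expensive one; a real system would need to keep this overhead low at scale.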
Pregelix: Big(ger) Graph Analytics on a Dataflow Engine
TLDR
We present Pregelix, a new open source distributed graph processing system that is based on an iterative dataflow design that is better tuned to handle both in-memory and out-of-core workloads.
  • 105
  • 16
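The sketch below shows one way a vertex-centric (Pregel-style) superstep can be phrased as dataflow operators: a join of incoming messages with vertex state, followed by a group-by on destination that combines messages. It is a single-process Python illustration using connected components as the example algorithm; the function name, the min combiner, and the in-memory dictionaries are assumptions and say nothing about Pregelix's out-of-core storage or its actual query plans.

```python
from typing import Dict, List, Tuple


def connected_components(edges: Dict[int, List[int]]) -> Dict[int, int]:
    """Label each vertex with the smallest vertex id in its component.

    Each superstep runs two relational steps: (1) join the message relation
    with the vertex relation on vertex id and apply compute(); (2) group the
    outgoing messages by destination, combining them with min.
    Assumes `edges` is symmetric (undirected) and lists every vertex as a key.
    """
    value = {v: v for v in edges}  # vertex relation: id -> current label
    messages: Dict[int, int] = {}  # combined message relation: id -> min label
    superstep = 0
    while True:
        outgoing: List[Tuple[int, int]] = []
        # Step 1: join messages with vertices; superstep 0 activates every vertex.
        for v in edges:
            if superstep == 0:
                changed = True
            elif v in messages and messages[v] < value[v]:
                value[v] = messages[v]
                changed = True
            else:
                changed = False  # nothing new: the vertex stays halted
            if changed:
                outgoing.extend((n, value[v]) for n in edges[v])
        if not outgoing:
            return value
        # Step 2: group by destination with a min combiner.
        messages = {}
        for dest, label in outgoing:
            messages[dest] = min(label, messages.get(dest, label))
        superstep += 1


if __name__ == "__main__":
    graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
    print(connected_components(graph))
```

Because both steps are ordinary join and group-by operators, a dataflow engine is free to choose how to execute them, for example with disk-based implementations when the graph does not fit in memory.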
Big Data Analytics with Datalog Queries on Spark
TLDR
We present BigDatalog, a full Datalog language implementation on Apache Spark developed under the Deductive Application Language System.
  • 84
  • 15
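Recursive Datalog queries such as transitive closure are commonly evaluated semi-naively: each iteration joins only the facts derived in the previous iteration with the base relation, and evaluation stops when no new facts appear. The sketch below shows that technique with Python sets on one machine; on Spark the relations would be distributed datasets and each iteration a distributed join plus set difference. The function and variable names are illustrative, and this is not BigDatalog's code.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def transitive_closure(arc: Set[Edge]) -> Set[Edge]:
    """Semi-naive evaluation of
         tc(X, Y) :- arc(X, Y).
         tc(X, Y) :- tc(X, Z), arc(Z, Y).
    Only `delta` (facts new in the last iteration) is joined with arc.
    """
    # Index arc by source so the join on Z is a dictionary lookup.
    succ: Dict[int, Set[int]] = {}
    for x, y in arc:
        succ.setdefault(x, set()).add(y)
    tc = set(arc)
    delta = set(arc)
    while delta:
        derived = {(x, y) for (x, z) in delta for y in succ.get(z, ())}
        delta = derived - tc  # keep only genuinely new facts
        tc |= delta
    return tc


if __name__ == "__main__":
    print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
```

Joining only `delta` avoids re-deriving every known fact on each pass, which matters when each pass is a distributed join.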
Online aggregation for large MapReduce jobs
TLDR
In online aggregation, a database system processes a user's aggregation query in an online fashion.
  • 181
  • 11
Declarative networking
TLDR
Declarative Networking is a programming methodology that enables developers to concisely specify network protocols and services, which are directly compiled to a dataflow framework that executes the specifications.
  • 193
  • 9