An improved data stream summary: the count-min sketch and its applications
We introduce a new sublinear space data structure, the Count-Min Sketch, for summarizing data streams. Our sketch allows fundamental queries in data stream summarization such as point, range, and
Data streams: algorithms and applications
TLDR
Data Streams: Algorithms and Applications surveys the emerging area of algorithms for processing data streams and associated applications, which rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity.
Approximate String Joins in a Database (Almost) for Free
TLDR
This paper develops a technique for building approximate string join capabilities on top of commercial databases by exploiting facilities already available in them, and demonstrates experimentally the benefits of the technique over the direct use of UDFs.
Influence sets based on reverse nearest neighbor queries
TLDR
This paper formalizes a novel notion of influence based on reverse nearest neighbor (RNN) queries and their variants, and presents a general approach for solving RNN queries together with an efficient R-tree based method for large data sets built on that approach.
Relative-Error CUR Matrix Decompositions
TLDR
These two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist.
Optimal Histograms with Quality Guarantees
TLDR
This paper presents algorithms for computing optimal bucket boundaries in time proportional to the square of the number of distinct data values, for a broad class of optimality metrics, along with an enhancement to traditional histograms that provides quality guarantees on individual selectivity estimates.
Efficient algorithms for document retrieval problems
TLDR
This paper considers document retrieval problems that are motivated by online query processing in databases, Information Retrieval systems and Computational Biology, and provides the first known optimal algorithm for the document listing problem.
Faster least squares approximation
TLDR
This work presents two randomized algorithms that provide accurate relative-error approximations to the optimal value and the solution vector of a least squares approximation problem more rapidly than existing exact algorithms.
Online Stochastic Matching: Beating 1-1/e
TLDR
This paper presents a novel application of the power of two choices from load balancing: it computes two disjoint solutions to the expected instance and uses both of them in the online algorithm in a prescribed preference order, and also uses them to characterize an upper bound for the optimum in any scenario.
Sponsored Search Auctions with Markovian Users
TLDR
This paper studies a Markovian user model that retains the core bidding dynamics of the GSP auction that make it useful for advertisers, and shows that the optimal assignment can be found efficiently (even in near-linear time).
...