The input/output complexity of sorting and related problems
Tight upper and lower bounds are provided for the number of inputs and outputs (I/Os) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition.
Random sampling with a reservoir
Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin, and an efficient Pascal-like implementation is given that incorporates these modifications and that is suitable for general use.
High-order entropy-compressed text indexes
We present a novel implementation of compressed suffix arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet σ, …
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
The result presents for the first time an efficient index whose size is provably linear in the size of the text in the worst case, and for many scenarios, the space is actually sublinear in practice.
Approximate computation of multidimensional aggregates of sparse data using wavelets
A novel method that provides approximate answers to high-dimensional OLAP aggregation queries in massive sparse data sets in a time-efficient and space-efficient manner based upon a multiresolution wavelet decomposition is presented.
External-memory graph algorithms
We present a collection of new techniques for designing and analyzing efficient external-memory algorithms for graph problems and illustrate how these techniques can be applied to a wide variety of …
Optimal External Memory Interval Management
The external interval tree is presented, an optimal external memory data structure for answering stabbing queries on a set of dynamically maintained intervals that uses a weight-balancing technique for efficient worst-case manipulation of balanced trees.
Wavelet-based histograms for selectivity estimation
This paper presents a technique based upon a multiresolution wavelet decomposition for building histograms on the underlying data distributions, with applications to databases, statistics, and simulation.
External memory algorithms and data structures: dealing with massive data
This survey covers the state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce I/O costs.
Approximations with Minimum Packing Constraint Violation
We present efficient new randomized and deterministic methods for transforming optimal solutions for a type of relaxed integer linear program into provably good solutions for the corresponding …