Finding Frequent Items in Data Streams
This work presents a 1-pass algorithm for estimating the most frequent items in a data stream using limited storage space, which achieves better space bounds than the previously known best algorithms for this problem for several natural distributions on the item frequencies.
The LCA Problem Revisited
We present a very simple algorithm for the Least Common Ancestors problem. We thus dispel the frequently held notion that optimal LCA computation is unwieldy and unimplementable. Interestingly, this
NOTUNG: A Program for Dating Gene Duplications and Optimizing Gene Family Trees
This paper describes NOTUNG, a program that facilitates large-scale analysis using both rooted and unrooted trees. It provides a basic building block for automatically inferring duplication dates from gene trees and can also serve as an exploratory analysis tool for evaluating alternative hypotheses.
Cache-oblivious B-trees
We present dynamic search-tree data structures that perform well in the setting of a hierarchical memory (including various levels of cache, disk, etc.), but do not depend on the number of memory
Let sleeping files lie: pattern matching in Z-compressed files
This paper considers pattern matching without decompression in files compressed with UNIX Z-compression, a variant of the Lempel-Ziv adaptive compression scheme, and shows how to modify the algorithms to achieve a trade-off between the amount of extra space used and the algorithm's time complexity.
On the approximability of numerical taxonomy (fitting distances by tree metrics)
This paper presents the first algorithm for this problem with a performance guarantee, and shows that it is $\mathcal{NP}$-hard to find a tree metric $T$ such that $\| T-D \|_{\infty} < \frac{9}{8}\varepsilon$.
Cache-oblivious streaming B-trees
This work presents a cache-aware version of the COLA, the lookahead array, which achieves the same bounds as Brodal and Fagerberg's (cache-aware) $B^{\varepsilon}$-tree.
On the sorting-complexity of suffix tree construction
This work presents a recursive technique for building suffix trees that yields optimal algorithms in several computational models, matching the sorting lower bound. For an alphabet consisting of integers in a polynomial range, it gives the first known linear-time algorithm.