SUMMARY A new algorithm to search for multiple patterns at the same time is presented. The algorithm is faster than previous algorithms and can support a very large number (tens of thousands) of patterns. Several applications of the multi-pattern matching problem are discussed. We argue that, in addition to previous applications that required such search, ...
The string-matching problem is a very common problem. We are searching for a string P = p_1 p_2 ... p_m inside a large text file T = t_1 t_2 ... t_n, both sequences of characters from a finite character set Σ. The characters may be English characters in a text file, DNA base pairs, lines of source code, angles between edges in polygons, machines or machine ...
GLIMPSE, which stands for GLobal IMPlicit SEarch, provides indexing and query schemes for file systems. The novelty of glimpse is that it uses a very small index (in most cases 2-4% of the size of the text) and still allows very flexible full-text retrieval including Boolean queries, approximate matching (i.e., allowing misspelling), and even searching ...
We present a tool, called sif, for finding all similar files in a large file system. Files are considered similar if they have a significant number of common pieces, even if they are very different otherwise. For example, one file may be contained, possibly with some changes, in another file, or a file may be a reorganization of another file. The running time ...
It is increasingly difficult to make effective use of Internet information, given the rapid growth in data volume, user base, and data diversity. In this paper we introduce Harvest, a system that provides a scalable, customizable architecture for gathering, indexing, caching, replicating, and accessing Internet information.
We present a new file system that combines name-based and content-based access to files at the same time. Our design allows both methods to be used at any time, thus preserving the benefits of both. Users can create their own name spaces based on queries, on explicit path names, or on any combination interleaved arbitrarily. All regular file operations such as ...
Searching for a pattern in a text file is a very common operation in many applications ranging from text editors and databases to applications in molecular biology. In many instances the pattern does not appear in the text exactly. Errors in the text or in the query can result from misspelling or from experimental errors (e.g., when the text is a DNA ...
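To make the idea of matching "with errors" concrete, here is a minimal sketch of approximate search using the classic dynamic-programming recurrence over edit distance. It is not the algorithm from the paper, whose abstract is truncated here; the function name `approx_find` and its parameters are illustrative assumptions.

```python
def approx_find(pattern: str, text: str, k: int) -> list[int]:
    """Return 0-based end positions in `text` where some substring
    matches `pattern` with at most k edits (insert, delete, substitute).

    Illustrative textbook dynamic programming, O(len(pattern) * len(text)).
    """
    m = len(pattern)
    # col[i] = fewest edits turning pattern[:i] into some substring of
    # the text that ends at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,  # match or substitute
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # character missing from the text
            prev_diag = col[i]           # keep the old value for the next row
            col[i] = new
        if col[m] <= k:
            hits.append(j)
    return hits


if __name__ == "__main__":
    print(approx_find("abc", "xxabcxx", 0))
    print(approx_find("abc", "xxabdxx", 1))
```

With `k = 0` the function reduces to exact matching; raising `k` admits occurrences that differ from the pattern by misspellings or read errors.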
We present a new data structure, called the fixed-queries tree, for the problem of finding all elements of a fixed set that are close, under some distance function, to a query element. Fixed-queries trees can be used for any distance function, not necessarily even a metric, as long as it satisfies the triangle inequality. We give an analysis of several performance ...
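The role of the triangle inequality can be sketched in a few lines. The code below is a simplified illustration, not the paper's construction: it assumes one fixed pivot per tree level, groups elements by their distance to that pivot, and prunes any branch whose stored distance differs from the query's distance by more than the search radius. The names `build`, `query_tree`, and `hamming`, and the sample word list, are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions (inputs assumed to have equal length)."""
    return sum(x != y for x, y in zip(a, b))


def build(items, pivots, dist, level=0):
    """Build a tree where every node on a given level uses the same pivot.

    Internal nodes are dicts mapping distance-to-pivot to a child;
    leaves are plain lists of elements.
    """
    if level == len(pivots) or len(items) <= 1:
        return list(items)
    groups = {}
    for item in items:
        groups.setdefault(dist(item, pivots[level]), []).append(item)
    return {d: build(g, pivots, dist, level + 1) for d, g in groups.items()}


def query_tree(root, pivots, dist, query, radius):
    """Return all stored elements within `radius` of `query`."""
    # Pivots are fixed per level, so one distance per level suffices.
    q_dists = [dist(query, p) for p in pivots]

    def walk(node, level):
        if isinstance(node, list):
            return [x for x in node if dist(x, query) <= radius]
        found = []
        for d, child in node.items():
            # Triangle inequality: dist(x, query) <= radius implies
            # |dist(x, pivot) - dist(query, pivot)| <= radius.
            if abs(d - q_dists[level]) <= radius:
                found.extend(walk(child, level + 1))
        return found

    return walk(root, 0)


if __name__ == "__main__":
    words = ["cart", "card", "cord", "word", "ward", "wind"]
    pivots = ["cart", "wind"]  # arbitrary choice for illustration
    tree = build(words, pivots, hamming)
    print(sorted(query_tree(tree, pivots, hamming, "cord", 1)))
```

Because every node on a level shares its pivot, the query pays for one distance computation per level no matter how many branches it follows; the remaining cost is the final check at the leaves.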