A tale of two greps

  • A. Hume
  • Published 1988
  • Computer Science
  • Software: Practice and Experience
Text searching programs such as the UNIX system tools grep and egrep require more than just good algorithms; they need to make efficient use of system resources such as I/O. I describe improving the I/O management in grep and egrep by using a new fast I/O library fio to replace the normal I/O library stdio. I also describe incorporating the Boyer‐Moore algorithm into egrep; egrep is now typically 8–10 (for some common patterns 30–40) times faster than grep. 
SEFT: a search engine for text
While not as fast as grep‐style tools, seft provides a valuable facility for impromptu personal information retrieval tasks and combines the freedom of natural language queries with the benefits of a ranked answer list and easy inspection of retrieval results.
Needles and Haystacks: a search engine for personal information collections
  • Owen de Kretser, A. Moffat
  • Computer Science
  • Proceedings 23rd Australasian Computer Science Conference. ACSC 2000 (Cat. No.PR00518)
  • 2000
This paper describes a hybrid approach that offers the ranked queries and similarity matching of a genuine information retrieval system, but does so without any need for an index to be precomputed.
SFIO: Safe/Fast String/File IO
Sfio is a new input/output library that can be used as a replacement for Stdio, the C language standard I/O library, and standard utilities can gain substantial performance improvement when based completely on Sfio.
Fast string searching
Two algorithms are described that perform 47% fewer comparisons and are about 4.5 times faster across a wide range of architectures and compilers.
Extending Unix Pipelines to DAGs
Dgsh was evaluated through a number of common data processing and domain-specific examples, and was found to offer an expressive way to specify processing topologies, while also generally increasing processing throughput.
Clgrep: A Parallel String Matching Tool
The results suggest that the performance of Heterogeneous Parallel Computing matching, either on multi-core CPU or GPU, is highly related to the computational intensity of certain cases.
Lackwit: A Program Understanding Tool Based on Type Inference
  • R. O'Callahan, D. Jackson
  • Computer Science
  • Proceedings of the (19th) International Conference on Software Engineering
  • 1997
A prototype tool is used to answer a user’s questions about a 17,000 line program written in C, and representation sharing with type inference is computed, using types to encode representations.
Algorithms for Finding Patterns in Strings
  • A. Aho
  • Computer Science
  • Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity
  • 1990
This chapter discusses the algorithms for solving string-matching problems that have proven useful for text-editing and text-processing applications; several innovative, theoretically interesting algorithms have been devised that run significantly faster than the obvious brute-force method.
On Extended Regular Expressions
An improved pumping lemma is provided that will show that a larger class of languages is not recognizable by extended regular expressions, and some questions regarding extended multi-pattern languages introduced by Nagy are investigated.
Practical Program Understanding with Type Inference.
A method is described for computing representation sharing by using types to encode representations; it uses polymorphic type inference to compute new types for all variables, eliminating cases of incidental type sharing where the variables might have different representations.


Practical fast searching in strings
It is discovered that a method developed by Boyer and Moore can outperform even special‐purpose search instructions that may be built into the computer hardware for very short substrings.
The UNIX system: Cheap dynamic instruction counting
  • P. Weinberger
  • Computer Science
  • AT&T Bell Laboratories Technical Journal
  • 1984
An easy implementation of count profiling is described, and it has been implemented on the Motorola 68000, VAX™, and AT&T 3B20 computers.
A fast string searching algorithm
The algorithm has the unusual property that, in most cases, not all of the first i characters of the string being searched are inspected, where i is the location of the first occurrence of the pattern.
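The property noted in the last entry, that a search can skip over text characters without inspecting them, can be illustrated with a minimal sketch. It uses the Horspool simplification of Boyer-Moore (bad-character shifts only), not the exact method of any paper listed here, and the names `horspool_search`, `shift`, and `comparisons` are hypothetical, chosen only for this illustration.

```python
def horspool_search(pattern: str, text: str) -> tuple[int, int]:
    """Return (index of first match or -1, number of character comparisons).

    Illustrative Horspool variant of Boyer-Moore: only the bad-character
    shift is used, so this is a simplification, not the full algorithm.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, 0

    # For each character in pattern[:-1], the distance from its last
    # occurrence to the end of the pattern. Characters that do not occur
    # there allow a shift by the whole pattern length.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    comparisons = 0
    pos = 0  # left edge of the current alignment of pattern against text
    while pos + m <= n:
        # Compare right to left, starting at the last pattern character.
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        # Shift according to the text character under the pattern's last cell.
        pos += shift.get(text[pos + m - 1], m)
    return -1, comparisons


if __name__ == "__main__":
    haystack = "a" * 1000 + "needle"
    index, comparisons = horspool_search("needle", haystack)
    print(index, comparisons)
```

On this input the match is found at index 1000 after far fewer than 1000 comparisons, because most alignments are rejected after a single comparison and then shifted by the full pattern length. How large the saving is in practice depends on the pattern length and the alphabet.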