Searchable words on the Web

Hugh E. Williams and Justin Zobel
International Journal on Digital Libraries
In designing data structures for text databases, it is valuable to know how many different words are likely to be encountered in a particular collection. For example, vocabulary accumulation is central to index construction for text database systems; it is useful to be able to estimate the space requirements and performance characteristics of the main-memory data structures used for this task. However, it is not clear how many distinct words will be found in a text collection or whether new…
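The vocabulary accumulation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `vocabulary_growth`, the lowercase alphanumeric definition of a "word", and the sample documents are all assumptions made for illustration. It records how many distinct words have been seen after each document, which is the quantity one would track to estimate the size of an in-memory vocabulary structure.

```python
import re
from typing import Iterable, List, Tuple

# Assumed definition of a "word": a run of lowercase letters or digits.
# Real systems may tokenize differently.
WORD_RE = re.compile(r"[a-z0-9]+")


def vocabulary_growth(documents: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (tokens_seen, distinct_words) measured after each document."""
    vocabulary = set()  # stands in for the main-memory vocabulary structure
    tokens_seen = 0
    growth = []
    for text in documents:
        for word in WORD_RE.findall(text.lower()):
            tokens_seen += 1
            vocabulary.add(word)  # no effect if the word is already known
        growth.append((tokens_seen, len(vocabulary)))
    return growth


if __name__ == "__main__":
    # Hypothetical three-document collection.
    docs = ["the cat sat on the mat", "the dog sat", "a new word appears"]
    for tokens, distinct in vocabulary_growth(docs):
        print(tokens, distinct)
```

Plotting the second value against the first for a real collection would show whether new words keep appearing as more text is processed, which is the question the abstract raises.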
This paper has 191 citations.




