A probabilistic justification for using tf×idf term weighting in information retrieval

  title={A probabilistic justification for using tf×idf term weighting in information retrieval},
  author={Djoerd Hiemstra},
  journal={International Journal on Digital Libraries},
This paper presents a new probabilistic model of information retrieval. The most important modeling assumption made is that documents and queries are defined by an ordered sequence of single terms. This assumption is not made in well-known existing models of information retrieval, but is essential in the field of statistical natural language processing. Advances already made in statistical natural language processing will be used in this paper to formulate a probabilistic justification for…


