A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval

Abstract

Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document, and then rank documents by the likelihood of the query according to the estimated language model. A core problem in language model estimation is smoothing, which adjusts the maximum likelihood estimator so as to correct the inaccuracy due to data sparseness. In this paper, we study the problem of language model smoothing and its influence on retrieval performance. We examine the sensitivity of retrieval performance to the smoothing parameters and compare several popular smoothing methods on different test collections.
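
The ranking idea described in the abstract can be illustrated with a minimal sketch. It assumes unigram document models and linear interpolation with a collection model as the smoothing step; the abstract does not say which methods are compared, so this choice, the function name query_log_likelihood, the parameter lam, and the two toy documents are all illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_counts, collection_len, lam=0.5):
    """Return log p(query | document model) under interpolation smoothing.

    lam is a hypothetical smoothing weight: 0 means the pure maximum
    likelihood estimate, 1 means the collection model only.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        # Maximum likelihood estimate from the document alone.
        p_ml = doc_counts[term] / doc_len if doc_len else 0.0
        # Background estimate from the whole collection.
        p_coll = collection_counts[term] / collection_len
        # Smoothed probability: mix the two estimates.
        p = (1 - lam) * p_ml + lam * p_coll
        if p == 0.0:
            return float("-inf")  # an unseen term zeroes out the likelihood
        score += math.log(p)
    return score


# Toy collection (made up for illustration).
docs = {
    "d1": "language model smoothing for retrieval".split(),
    "d2": "speech recognition with language models".split(),
}
collection_counts = Counter(t for d in docs.values() for t in d)
collection_len = sum(collection_counts.values())

query = "smoothing retrieval".split()
ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(
        query, docs[name], collection_counts, collection_len
    ),
    reverse=True,
)
print(ranking)
```

With lam=0 the second document contains neither query term, so its likelihood collapses to zero; any positive lam keeps its score finite, which is the data-sparseness correction the abstract attributes to smoothing.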

DOI: 10.1145/3130348.3130377

[Figure: Citations per Year. Horizontal axis: years 2002 to 2016; vertical axis: 0 to 150 citations.]

Semantic Scholar estimates that this publication has 1,677 citations based on the available data.

Cite this paper

@inproceedings{Zhai2001ASO,
  title={A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval},
  author={ChengXiang Zhai and John D. Lafferty},
  booktitle={SIGIR Forum},
  year={2001}
}