A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval

Abstract

Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document, and then rank documents by the likelihood of the query according to the estimated language model. A core problem in language model estimation is smoothing, which adjusts the maximum likelihood estimator to correct the inaccuracy caused by data sparseness. In this paper, we study the problem of language model smoothing and its influence on retrieval performance. We examine the sensitivity of retrieval performance to the smoothing parameters and compare several popular smoothing methods on different test collections.
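
The query-likelihood idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the smoothing methods studied, so this example assumes one common choice, linear interpolation with a collection model (often called Jelinek-Mercer smoothing). The function names, the parameter `lam`, and the toy documents are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_counts, collection_len, lam=0.5):
    """Return log P(query | document model) under interpolation smoothing.

    Assumption: p(w | d) = (1 - lam) * p_ml(w | d) + lam * p(w | collection).
    With lam = 0 this reduces to the unsmoothed maximum likelihood estimate.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        # Maximum likelihood estimate from the document alone.
        p_ml = doc_counts[term] / doc_len if doc_len else 0.0
        # Background estimate from the whole collection.
        p_coll = collection_counts[term] / collection_len
        p = (1 - lam) * p_ml + lam * p_coll
        if p == 0.0:
            # A zero probability for any query term zeroes the whole query.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return document indices ordered from most to least likely to generate the query."""
    collection_counts = Counter(term for doc in docs for term in doc)
    collection_len = sum(collection_counts.values())
    scored = [
        (query_log_likelihood(query, doc, collection_counts, collection_len, lam), i)
        for i, doc in enumerate(docs)
    ]
    # Sort by descending score, breaking ties by document index.
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


docs = [
    ["language", "model", "smoothing"],
    ["speech", "recognition", "model"],
    ["retrieval", "language", "language"],
]
query = ["language", "retrieval"]
ranking = rank(query, docs, lam=0.5)
```

In this toy setup, setting `lam=0` gives the first document a score of negative infinity because it never contains "retrieval", which is the data sparseness problem that smoothing is meant to correct. Varying `lam` is one simple way to see how sensitive a ranking can be to a smoothing parameter.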

DOI: 10.1145/3130348.3130377



[Chart: Citations per Year, 2002 to 2016]

1,539 Citations

Semantic Scholar estimates that this publication has received between 1,336 and 1,771 citations based on the available data.
