When documents are very long, BM25 fails!

Abstract

We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents. To address this problem, we present a simple yet effective extension of BM25, namely BM25L, which "shifts" the term frequency normalization formula to boost scores of very long documents. Our experiments show that BM25L, with the same computation cost, is more…
DOI: 10.1145/2009916.2010070
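
The abstract does not state the formula, so the following is only a minimal sketch of the "shift" idea, assuming the commonly cited BM25L form in which a constant is added to the length-normalized term frequency before saturation. The function names and the default values of `k1`, `b`, and `delta` are illustrative assumptions, not values taken from the text above, and only the term-frequency component is shown (no IDF or query-term weighting).

```python
def normalized_tf(tf, doc_len, avg_len, b=0.75):
    """Length-normalized term frequency: tf divided by the pivoted length factor."""
    return tf / (1.0 - b + b * doc_len / avg_len)


def bm25_tf(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """Classic BM25 term-frequency component (assumed standard Okapi form)."""
    if tf <= 0:
        return 0.0
    c = normalized_tf(tf, doc_len, avg_len, b)
    return (k1 + 1.0) * c / (k1 + c)


def bm25l_tf(tf, doc_len, avg_len, k1=1.2, b=0.75, delta=0.5):
    """BM25L-style component: shift the normalized tf by delta (assumed form).

    For any matching term (tf > 0) the result cannot fall below
    (k1 + 1) * delta / (k1 + delta), however long the document is.
    """
    if tf <= 0:
        return 0.0
    c = normalized_tf(tf, doc_len, avg_len, b)
    return (k1 + 1.0) * (c + delta) / (k1 + c + delta)


if __name__ == "__main__":
    # Hypothetical collection: average length 500 tokens, one very long document.
    for doc_len in (500, 5_000, 100_000):
        print(doc_len, bm25_tf(1, doc_len, 500), bm25l_tf(1, doc_len, 500))
```

Under these assumptions, the BM25 component of a single match in a very long document approaches zero, which makes it hard to tell apart from a document with no match at all, while the shifted component stays bounded away from zero. Both functions do the same amount of arithmetic per term, which is consistent with the "same computation cost" claim in the abstract.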

63 Citations

Semantic Scholar estimates that this publication has 63 citations based on the available data.