Overview of the TREC 2011 Web Track
- C.L.A. Clarke, N. Craswell, I. Soboroff, E. M. Voorhees
- TREC-20: Proceedings of the Nineteenth Text…
Associating anchor text with the pages to which the links point is a well-known approach to improving retrieval quality. It was used in the first version of Google [Brin and Page 1998]. On the one hand, anchor text alone is enough to build a system with decent performance [Anh and Moffat 2010; Hiemstra and Hauff 2010]. We also know from our own experiments in TREC 2011 that anchor text is a strong relevance signal [Boytsov and Belova 2011]. On the other hand, the anchor text is much smaller than the full text of the collection. Thus, enriching the Category B index (built over 50M documents) with the Category A anchor text index (built over 370M short documents) seemed to be an appealing method of improving performance at little cost.
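
To make the idea concrete, the following is a minimal sketch of how anchor text might be grouped by target page and then attached to the pages of a smaller collection. It is an illustration under stated assumptions, not the pipeline used in our experiments: the function names `build_anchor_documents` and `enrich_index`, the tuple layout of the link data, and the choice to skip self-links are all hypothetical.

```python
from collections import defaultdict


def build_anchor_documents(links):
    """Group anchor texts by the URL they point to.

    `links` is an iterable of (source_url, target_url, anchor_text) tuples
    (a hypothetical input layout). Returns a dict that maps each target URL
    to one short "anchor document" made of all anchor texts pointing at it.
    """
    anchors_by_target = defaultdict(list)
    for source_url, target_url, anchor_text in links:
        text = " ".join(anchor_text.split())  # collapse runs of whitespace
        # Assumption: empty anchors and self-links carry no useful signal.
        if text and source_url != target_url:
            anchors_by_target[target_url].append(text)
    return {url: " ".join(texts) for url, texts in anchors_by_target.items()}


def enrich_index(page_text_by_url, anchor_documents):
    """Attach anchor text as a separate field to each page of a collection.

    Pages without incoming anchor text get an empty "anchor" field.
    """
    return {
        url: {"body": body, "anchor": anchor_documents.get(url, "")}
        for url, body in page_text_by_url.items()
    }
```

In this sketch the anchor documents can be built from links found in a larger crawl, while `enrich_index` is applied only to the pages of the smaller collection, which mirrors the idea of enriching a Category B index with Category A anchor text.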