Does Category A Anchor Text Improve Category B Results?


Associating anchor text with the pages to which the links point is a well-known approach to improving retrieval quality. It was used in the first version of Google [Brin and Page 1998]. Anchor text alone is enough to build a system with decent performance [Anh and Moffat 2010; Hiemstra and Hauff 2010], and our own experiments in TREC 2011 also showed that anchor text is a strong relevance signal [Boytsov and Belova 2011]. At the same time, the anchor text is much smaller than the full text of the collection. Thus, enriching the Category B index (built over 50M documents) with the Category A anchor text index (built over 370M short documents) seemed to be an appealing method of improving performance at little cost.

