Index Combinations and Query Reformulations for Mixed Monolingual Web Retrieval

Abstract

We examine the effectiveness on the multilingual WebCLEF 2006 test set of light-weight methods that have proved successful in other web retrieval settings: combinations of document representations on the one hand and query reformulation techniques on the other.

We investigate a range of approaches to crosslingual web retrieval using the test suite of the mixed monolingual CLEF 2006 WebCLEF track, featuring a stream of known-item topics in various languages. The topics are a mixture of manual (human generated) and automatically generated topics. We examine the robustness of well-known web retrieval techniques on this test set: compact document representations (titles or incoming anchor-texts), and query reformulation techniques.

In Section 1 we describe our retrieval system as well as the approaches we applied. In Section 2 we describe our experiments, while the results are detailed in Section 3. We conclude in Section 4. For details on the WebCLEF collection and on the topics used we refer to [1].

1 System Description

Our retrieval system is based on the Lucene engine [5]. For ranking, we used the default similarity measure of Lucene, i.e., for a collection D, document d and query q containing terms t_i:
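As a rough illustration of what a Lucene-style default similarity computes, the sketch below follows the classic TF-IDF scoring described in Lucene's own documentation: square-root term frequency, a logarithmic inverse document frequency, a query normalisation factor, a document length norm, and a coordination factor. It is a simplified assumption, not necessarily the exact expression used in this paper. The function names and the toy collection are hypothetical, boosts are fixed at 1, and Lucene's lossy one-byte encoding of the length norm is ignored.

```python
import math
from collections import Counter


def tf(freq):
    """Term-frequency component: square root of the raw count."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency in the classic Lucene form."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def classic_score(query, doc, collection):
    """Score one tokenised document d against a tokenised query q.

    `collection` is a list of tokenised documents (the collection D).
    All boosts are assumed to be 1.
    """
    q_terms = list(dict.fromkeys(query))  # unique query terms, order kept
    if not q_terms or not doc:
        return 0.0

    num_docs = len(collection)
    counts = Counter(doc)

    # idf per query term, from document frequencies in the collection
    idfs = {
        t: idf(sum(1 for other in collection if t in other), num_docs)
        for t in q_terms
    }

    # Query normalisation: makes scores comparable across queries
    query_norm = 1.0 / math.sqrt(sum(v * v for v in idfs.values()))

    # Length norm: shorter documents get a higher weight per match
    length_norm = 1.0 / math.sqrt(len(doc))

    matched = [t for t in q_terms if counts[t] > 0]

    # Coordination factor: fraction of query terms found in the document
    coord = len(matched) / len(q_terms)

    total = sum(tf(counts[t]) * idfs[t] ** 2 * length_norm for t in matched)
    return coord * query_norm * total


# Toy example (hypothetical data)
collection = [
    ["webclef", "known", "item", "topics"],
    ["web", "retrieval", "topics"],
    ["anchor", "text", "index"],
]
query = ["webclef", "topics"]
ranking = sorted(
    range(len(collection)),
    key=lambda i: classic_score(query, collection[i], collection),
    reverse=True,
)
```

In this toy collection the first document matches both query terms and is ranked first, the second matches one term, and the third matches none and scores zero.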

DOI: 10.1007/978-3-540-74999-8_104

Cite this paper

@inproceedings{Balog2006IndexCA,
  title={Index Combinations and Query Reformulations for Mixed Monolingual Web Retrieval},
  author={Krisztian Balog and Maarten de Rijke},
  booktitle={CLEF},
  year={2006}
}