Improving Retrieval Effectiveness by Using Key Terms in Top Retrieved Documents

Abstract

In this paper, we propose a method to improve the precision of the top retrieved documents in Chinese information retrieval, where the query is a short description, by re-ordering the documents returned by the initial retrieval. The re-ordering has two stages. First, we identify terms in the query and their importance using information derived from the top N (N<=30) documents of the initial retrieval. Second, we re-order the K (N<<K) retrieved documents according to which of these query terms they contain. Specifically, we automatically extract key terms from the top N retrieved documents, collect the key terms that occur in the query together with their document frequencies in the N documents, and use the collected terms to re-order the initially retrieved documents. Each collected term is assigned a weight based on its length and its document frequency in the top N retrieved documents, and each document is re-ranked by the sum of the weights of the collected terms it contains. In experiments on 42 query topics from the NTCIR3 Cross Lingual Information Retrieval (CLIR) dataset, the method yields an average improvement of 17.8%-27.5% for the top 10 documents and 6.6%-26.9% for the top 100 documents, depending on the relevance judgment (relax or rigid) and the parameter settings.
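
The re-ranking pipeline described above can be illustrated with a minimal Python sketch. Several parts are assumptions, not taken from the paper: the abstract does not give the weighting formula, so the sketch uses a hypothetical weight of term length times the fraction of the top N documents containing the term; key-term extraction is not shown and the key terms are passed in as a plain list; term matching is simple substring containment; and the function names (`collect_query_terms`, `term_weights`, `rerank`) are invented for illustration.

```python
from typing import Dict, List, Sequence


def collect_query_terms(query: str, key_terms: Sequence[str],
                        top_docs: Sequence[str]) -> Dict[str, int]:
    """Keep the key terms that occur in the query, mapped to their
    document frequency within the top N documents."""
    df: Dict[str, int] = {}
    for term in dict.fromkeys(key_terms):  # de-duplicate, keep input order
        if term and term in query:
            df[term] = sum(1 for doc in top_docs if term in doc)
    return df


def term_weights(df: Dict[str, int], n: int) -> Dict[str, float]:
    """ASSUMPTION: weight = term length * (document frequency / N).
    The abstract only says the weight depends on length and document
    frequency; this particular formula is hypothetical."""
    return {term: len(term) * freq / n for term, freq in df.items()}


def rerank(docs: Sequence[str], weights: Dict[str, float]) -> List[int]:
    """Return document indices ordered by the sum of the weights of the
    collected terms each document contains. The sort is stable, so
    documents with equal scores keep their initial order."""
    scores = [sum(w for term, w in weights.items() if term in doc)
              for doc in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


# Toy data: the K initially retrieved documents, in initial rank order.
docs = ["天气预报", "检索系统", "信息检索方法研究", "信息检索"]
query = "信息检索方法"
key_terms = ["信息检索", "检索", "研究"]  # assumed output of key-term extraction
n = 3                                     # size of the top-N pseudo-relevant set

df = collect_query_terms(query, key_terms, docs[:n])
weights = term_weights(df, n)
order = rerank(docs, weights)
print(order)  # prints [2, 3, 1, 0]
```

In this toy run the term "研究" is dropped because it does not occur in the query, and the two documents containing both remaining terms move to the top while keeping their relative initial order.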

DOI: 10.1007/978-3-540-31865-1_13

Cite this paper

@inproceedings{Yang2005ImprovingRE, title={Improving Retrieval Effectiveness by Using Key Terms in Top Retrieved Documents}, author={Lingpeng Yang and Dong-Hong Ji and Guodong Zhou and Nie Yu}, booktitle={ECIR}, year={2005} }