Selecting related terms in query-logs using two-stage SimRank


It is commonly believed that query logs from Web search are a gold mine for the search business, because they reflect users' preferences over the Web pages presented by search engines; many studies based on query logs have therefore been carried out in the last few years. In this study, we assume that two queries are relevant to each other when they share a clicked page in their result lists, and we also take into account the topics underlying users' information needs. We propose a Two-Stage SimRank algorithm (TSS in this paper), built on SimRank and clustering algorithms, to compute the similarity among queries. The similarities are then used to discover relevant terms for query expansion, considering topic information and the global relationships among queries together, using a query log collected by a practical search engine. Experimental results on two TREC test collections show that our approach discovers qualified terms effectively and improves retrieval performance.
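
To make the shared-click assumption concrete, the following is a minimal sketch of plain bipartite SimRank over a query-page click graph. It is not the two-stage TSS procedure itself: the clustering stage and the term selection step are omitted, since the abstract does not specify them. The function name `bipartite_simrank`, the decay factor of 0.8, the iteration count, and the toy click pairs are all illustrative assumptions.

```python
from collections import defaultdict


def bipartite_simrank(clicks, decay=0.8, iterations=5):
    """Plain SimRank on a query-page click graph (illustrative, not TSS).

    clicks: iterable of (query, page) pairs taken from a query log.
    decay and iterations are assumed values, not taken from the paper.
    Returns a dict mapping (query_a, query_b) to a similarity in [0, 1].
    """
    pages_of = defaultdict(set)    # query -> pages clicked for it
    queries_of = defaultdict(set)  # page  -> queries that led to it
    for query, page in clicks:
        pages_of[query].add(page)
        queries_of[page].add(query)

    queries = sorted(pages_of)
    pages = sorted(queries_of)

    # Start from the identity: every node is similar only to itself.
    q_sim = {(a, b): float(a == b) for a in queries for b in queries}
    p_sim = {(a, b): float(a == b) for a in pages for b in pages}

    def update(nodes, neighbours, other_sim):
        """One SimRank step: average similarity of the neighbours, decayed."""
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                    continue
                total = sum(other_sim[(x, y)]
                            for x in neighbours[a]
                            for y in neighbours[b])
                norm = len(neighbours[a]) * len(neighbours[b])
                new[(a, b)] = decay * total / norm
        return new

    for _ in range(iterations):
        # Both sides are updated from the previous iteration's scores.
        new_q = update(queries, pages_of, p_sim)
        new_p = update(pages, queries_of, q_sim)
        q_sim, p_sim = new_q, new_p
    return q_sim


# Toy log: two queries share a clicked page, a third does not.
toy_clicks = [
    ("jaguar car", "p1"),
    ("jaguar price", "p1"),
    ("python snake", "p2"),
]
sim = bipartite_simrank(toy_clicks)
```

In this toy log, the two queries that share the clicked page `p1` receive a non-zero similarity, while the query with no click in common with them stays at zero. The all-pairs dictionaries keep the sketch short but grow quadratically, so a real query log would need a sparse or pruned representation.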

DOI: 10.1145/2063576.2063867

Cite this paper

@inproceedings{Ma2011SelectingRT, title={Selecting related terms in query-logs using two-stage SimRank}, author={Yunlong Ma and Hongfei Lin and Yuan Lin}, booktitle={CIKM}, year={2011} }