The probability ranking principle (PRP), which ranks documents in response to a query by their relevance probabilities, is the theoretical foundation of most ad hoc document retrieval methods. A key observation that motivates our work is that the PRP does not account for potential post-ranking effects, specifically, changes to documents that result from a …
Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a <i>structural re-ranking</i> approach to ad hoc information retrieval: we reorder the documents in an initially retrieved set by exploiting asymmetric relationships between them. Specifically, we consider <i>generation links</i>, which indicate that the language …
Predicting <i>query performance</i>, that is, the effectiveness of a search performed in response to a query, is a highly important and challenging problem. We present a novel approach to this task that is based on measuring the standard deviation of retrieval scores in the result list of the documents most highly ranked. We argue that for retrieval methods …
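As a rough illustration of the idea in the abstract above, the following minimal sketch computes the standard deviation of the scores of the top-ranked documents. The function name `score_std_predictor` and the cutoff `k` are hypothetical, and because the abstract is truncated, any normalization or further detail of the published method is not reproduced here.

```python
import statistics


def score_std_predictor(scores, k=100):
    """Return the population standard deviation of the k highest retrieval scores.

    Assumption: this unnormalized value stands in for the predictor;
    the truncated abstract does not give the full formulation.
    """
    top = sorted(scores, reverse=True)[:k]
    if len(top) < 2:
        # A single score (or none) has no spread to measure.
        return 0.0
    return statistics.pstdev(top)
```

Under this sketch, a result list whose top scores are all equal yields 0, and the value grows as the top scores spread out.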
Most previous work on the recently developed <i>language-modeling</i> approach to information retrieval focuses on document-specific characteristics, and therefore does not take into account the structure of the surrounding corpus. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the …
We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform reranking based on <i>centrality</i> within bipartite graphs of documents (on one side) and clusters (on the other side), on the premise that these are mutually reinforcing …
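The mutual-reinforcement premise above can be sketched with a HITS-style iteration on a document-cluster bipartite graph. This is an assumption made for illustration: the truncated abstract does not say which centrality measure or edge weights are used, and the function name `bipartite_centrality` and the unweighted membership edges are hypothetical.

```python
def bipartite_centrality(clusters, iterations=50):
    """HITS-style scores on a bipartite graph of documents and clusters.

    clusters maps a cluster id to the collection of document ids it contains.
    Assumption: edges are unweighted memberships; a real system would
    likely weight them.
    """
    docs = sorted({d for members in clusters.values() for d in members})
    doc_score = {d: 1.0 for d in docs}
    cluster_score = {c: 1.0 for c in clusters}
    for _ in range(iterations):
        # A cluster is central if it contains central documents.
        cluster_score = {
            c: sum(doc_score[d] for d in members)
            for c, members in clusters.items()
        }
        # A document is central if it belongs to central clusters.
        doc_score = {d: 0.0 for d in docs}
        for c, members in clusters.items():
            for d in members:
                doc_score[d] += cluster_score[c]
        # L2-normalize both sides so the scores stay bounded.
        for scores in (doc_score, cluster_score):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for key in scores:
                scores[key] /= norm
    return doc_score, cluster_score
```

With `clusters = {"c1": ["d1", "d2"], "c2": ["d2", "d3"]}`, document `d2` belongs to both clusters and therefore ends up with the highest document score; re-ranking would then sort documents by `doc_score`.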
We present a novel approach to pseudo-feedback-based ad hoc retrieval that uses language models induced from both documents and clusters. First, we treat the pseudo-feedback documents produced in response to the original query as a set of <i>pseudo-queries</i> that themselves can serve as input to the retrieval process. Observing that the documents returned …
We present a suite of query expansion methods that are based on word embeddings. Using Word2Vec's CBOW embedding approach, applied over the entire corpus on which search is performed, we select terms that are semantically related to the query. Our methods either use the terms to expand the original query or integrate them with the effective …
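One simple way to select terms that are semantically related to a query is to rank vocabulary terms by cosine similarity to the centroid of the query-term vectors. The sketch below assumes that selection rule, which the truncated abstract does not confirm, and takes pre-trained embeddings as a plain dictionary rather than training Word2Vec; `expand_query` and its parameters are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(query_terms, embeddings, n=2):
    """Append the n terms closest to the centroid of the query-term vectors.

    embeddings maps a term to its vector. Assumption: centroid-based
    selection; other scoring rules are possible.
    """
    known = [embeddings[t] for t in query_terms if t in embeddings]
    if not known:
        # No query term has an embedding, so there is nothing to expand with.
        return list(query_terms)
    dim = len(known[0])
    centroid = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    candidates = [
        (cosine(centroid, vec), term)
        for term, vec in embeddings.items()
        if term not in query_terms
    ]
    candidates.sort(reverse=True)
    return list(query_terms) + [term for _, term in candidates[:n]]
```

For example, with toy vectors `{"car": [1, 0], "auto": [0.9, 0.1], "banana": [0, 1]}`, expanding the query `["car"]` with `n=1` adds `"auto"`, since its vector points in nearly the same direction as that of `"car"`.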
To improve the precision at the very top ranks of a document list presented in response to a query, researchers have suggested exploiting information induced from <i>clustering</i> of documents highly ranked by some initial search. We propose a novel model for ranking such (<i>query-specific</i>) clusters by the presumed percentage of relevant documents that …