Learn More
Predicting <i>query performance</i>, that is, the effectiveness of a search performed in response to a query, is a highly important and challenging problem. We present a novel approach to this task that is based on measuring the standard deviation of retrieval scores in the result list of the documents most highly ranked. We argue that for retrieval methods(More)
The query-performance prediction task is estimating the effectiveness of a search performed in response to a query when no relevance judgments are available. Although there exist many effective prediction methods, these differ substantially in their basic principles, and rely on diverse hypotheses about the characteristics of effective retrieval. We present(More)
How can a search engine with a relatively weak relevance ranking function compete with a search engine that has a much stronger ranking function? This dual challenge, which to the best of our knowledge has not been addressed in previous work, entails an interesting bi-modal utility function for the weak search engine. That is, the goal is to produce in(More)
We show that two tasks which were independently addressed in the information retrieval literature actually amount to the exact same task. The first is query performance prediction; i.e., estimating the effectiveness of a search performed in response to a query in the absence of relevance judgments. The second task is cluster ranking, that is, ranking(More)
Focused retrieval (a.k.a., passage retrieval) is important at its own right and as an intermediate step in question answering systems. We present a new Web-based collection for focused retrieval. The document corpus is the Category A of the ClueWeb12 collection. Forty-nine queries from the educational domain were created. The $100$ documents most highly(More)
We present a study of the <i>cluster hypothesis</i>, and of the performance of cluster-based retrieval methods, performed over large scale Web collections. Among the findings we present are (i) the cluster hypothesis can hold, as determined by a specific test, for large scale Web corpora to the same extent it does for newswire corpora; (ii) while spam(More)