Evaluation measures act as objective functions to be optimized by information retrieval systems. Such objective functions must accurately reflect user requirements, particularly when tuning IR systems and learning ranking functions. Ambiguity in queries and redundancy in retrieved documents are poorly reflected by current evaluation measures. In this paper, …
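For context, one widely used measure of the kind this line of work critiques is normalized discounted cumulative gain (nDCG). The sketch below is a minimal illustration, not the paper's proposal; the log-2 discount and graded gains are conventional choices. Note that each document's gain is counted independently of the others, which is exactly why redundancy among retrieved documents goes unpenalized.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a ranked list of graded relevance values."""
    return sum(g / math.log2(rank + 2)        # rank 0 gets discount log2(2) = 1
               for rank, g in enumerate(gains))

def ndcg(gains, k=10):
    """nDCG@k: DCG of the ranking divided by the DCG of the ideal reordering."""
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0

# Example: a relevant document at rank 1 and another at rank 4.
print(ndcg([3, 0, 0, 2, 0], k=5))   # ~0.91
```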
Statistical measures of word similarity have application in many areas of natural language processing, such as language modeling and information retrieval. We report a comparative study of two methods for estimating word co-occurrence frequencies required by word similarity measures. Our frequency estimates are generated from a terabyte-sized corpus of Web …
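One representative word-similarity statistic of the kind referred to here is pointwise mutual information computed from co-occurrence counts. The sketch below assumes window-based counts have already been gathered; it illustrates the general idea rather than the estimation methods compared in the paper.

```python
import math

def pmi(co_count, count_x, count_y, total_windows):
    """Pointwise mutual information of two words from co-occurrence counts.

    co_count: windows containing both words
    count_x, count_y: windows containing each word on its own
    total_windows: total number of windows in the corpus
    """
    p_xy = co_count / total_windows
    p_x = count_x / total_windows
    p_y = count_y / total_windows
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts for two words that co-occur more often than chance.
print(pmi(co_count=150, count_x=2_000, count_y=3_000, total_windows=1_000_000))  # ~4.6
```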
Our goal is to automatically answer brief factual questions of the form "When was the Battle of Hastings?" or "Who wrote The Wind in the Willows?". Since the answer to nearly any such question can now be found somewhere on the Web, the problem reduces to finding potential answers in large volumes of data and validating their accuracy. …
The TREC 2009 web ad hoc and relevance feedback tasks used a new document collection, the ClueWeb09 dataset, which was crawled from the general Web in early 2009. This dataset contains 1 billion web pages, a substantial fraction of which are spam: pages designed to deceive search engines so as to deliver an unwanted payload. We examine the effect of spam …
We propose an integration of term proximity scoring into Okapi BM25. The relative retrieval effectiveness of our retrieval method, compared to pure BM25, varies from collection to collection. We present an experimental evaluation of our method and show that the gains achieved over BM25 grow as the size of the underlying text collection increases. We also show …
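For reference, a minimal sketch of the standard Okapi BM25 function that the proximity component extends; the proximity term itself is not shown, and k1 = 1.2, b = 0.75 are conventional defaults rather than values from the paper.

```python
import math

def bm25(query_terms, doc_tf, doc_len, avg_doc_len, num_docs, df, k1=1.2, b=0.75):
    """Score one document for a bag-of-words query with standard BM25.

    doc_tf: term -> frequency in this document
    df: term -> number of documents in the collection containing the term
    """
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0 or term not in df:
            continue
        idf = math.log(1 + (num_docs - df[term] + 0.5) / (df[term] + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score

# Toy usage with hypothetical collection statistics.
print(bm25(["term", "proximity"], {"term": 3, "proximity": 1},
           doc_len=120, avg_doc_len=350, num_docs=10_000,
           df={"term": 400, "proximity": 50}))
```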
Reciprocal Rank Fusion (RRF), a simple method for combining the document rankings from multiple IR systems, consistently yields better results than any individual system, and better results than the standard method Condorcet Fuse. This result is demonstrated by using RRF to combine the results of several TREC experiments, and to build a meta-learner that …
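The fusion rule itself is one line: each system contributes 1 / (k + rank) for every document it retrieves, and documents are reordered by the summed score. A minimal sketch, with k = 60 as a commonly used constant:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    rankings: iterable of rankings, each a list of doc ids ordered best-first
    k: constant that damps the influence of the very top ranks
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three runs that partially agree: d2 and d1 rise to the top of the fused list.
runs = [["d1", "d2", "d3"], ["d2", "d1", "d4"], ["d2", "d3", "d1"]]
print(reciprocal_rank_fusion(runs))  # ['d2', 'd1', 'd3', 'd4']
```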
Test collections with a million or more documents are needed for the evaluation of modern information retrieval systems. Yet their construction requires a great deal of effort. Judgements must be rendered as to whether or not documents are relevant to each of a set of queries. Exhaustive judging, in which every document is examined and a judgement rendered, …
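For context, the common compromise between exhaustive judging and no judging at all is depth-k pooling: only the union of the top-k documents returned by the participating systems is judged. The sketch below illustrates that standard practice, not the construction technique this paper goes on to propose.

```python
def judging_pool(runs, depth=100):
    """Depth-k pooling for one query: judge only the union of each run's top results.

    runs: dict mapping system name -> ranked list of doc ids (best first)
    """
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool

# Two toy runs pooled to depth 2: three documents to judge instead of five.
runs = {"sysA": ["d1", "d2", "d3"], "sysB": ["d2", "d4", "d5"]}
print(sorted(judging_pool(runs, depth=2)))  # ['d1', 'd2', 'd4']
```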
We investigate the application of a novel relevance ranking technique, cover density ranking, to the requirements of Web-based information retrieval, where a typical query consists of a few search terms and a typical result consists of a page indicating several potentially relevant documents. Traditional ranking methods for information retrieval, based on …
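A rough sketch of the cover-based idea: a cover is a minimal span of the document containing every query term, and short covers contribute more to the score than long ones. The constant K and the exact scoring and tie-breaking rules below are assumptions for illustration, not a faithful reproduction of the published method.

```python
def find_covers(tokens, query_terms):
    """Minimal spans of `tokens` that contain every query term (inclusive indices)."""
    query_terms = set(query_terms)
    last = {}                      # most recent position of each query term seen so far
    covers, prev_left = [], None
    for pos, tok in enumerate(tokens):
        if tok not in query_terms:
            continue
        last[tok] = pos
        if len(last) == len(query_terms):
            left = min(last.values())
            if left != prev_left:  # among spans sharing a left end, only the shortest is minimal
                covers.append((left, pos))
                prev_left = left
    return covers

def cover_density_score(tokens, query_terms, K=16):
    """Sum of per-cover scores: covers shorter than K count fully, longer ones are discounted."""
    return sum(min(1.0, K / (end - start + 1))
               for start, end in find_covers(tokens, query_terms))

doc = "the battle of hastings was fought in 1066 the battle took place near hastings".split()
print(cover_density_score(doc, ["battle", "hastings"]))  # three short covers -> 3.0
```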