Learn More
Evaluation measures act as objective functions to be optimized by information retrieval systems. Such objective functions must accurately reflect user requirements, particularly when tuning IR systems and learning ranking functions. Ambiguity in queries and redundancy in retrieved documents are poorly reflected by current evaluation measures. In this paper,(More)
Statistical measures of word similarity have application in many areas of natural language processing , such as language modeling and information retrieval. We report a comparative study of two methods for estimating word co-occurrence frequencies required by word similarity measures. Our frequency estimates are generated from a terabyte-sized corpus of Web(More)
The TREC 2009 web ad hoc and relevance feedback tasks used a new document collection, the ClueWeb09 dataset, which was crawled from the general Web in early 2009. This dataset contains 1 billion web pages, a substantial fraction of which are spam — pages designed to deceive search engines so as to deliver an unwanted pay-load. We examine the effect of spam(More)
Our goal is to automatically answer brief factual questions of the form ``When was the Battle of Hastings?'' or ``Who wrote <italic>The Wind in the Willows</italic>?''. Since the answer to nearly any such question can now be found somewhere on the Web, the problem reduces to finding potential answers in large volumes of data and validating their accuracy.(More)
Reciprocal Rank Fusion (RRF), a simple method for combining the document rankings from multiple IR systems, consistently yields better results than any individual system, and better results than the standard method Condorcet Fuse. This result is demonstrated by using RRF to combine the results of several TREC experiments, and to build a meta-learner that(More)
We propose an integration of term proximity scoring into Okapi BM25. The relative retrieval effectiveness of our retrieval method, compared to pure BM25, varies from collection to collection.We present an experimental evaluation of our method and show that the gains achieved over BM25 as the size of the underlying text collection increases. We also show(More)
A query algebra is presented that expresses searches on structured text. In addition to traditional full-text boolean queries that search a pre-defined collection of documents, the algebra permits queries that harness document structure. The algebra manipulates arbitrary intervals of text, which are recognized in the text from implicit or explicit markup.(More)