Novelty and diversity in information retrieval evaluation
This paper presents an evaluation framework that systematically rewards novelty and diversity, develops it into a specific evaluation measure based on cumulative gain, and demonstrates the feasibility of this approach using a test collection based on the TREC question answering track.
Reciprocal rank fusion outperforms Condorcet and individual rank learning methods
Reciprocal Rank Fusion (RRF) is demonstrated by combining the results of several TREC experiments and by building a meta-learner that ranks the LETOR 3 dataset better than any previously reported method.
Information Retrieval - Implementing and Evaluating Search Engines
Information Retrieval offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation, and is a valuable reference for professionals in computer science, computer engineering, and software engineering.
Evaluation of machine-learning protocols for technology-assisted review in electronic discovery
Using a novel evaluation toolkit that simulates a human reviewer in the loop, we compare the effectiveness of three machine-learning protocols for technology-assisted review as used in
Efficient and effective spam filtering and re-ranking for large web datasets
It is shown that a simple content-based classifier with minimal training is efficient enough to rank the “spamminess” of every page in the ClueWeb09 dataset using a standard personal computer in 48 hours, and effective enough to yield significant and substantive improvements in the fixed-cutoff precision as well as rank measures of nearly all submitted runs.
Data Compression Using Dynamic Markov Modelling
Experimental results reported here indicate that the Markov modelling approach generally achieves much better data compression than that observed with competing methods on typical computer data.
TREC 2006 Spam Track Overview
TREC’s Spam Track uses a standard testing framework that presents a set of chronologically ordered email messages to a spam filter for classification. Four different forms of user feedback are modeled, intended to simulate a user who reads email from time to time and perhaps does not diligently report the filter's errors.
Efficient construction of large test collections
This work proposes two methods, Interactive Searching and Judging and Move-to-front Pooling, that yield effective test collections while requiring many fewer judgements.
Autonomy and Reliability of Continuous Active Learning for Technology-Assisted Review
We enhance the autonomy of the continuous active learning method shown by Cormack and Grossman (SIGIR 2014) to be effective for technology-assisted review, in which documents from a collection are
Email Spam Filtering: A Systematic Review
  • G. Cormack
  • Computer Science
    Found. Trends Inf. Retr.
  • 23 June 2008
This work examines the definition of spam, the user's information requirements and the role of the spam filter as one component of a large and complex information universe, and outlines several uncertainties and proposes experimental methods to address them.