• Corpus ID: 14815475

Autonomy and Reliability of Continuous Active Learning for Technology-Assisted Review

Gordon V. Cormack and Maura R. Grossman
We enhance the autonomy of the continuous active learning method shown by Cormack and Grossman (SIGIR 2014) to be effective for technology-assisted review, in which documents from a collection are retrieved and reviewed, using relevance feedback, until substantially all of the relevant documents have been reviewed. Autonomy is enhanced through the elimination of topic-specific and dataset-specific tuning parameters, so that the sole input required by the user is, at the outset, a short query…
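
The review loop described in the abstract (rank, review, feed the judgments back, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: the additive term-weight scorer stands in for whatever learning method the paper uses, and the names `cal_review`, `score`, `is_relevant`, `batch_size`, and `max_reviews` are hypothetical. The stopping rule here is a simple review budget, which is also an assumption.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


def score(doc, weights):
    """Sum the current weights of the distinct terms in a document."""
    return sum(weights[term] for term in set(tokenize(doc)))


def cal_review(docs, query, is_relevant, batch_size=1, max_reviews=None):
    """Toy continuous active learning loop.

    docs        -- list of document strings
    query       -- short seed query, the only input given at the outset
    is_relevant -- callable taking a document index; stands in for the reviewer
    Returns (review order, indices judged relevant).
    """
    weights = Counter(tokenize(query))  # the "model": one weight per term
    unreviewed = set(range(len(docs)))
    order, found = [], []
    limit = len(docs) if max_reviews is None else max_reviews

    while unreviewed and len(order) < limit:
        # Rank what is left by the current model; ties go to the lower index.
        ranked = sorted(unreviewed, key=lambda i: (-score(docs[i], weights), i))
        take = min(batch_size, limit - len(order))
        for i in ranked[:take]:
            unreviewed.discard(i)
            order.append(i)
            terms = set(tokenize(docs[i]))
            if is_relevant(i):
                found.append(i)
                weights.update(terms)    # relevance feedback: boost these terms
            else:
                weights.subtract(terms)  # negative feedback: penalize them
    return order, found
```

Calling `cal_review(docs, "lawsuit", lambda i: i in relevant)` on a small collection returns the order in which documents were shown to the reviewer, so one can check how early the relevant ones surface. Re-ranking after every batch is the design point: each judgment immediately changes what is reviewed next.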

Refresh Strategies in Continuous Active Learning
The effects of the default and alternative refresh strategies on the effectiveness and efficiency of CAL are investigated, and it is found that more frequent refreshes can significantly reduce the human effort required to achieve a given level of recall.
Increasing the Efficiency of High-Recall Information Retrieval
It is hypothesized that total assessment effort to achieve high recall can be reduced by using shorter document excerpts in place of full documents for the assessment of relevance and using a high-recall retrieval system based on continuous active learning (CAL).
UWaterlooMDS at the TREC 2017 Common Core Track
A high-recall retrieval system that lets assessors evaluate documents within a limited period of time based on paragraph-level or document-level relevance feedback, and that includes a search engine where users can repeatedly enter their own queries to find relevant documents.
When to Stop Reviewing in Technology-Assisted Reviews: Sampling from an Adaptive Distribution to Estimate Residual Relevant
  • Li
  • Computer Science
  • 2020
This article handles the problem of deciding the stopping point of TAR under the continuous active learning framework by jointly training a ranking model to rank documents, and by conducting a “greedy” sampling to estimate the total number of relevant documents in the collection.
Evaluating sentence-level relevance feedback for high-recall information retrieval
Simulation results indicate that the use of isolated sentences for relevance feedback can yield comparable accuracy and higher efficiency, relative to the state-of-the-art baseline model implementation (BMI) of the AutoTAR continuous active learning (“CAL”) method employed in the TREC 2015 and 2016 Total Recall Track.
Effective User Interaction for High-Recall Retrieval: Less is More
The results suggest that for high-recall systems to maximize performance, system designers should think carefully about the amount and nature of user interaction incorporated into the system.
How to Read Less: On the Benefit of Active Learning for Primary Study Selection in Systematic Literature Reviews
FASTREAD, a new state-of-the-art active learning method for SE SLRs, is presented; it can save researchers much time during the literature review process while sacrificing very little in the final recall.
MRG_UWaterloo and WaterlooCormack Participation in the TREC 2017 Common Core Track
The MRG_UWaterloo group from the University of Waterloo used a Continuous Active Learning approach to identify and manually review a substantial fraction of the relevant documents for each of the 250 Common Core topics to create a set of relevance assessments (“qrels”) comparable to the official Common Core Track qrels.
A System for Efficient High-Recall Retrieval
The design of the system that affords efficient high-recall retrieval is presented, which uses a state-of-the-art implementation of continuous active learning (CAL), and is designed to allow other feedback systems to be attached with little work.


Evaluation of machine-learning protocols for technology-assisted review in electronic discovery
Abstract Using a novel evaluation toolkit that simulates a human reviewer in the loop, we compare the effectiveness of three machine-learning protocols for technology-assisted review as used in
Machine Learning for Information Retrieval: TREC 2009 Web, Relevance Feedback and Legal Tracks
For TREC 2009, this approach was used exclusively for the adhoc web, diversity and relevance feedback tasks, as well as for the batch legal task: the ClueWeb09 and Tobacco collections were processed end-to-end and never indexed.
RCV1: A New Benchmark Collection for Text Categorization Research
This work describes the coding policy and quality control procedures used in producing the RCV1 data, the intended semantics of the hierarchical category taxonomies, and the corrections necessary to remove errorful data.
Practical learning from one-sided feedback
Experimental results show that two active learning methods which reduce the number of labels requested can be significantly more effective in practice than those using the Apple Tasting transformation, even on minority class problems.
Active Learning Literature Survey
This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
Overview of the TREC 2011 Legal Track
The TREC 2011 Legal Track consisted of a single task: the learning task, which captured elements of both the TREC 2010 learning and interactive tasks. Participants were required to rank the entire
H5 at TREC 2008 Legal Interactive: User Modeling, Assessment & Measurement
It is demonstrated how User Modeling, Document Assessment and Measurement combine to provide a shared understanding of relevance, a means for representing that understanding to an automated system and a mechanism for iterating and correcting such a system so as to converge on a desired result.
Comments on “The Implications of Rule 26(g) on the Use of Technology-Assisted Review”
Approaches to technology-assisted review (“TAR”) and its validation—presented as “obligations” under Federal Rule 26(g) in a recent article by Karl Schieneman and Thomas C. Gricks III—could, if
Building a filtering test collection for TREC 2002
This work constructed an entirely new set of search topics for the Reuters Corpus for measuring filtering systems, and found that systems performed very differently on the category topics than on the assessor-built topics.
Efficient construction of large test collections
This work proposes two methods, Interactive Searching and Judging and Move-to-Front Pooling, that yield effective test collections while requiring many fewer judgements.