Simplified Data Wrangling with ir_datasets

  • Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, Nazli Goharian
  • Published 3 March 2021
  • Computer Science
  • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
Managing the data for Information Retrieval (IR) experiments can be challenging. Dataset documentation is scattered across the Internet and once one obtains a copy of the data, there are numerous different data formats to work with. Even basic formats can have subtle dataset-specific nuances that need to be considered for proper use. To help mitigate these challenges, we introduce a new robust and lightweight tool (ir_datasets) for acquiring, managing, and performing typical operations over… 

Citations of this paper

Streamlining Evaluation with ir-measures
We present ir-measures, a new tool that makes it convenient to calculate a diverse set of evaluation measures used in information retrieval. Rather than implementing its own measure calculations,
ABNIRML: Analyzing the Behavior of Neural IR Models
A new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML) is presented, which includes new types of diagnostic probes that allow us to test several characteristics—such as writing styles, factuality, sensitivity to paraphrasing and word order—that are not addressed by previous techniques.
Axiomatic Retrieval Experimentation with ir_axioms
Axiomatic approaches to information retrieval have played a key role in determining constraints that characterize good retrieval models. Beyond their importance in retrieval theory, axioms have been
C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval
This work uses comparable Wikipedia articles in different languages to further pretrain off-the-shelf multilingual pretrained models before fine-tuning on the retrieval task, and shows that this approach yields improvements in retrieval effectiveness.
CODEC: Complex Document and Entity Collection
Overall, CODEC provides challenging research topics to support the development and evaluation of entity-centric search methods and shows significant gains in document ranking, demonstrating the resource’s value for evaluating and improving entity-oriented search.
Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators
The experimental results for two different IR tasks reveal that retrieval pipelines are not robust to content-preserving query variations, with effectiveness drops of ∼20% on average compared with the original queries provided in the datasets.
Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction
ColBERTer is proposed, a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction that fuses single-vector retrieval, multivector refinement, and optional lexical matching components into one model and achieves index storage parity with the plaintext size, with very strong effectiveness results.
Moving Stuff Around: A study on efficiency of moving documents into memory for Neural IR models
This work studies how three popular approaches to handling documents for IR datasets behave and how they scale with multiple GPUs, and shows how common techniques for improving loading times, like memory pinning, multiple workers, and RAMDISK usage, can further reduce training time with minor memory overhead.
On Survivorship Bias in MS MARCO
Survivorship bias is the tendency to concentrate on the positive outcomes of a selection process and overlook the results that generate negative outcomes. We observe that this bias could be present
Reproducing Personalised Session Search over the AOL Query Log
It is demonstrated that this new version of the AOL corpus has a far higher coverage of documents present in the original log than the 2017 version, and including the URL substantially improves performance across a variety of models.

References from this paper

Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations
An overview of toolkit features and empirical results illustrating its effectiveness on two popular ranking tasks is presented, and the paper describes how the group has built a culture of replicability through shared norms and tools that enable rigorous automated testing.
TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection
TREC-COVID is a community evaluation designed to build a test collection that captures the information needs of biomedical researchers using the scientific literature during a pandemic. One of the
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
This new dataset is aimed to overcome a number of well-known weaknesses of previous publicly available datasets for the same task of reading comprehension and question answering, and is the most comprehensive real-world dataset of its kind in both quantity and quality.
Declarative Experimentation in Information Retrieval using PyTerrier
This work proposes a framework called PyTerrier that allows advanced retrieval pipelines to be expressed, and evaluated, in a declarative manner close to their conceptual design in information retrieval (IR), and targets IR platforms as backends in order to execute and evaluate retrieval pipelines.
Anserini: Enabling the Use of Lucene for Information Retrieval Research
Anserini provides wrappers and extensions on top of core Lucene libraries that allow researchers to use more intuitive APIs to accomplish common research tasks, and aims to provide the best of both worlds to better align information retrieval practice and research.
Terrier Information Retrieval Platform
Terrier is a modular platform for the rapid development of large-scale Information Retrieval (IR) applications. It can index various document collections, including TREC and Web collections. Terrier
OpenNIR: A Complete Neural Ad-Hoc Ranking Pipeline
This work presents a complete ad-hoc neural ranking pipeline which addresses shortcomings: OpenNIR, and includes several bells and whistles that make use of components of the pipeline, such as performance benchmarking and tuning of unsupervised ranker parameters for fair comparisons against traditional baselines.
Flexible IR Pipelines with Capreolus
The Capreolus toolkit is rewritten to take this approach to implementing experimental pipelines as dependency graphs of functional "IR primitives," which the authors call modules, that can be used and combined as needed.
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
This work extensively analyzes different retrieval models and provides several suggestions that it believes may be useful for future work, finding that performing well consistently across all datasets is challenging.
DiffIR: Exploring Differences in Ranking Models' Behavior
DiffIR is a new open-source web tool to assist with qualitative ranking analysis by visually 'diffing' system rankings at the individual result level for queries where behavior significantly diverges.