Simplified Data Wrangling with ir_datasets

  • Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, Nazli Goharian
  • Published 3 March 2021
  • Computer Science
  • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
Managing the data for Information Retrieval (IR) experiments can be challenging. Dataset documentation is scattered across the Internet and once one obtains a copy of the data, there are numerous different data formats to work with. Even basic formats can have subtle dataset-specific nuances that need to be considered for proper use. To help mitigate these challenges, we introduce a new robust and lightweight tool (ir_datasets) for acquiring, managing, and performing typical operations over… 

Streamlining Evaluation with ir-measures
We present ir-measures, a new tool that makes it convenient to calculate a diverse set of evaluation measures used in information retrieval. Rather than implementing its own measure calculations,
Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding
We carry out a comprehensive evaluation of 13 recent models for ranking of long documents using two popular collections (MS MARCO documents and Robust04). Our model zoo includes two specialized
How Train-Test Leakage Affects Zero-shot Retrieval
This paper investigates the impact of unintended train–test leakage by training neural models on MS MARCO document ranking data with different proportions of controlled leakage to Robust04 and the TREC 2017 and 2018 Common Core tracks as test datasets.
Moving Stuff Around: A study on efficiency of moving documents into memory for Neural IR models
This work studies how three popular approaches to handling documents for IR datasets behave and how they scale with multiple GPUs, and shows how common techniques for improving loading times, like memory pinning, multiple workers, and RAMDISK usage, can reduce the training time further with minor memory overhead.
CODEC: Complex Document and Entity Collection
Overall, CODEC provides challenging research topics to support the development and evaluation of entity-centric search methods and shows significant gains in document ranking, demonstrating the resource’s value for evaluating and improving entity-oriented search.
On Survivorship Bias in MS MARCO
Survivorship bias is the tendency to concentrate on the positive outcomes of a selection process and overlook the results that generate negative outcomes. We observe that this bias could be present
C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval
This work uses comparable Wikipedia articles in different languages to further pretrain off-the-shelf multilingual pretrained models before fine-tuning on the retrieval task, and shows that this approach yields improvements in retrieval effectiveness.
Axiomatic Retrieval Experimentation with ir_axioms
Axiomatic approaches to information retrieval have played a key role in determining constraints that characterize good retrieval models. Beyond their importance in retrieval theory, axioms have been
The Power of Anchor Text in the Neural Retrieval Era
This reproducibility study analyzes to what extent anchor texts are still useful and reports that they remain particularly helpful for navigational queries, but also that they now yield less homogeneous results than the content of documents.
Webis at TREC 2021: Deep Learning, Health Misinformation, and Podcasts Tracks
We describe the Webis group’s participation in the TREC 2021 Deep Learning, Health Misinformation, and Podcasts tracks. Our three LambdaMART-based runs submitted to the Deep Learning track focus on


Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations
This paper presents an overview of toolkit features and empirical results that illustrate its effectiveness on two popular ranking tasks, and describes how the group has built a culture of replicability through shared norms and tools that enable rigorous automated testing.
TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection
TREC-COVID is a community evaluation designed to build a test collection that captures the information needs of biomedical researchers using the scientific literature during a pandemic. One of the
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
This new dataset aims to overcome a number of well-known weaknesses of previous publicly available datasets for the same task of reading comprehension and question answering, and is the most comprehensive real-world dataset of its kind in both quantity and quality.
OpenNIR: A Complete Neural Ad-Hoc Ranking Pipeline
This work presents OpenNIR, a complete ad-hoc neural ranking pipeline which addresses shortcomings and includes several bells and whistles that make use of components of the pipeline, such as performance benchmarking and tuning of unsupervised ranker parameters for fair comparisons against traditional baselines.
Declarative Experimentation in Information Retrieval using PyTerrier
This work proposes a framework called PyTerrier that allows advanced retrieval pipelines to be expressed, and evaluated, in a declarative manner close to their conceptual design in information retrieval (IR), and targets IR platforms as backends in order to execute and evaluate retrieval pipelines.
Anserini: Enabling the Use of Lucene for Information Retrieval Research
Anserini provides wrappers and extensions on top of core Lucene libraries that allow researchers to use more intuitive APIs to accomplish common research tasks, and aims to provide the best of both worlds to better align information retrieval practice and research.
Terrier Information Retrieval Platform
Terrier is a modular platform for the rapid development of large-scale Information Retrieval (IR) applications. It can index various document collections, including TREC and Web collections. Terrier
Flexible IR Pipelines with Capreolus
The Capreolus toolkit is rewritten to take this approach to implementing experimental pipelines as dependency graphs of functional "IR primitives," which the authors call modules, that can be used and combined as needed.
DiffIR: Exploring Differences in Ranking Models' Behavior
DiffIR is a new open-source web tool to assist with qualitative ranking analysis by visually 'diffing' system rankings at the individual result level for queries where behavior significantly diverges.
How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset
DL-HARD contains fifty topics from the official DL 2019/2020 evaluation benchmark, half of which are newly and independently assessed; a framework for identifying challenging queries is also introduced.