OpenMatch: An Open Source Library for Neu-IR Research

  • Authors: Zhenghao Liu, Kaitao Zhang, Chenyan Xiong, Zhiyuan Liu, Maosong Sun
  • Published 30 January 2021
  • Computer Science
  • Venue: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
OpenMatch is a Python-based library for Neural Information Retrieval (Neu-IR) research. It provides self-contained neural and traditional IR modules, making it easy to build customized, higher-capacity IR systems. To bring the advantages of Neu-IR models to users, OpenMatch provides implementations of recent neural IR models, complete experiment instructions, and advanced few-shot training methods. OpenMatch reproduces the ranking results of previous work…
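The retrieve-then-rerank workflow such a library supports can be sketched in plain Python. This is a hypothetical toy pipeline, not OpenMatch's actual API; `tf_idf_score`, `rerank`, and their parameters are illustrative names:

```python
# Toy two-stage pipeline: a sparse first-stage scorer produces candidates,
# then a second-stage scorer reranks them (illustrative, NOT OpenMatch's API).
from collections import Counter
import math

def tf_idf_score(query, doc, doc_freq, n_docs):
    """Toy TF-IDF score of a whitespace-tokenized query against one document."""
    tf = Counter(doc.split())
    return sum(tf[t] * math.log((1 + n_docs) / (1 + doc_freq.get(t, 0)))
               for t in query.split())

def rerank(query, docs, first_stage_k=3, neural_scorer=None):
    """Retrieve top-k documents by TF-IDF, then rerank them with a second scorer."""
    doc_freq = Counter(t for d in docs for t in set(d.split()))
    n = len(docs)
    candidates = sorted(range(n),
                        key=lambda i: -tf_idf_score(query, docs[i], doc_freq, n))
    candidates = candidates[:first_stage_k]
    if neural_scorer is None:  # fall back to first-stage ordering
        return candidates
    return sorted(candidates, key=lambda i: -neural_scorer(query, docs[i]))
```

Here a simple token-overlap lambda can stand in for the second stage; in practice that slot is where a trained Neu-IR model would score each query-document pair.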

Figures and Tables from this paper

Citations
Self-supervised Fine-tuning for Efficient Passage Re-ranking
This work proposes a new fine-tuning method for passage re-ranking based on the masked language model (MLM) objective used in pre-trained language models; the method improves ranking performance while efficiently using less training data via data augmentation.
Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision
Experiments on three TREC benchmarks in the web, news, and biomedical domains show that MetaAdaptRank significantly improves the few-shot ranking accuracy of Neu-IR models, and analyses indicate that the method thrives from both its contrastive weak data synthesis and meta-reweighted data selection.
SCAI-QReCC Shared Task on Conversational Question Answering
SCAI'21 was organised as an independent online event and featured a shared task on conversational question answering, which identified evaluation of answer correctness in this setting as the major challenge and a current research gap.
Axiomatic Retrieval Experimentation with ir_axioms
Axiomatic approaches to information retrieval have played a key role in determining constraints that characterize good retrieval models. Beyond their importance in retrieval theory, axioms have been…
Bias-aware Fair Neural Ranking for Addressing Stereotypical Gender Biases
A bias-aware fair ranker is proposed that explicitly incorporates a notion of gender bias, controlling how bias is expressed in retrieved documents and reducing bias while maintaining retrieval effectiveness across different query sets.
Shallow pooling for sparse labels
Crowdsourced workers made preference judgments between the top item returned by a modern neural ranking stack and a judged relevant item for the nearly seven thousand queries in the passage ranking development set, supporting concerns that current MS MARCO datasets may no longer be able to recognize genuine improvements in rankers.

References
Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations
An overview of toolkit features is presented, along with empirical results illustrating its effectiveness on two popular ranking tasks, and a description of how the group has built a culture of replicability through shared norms and tools that enable rigorous automated testing.
OpenNIR: A Complete Neural Ad-Hoc Ranking Pipeline
This work presents OpenNIR, a complete ad-hoc neural ranking pipeline that addresses these shortcomings and includes several bells and whistles built on components of the pipeline, such as performance benchmarking and tuning of unsupervised ranker parameters for fair comparisons against traditional baselines.
End-to-End Neural Ad-hoc Ranking with Kernel Pooling
K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score.
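The kernel-pooling step just described can be sketched in a few lines of NumPy. This is a simplified illustration: the kernel means `mus` and width `sigma` are hyperparameters (the model uses a bank of RBF kernels spaced over [-1, 1]), and the learned learning-to-rank layer on top is omitted.

```python
import numpy as np

def kernel_pooling(M, mus, sigma=0.1):
    """Soft-match features from a query-document similarity matrix M (|q| x |d|).

    Each RBF kernel centered at mu softly counts how many term pairs match
    at that similarity level (the "soft-TF" idea)."""
    feats = []
    for mu in mus:
        k = np.exp(-((M - mu) ** 2) / (2 * sigma ** 2))   # (|q|, |d|) kernel scores
        soft_tf = k.sum(axis=1)                           # pooled per query term
        feats.append(np.log(np.clip(soft_tf, 1e-10, None)).sum())
    return np.array(feats)
```

The final ranking score would then come from the learning-to-rank layer, e.g. tanh(w · feats + b) with learned w and b.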
MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching
A novel system, MatchZoo, is presented to facilitate the learning, practicing, and designing of neural text matching models; it helps researchers train, test, and apply state-of-the-art models systematically and develop their own models with rich APIs and assistance.
Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search
Conv-KNRM uses Convolutional Neural Networks to represent n-grams of various lengths and soft matches them in a unified embedding space and is utilized by the kernel pooling and learning-to-rank layers to generate the final ranking score.
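The n-gram composition and cross-matching can be sketched as follows. This is an illustrative NumPy stand-in: a single dense filter bank `W` replaces the learned CNN filters, and only one n-gram length is shown.

```python
import numpy as np

def ngram_embeddings(E, W, n):
    """Compose n-gram embeddings from word embeddings E (len x dim) by sliding
    a window of n words and applying a filter bank W (n*dim x out_dim) with ReLU."""
    L, d = E.shape
    windows = np.stack([E[i:i + n].reshape(-1) for i in range(L - n + 1)])
    return np.maximum(windows @ W, 0.0)

def cross_match(Q, D):
    """Cosine similarities between every query n-gram and every document n-gram."""
    Qn = Q / np.clip(np.linalg.norm(Q, axis=1, keepdims=True), 1e-9, None)
    Dn = D / np.clip(np.linalg.norm(D, axis=1, keepdims=True), 1e-9, None)
    return Qn @ Dn.T
```

In Conv-KNRM, such similarity matrices are built for every pair of n-gram lengths (unigram-bigram, bigram-bigram, and so on), and each is fed to the same kernel pooling used in K-NRM.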
Reading Wikipedia to Answer Open-Domain Questions
This approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs, indicating that both modules are highly competitive with respect to existing counterparts.
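The sparse side of that pipeline, TF-IDF over hashed unigrams and bigrams, can be sketched in pure Python. This is a toy illustration; `N_BINS` and the md5-based hash are arbitrary choices here, not the paper's exact configuration.

```python
import hashlib
import math
from collections import Counter

N_BINS = 2 ** 24  # size of the hashed n-gram space (a sketch-level choice)

def hashed_counts(text):
    """Counts of hashed unigrams and bigrams; feature hashing avoids storing
    an explicit n-gram vocabulary."""
    toks = text.lower().split()
    grams = toks + [f"{a} {b}" for a, b in zip(toks, toks[1:])]
    return Counter(int(hashlib.md5(g.encode()).hexdigest(), 16) % N_BINS
                   for g in grams)

def tfidf_index(docs):
    """Hashed TF-IDF vectors (as sparse dicts) plus document frequencies."""
    counts = [hashed_counts(d) for d in docs]
    df = Counter(b for c in counts for b in c)
    n = len(docs)
    vecs = [{b: tf * math.log((1 + n) / (1 + df[b])) for b, tf in c.items()}
            for c in counts]
    return vecs, df, n

def query_vector(query, df, n):
    """Weight the query's hashed n-grams with the corpus document frequencies."""
    return {b: tf * math.log((1 + n) / (1 + df[b]))
            for b, tf in hashed_counts(query).items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(b, 0.0) for b, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Ranking documents by `cosine(query_vector(...), vec)` gives the first-stage retrieval; the recurrent reading component would then scan the top-ranked paragraphs for answers.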
Anserini: Enabling the Use of Lucene for Information Retrieval Research
Anserini provides wrappers and extensions on top of core Lucene libraries that allow researchers to use more intuitive APIs to accomplish common research tasks, and aims to provide the best of both worlds to better align information retrieval practice and research.
Deeper Text Understanding for IR with Contextual Neural Language Modeling
Experimental results demonstrate that contextual text representations from BERT are more effective than traditional word embeddings, bringing large improvements on queries written in natural language.
Selective Weak Supervision for Neural Information Retrieval
The classic IR intuition that anchor-document relations approximate query-document relevance is revisited, and a reinforcement weak supervision selection method, ReInfoSelect, is proposed that learns to select anchor-document pairs that best weakly supervise the neural ranker (action), using the ranking performance on a handful of relevance labels as the reward.
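The reinforcement-style selection loop can be sketched with a toy REINFORCE policy. This is hypothetical and heavily simplified: the real method trains a neural ranker inside the loop, while here `rewards_fn` is an arbitrary stand-in for the validation-ranking reward and the policy is a linear model over made-up per-pair features.

```python
import numpy as np

def reinforce_select(weak_feats, rewards_fn, steps=300, lr=0.5, seed=0):
    """REINFORCE sketch of weak-supervision selection (not the paper's exact
    architecture): a logistic policy over per-pair features learns to keep
    pairs whose inclusion raises a reward such as validation NDCG."""
    rng = np.random.default_rng(seed)
    w = np.zeros(weak_feats.shape[1])
    baseline = 0.0                                       # moving-average baseline
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(weak_feats @ w)))      # keep-probability per pair
        keep = rng.random(len(p)) < p                    # sample a selection (action)
        r = rewards_fn(keep)                             # reward for this selection
        grad = ((keep - p)[:, None] * weak_feats).sum(axis=0)  # log-prob gradient
        w += lr * (r - baseline) * grad                  # policy-gradient update
        baseline = 0.9 * baseline + 0.1 * r
    return w
```

With a reward that pays for keeping "good" pairs and penalizes "bad" ones, the learned weights separate the two groups, which is the essence of reward-driven data selection.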
CEDR: Contextualized Embeddings for Document Ranking
This work investigates how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking and proposes a joint approach that incorporates BERT's classification vector into existing neural models, showing that it outperforms state-of-the-art ad-hoc ranking baselines.