Establishing Strong Baselines for TripClick Health Retrieval

Sebastian Hofstätter, Sophia Althammer, Mete Sertkan and Allan Hanbury

We present strong Transformer-based re-ranking and dense retrieval baselines for the recently released TripClick health ad-hoc retrieval collection. We improve the originally too-noisy training data with a simple negative sampling policy. We achieve large gains over BM25 in the re-ranking task of TripClick, which were not achieved with the original baselines. Furthermore, we study the impact of different domain-specific pre-trained models on TripClick. Finally, we show that dense retrieval…
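The abstract mentions cleaning click-based training data with a simple negative sampling policy. As a generic illustration (not the paper's exact procedure), such a policy can draw negatives from a retrieved candidate pool while skipping any document that users clicked for the query, reducing false negatives in the training pairs; all names below are hypothetical:

```python
import random

def sample_negatives(candidates, clicked, k, seed=0):
    """Illustrative negative-sampling policy for click data.

    candidates: ranked candidate document ids for a query (e.g. from BM25)
    clicked: set of document ids users clicked for this query
    k: number of negatives to draw
    """
    # Exclude clicked documents so they are never used as negatives.
    pool = [doc for doc in candidates if doc not in clicked]
    rng = random.Random(seed)  # seeded for reproducible sampling
    return rng.sample(pool, min(k, len(pool)))
```

The key design point is the filter step: without it, a clicked (likely relevant) document can be sampled as a negative and inject label noise into training.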

TripJudge: A Relevance Judgement Test Collection for TripClick Health Retrieval

This paper presents the novel relevance judgement test collection TripJudge for TripClick health retrieval and finds that click-based and judgement-based evaluation can lead to substantially different system rankings.

CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking

This work presents a framework for improving the performance of a wide class of retrieval models at minimal computational cost and shows that this approach leads to substantial improvement in retrieval performance over scoring candidate documents in isolation from one another, as in a pair-wise training setting.

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

This work extensively analyzes different retrieval models and provides several suggestions that it believes may be useful for future work, finding that performing well consistently across all datasets is challenging.

Passage Re-ranking with BERT

A simple re-implementation of BERT for query-based passage re-ranking on the TREC-CAR dataset and the top entry in the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% in MRR@10.

CMT in TREC-COVID Round 2: Mitigating the Generalization Gaps from Web to Special Domain Search

This paper presents a search system to alleviate the special domain adaption problem, which utilizes the domain-adaptive pretraining and few-shot learning technologies to help neural rankers mitigate the domain discrepancy and label scarcity problems.

Deep Relevance Ranking Using Enhanced Document-Query Interactions

Several new models for document relevance ranking are explored, building upon the Deep Relevance Matching Model (DRMM) of Guo et al. (2016), and inspired by PACRR’s convolutional n-gram matching features, but extended in several ways including multiple views of query and document inputs.

CEDR: Contextualized Embeddings for Document Ranking

This work investigates how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking and proposes a joint approach that incorporates BERT's classification vector into existing neural models, showing that it outperforms state-of-the-art ad-hoc ranking baselines.

Rapidly Bootstrapping a Question Answering Dataset for COVID-19

CovidQA is presented, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge, the first publicly available resource of its type.

End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training

This work combines neural IR and MRC systems and shows significant improvements in end-to-end QA on the CORD-19 collection over a state-of-the-art open-domain QA baseline.

TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval

The TwinBERT model for effective and efficient retrieval is presented; it has twin-structured BERT-like encoders that represent the query and the document respectively, and a crossing layer that combines the two embeddings into a similarity score.
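The twin-encoder idea can be sketched as follows: query and document are encoded independently (so document vectors can be precomputed offline), and a lightweight crossing function combines the two embeddings into a score. In this minimal sketch the crossing layer is approximated by cosine similarity; TwinBERT's actual crossing layer is learned, and the function names here are illustrative:

```python
import numpy as np

def cosine_crossing(query_vec: np.ndarray, doc_vec: np.ndarray) -> float:
    """Combine two independently computed embeddings into a similarity score.

    Stand-in for a learned crossing layer: here, plain cosine similarity.
    """
    denom = np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)
    return float(query_vec @ doc_vec / denom)
```

Because the document side never sees the query, document embeddings can be indexed ahead of time, which is what makes this architecture efficient at serving time compared with full cross-attention ranking.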

Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation

This work proposes a cross-architecture training procedure with a margin focused loss (Margin-MSE), that adapts knowledge distillation to the varying score output distributions of different BERT and non-BERT ranking architectures, and shows that across evaluated architectures it significantly improves their effectiveness without compromising their efficiency.
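The margin-focused loss described above can be sketched in a few lines: the student is trained to match the teacher's score *margin* between a relevant and a non-relevant passage rather than the absolute scores, which is what lets architectures with different score output distributions be distilled with one objective. A minimal plain-Python sketch (names illustrative):

```python
def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins.

    Each argument is a list of per-query scores: score(q, d+) for the
    *_pos lists and score(q, d-) for the *_neg lists.
    """
    losses = []
    for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        student_margin = sp - sn  # student's pos/neg score gap
        teacher_margin = tp - tn  # teacher's pos/neg score gap
        losses.append((student_margin - teacher_margin) ** 2)
    return sum(losses) / len(losses)
```

For example, a student margin of 1.0 against a teacher margin of 1.5 contributes (1.0 - 1.5)^2 = 0.25 to the batch loss, regardless of the raw score scales of either model.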

On the Effect of Low-Frequency Terms on Neural-IR Models

This paper evaluates the neural IR models with various vocabulary sizes for their respective word embeddings, considering different levels of constraints on the available GPU memory, and investigates the use of subword-token embedding models, and in particular FastText, for Neural IR models.