SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search

@inproceedings{MacAvaney2020SLEDGEZAZ,
  title={SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search},
  author={Sean MacAvaney and Arman Cohan and Nazli Goharian},
  booktitle={EMNLP},
  year={2020}
}
With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of scientific literature on the virus. Clinicians, researchers, and policy-makers need to be able to search these articles effectively. In this work, we present a zero-shot ranking algorithm that adapts to COVID-related scientific literature. Our approach filters training data from another collection down to medical-related queries, uses a neural re-ranking model… 
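The two steps named in the abstract (filtering another collection's training data down to medical-related queries, then re-ranking candidates) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword list `MEDICAL_TERMS`, the helper names, and the term-overlap scorer standing in for the neural re-ranking model are all hypothetical.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical keyword list; the paper's actual filtering criteria are not given here.
MEDICAL_TERMS: Set[str] = {"virus", "infection", "vaccine", "symptom", "disease"}

TrainingPair = Tuple[str, str, int]  # (query, document, relevance label)


def is_medical(query: str, terms: Set[str] = MEDICAL_TERMS) -> bool:
    """Return True if any token of the query matches a medical keyword."""
    return any(tok.strip("?.,!") in terms for tok in query.lower().split())


def filter_training_pairs(pairs: List[TrainingPair]) -> List[TrainingPair]:
    """Keep only the training examples whose query looks medical-related."""
    return [pair for pair in pairs if is_medical(pair[0])]


def overlap_score(query: str, doc: str) -> float:
    """Placeholder scorer (fraction of query terms found in the document).

    An assumption standing in for a trained neural re-ranking model.
    """
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)


def rerank(query: str, candidates: List[str],
           score: Callable[[str, str], float]) -> List[str]:
    """Order candidate documents by descending score for the query."""
    return sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
```

In a real zero-shot setup, the filtered pairs would be used to train the re-ranker on the source collection, and `rerank` would then be applied unchanged to COVID-related candidates returned by a first-stage retriever.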

Searching for scientific evidence in a pandemic: An overview of TREC-COVID

COPER: a Query-Adaptable Semantics-based Search Engine for Persian COVID-19 Articles

This paper collected a large dataset of COVID-19 related articles, leveraged different BERT variations as well as other keyword models such as BM25 and TF-IDF, and created a search engine to sift through these documents and rank them, given a user's query.

COVID-19-Related Scientific Literature Exploration: Short Survey and Comparative Study

The COVID-19-related literature has surged since the beginning of the pandemic. This surge prompted the creation of multiple literature exploration systems to help automate the

The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation

Preliminary experiments on Istella22 find that neural re-ranking approaches lag behind LtR models in terms of effectiveness, but LtR models identify the scores from neural models as strong signals. The dataset enables a fair evaluation of traditional learning-to-rank and transfer ranking techniques on the same data.

RCES: Rapid Cues Exploratory Search Using Taxonomies For COVID-19

To assist the COVID-19 focused researchers in life science and healthcare in understanding the pandemic, we present an exploratory information retrieval system called RCES. The system employs a

Reproducing Personalised Session Search over the AOL Query Log

It is demonstrated that this new version of the AOL corpus has a far higher coverage of documents present in the original log than the 2017 version, and including the URL substantially improves performance across a variety of models.

MS MARCO: Benchmarking Ranking Models in the Large-Data Regime

This paper uses the MS MARCO and TREC Deep Learning Track as a case study, comparing it to the case of TREC ad hoc ranking in the 1990s and showing how the design of the evaluation effort can encourage or discourage certain outcomes, and raising questions about internal and external validity of results.

Neural Natural Language Processing for Unstructured Data in Electronic Health Records: a Review

On Survivorship Bias in MS MARCO

Survivorship bias is the tendency to concentrate on the positive outcomes of a selection process and overlook the results that generate negative outcomes. We observe that this bias could be present

IntenT5: Search Result Diversification using Causal Language Models

This work finds that to encourage diversity in the generated queries, it is beneficial to adapt the model by including a new Distributional Causal Language Modeling (DCLM) objective during fine-tuning and a representation replacement during inference.

References

SLEDGE: A Simple Yet Effective Baseline for Coronavirus Scientific Knowledge Search

This work presents a search system called SLEDGE, which uses SciBERT to effectively re-rank articles, trains the model on a general-domain answer ranking dataset, and transfers the relevance signals to SARS-CoV-2 for evaluation.

TREC-COVID: rationale and structure of an information retrieval shared task for COVID-19

TREC-COVID differs from traditional IR shared task evaluations with special considerations for the expected users, IR modality considerations, topic development, participant requirements, assessment process, relevance criteria, evaluation metrics, iteration process, projected timeline, and the implications of data use as a post-task test collection.

CORD-19: The COVID-19 Open Research Dataset

The mechanics of dataset construction are described, highlighting challenges and key design decisions; an overview of how CORD-19 has been used is given, and several shared tasks built around the dataset are described.

Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset

The Neural Covidex is a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset (CORD-19) curated by the Allen

Learning to reformulate long queries for clinical decision support

This work introduces two systems designed to help retrieve medical literature, both of which receive a long, discursive clinical note as an input query and return highly relevant literature that could be used in support of clinical practice.

OpenNIR: A Complete Neural Ad-Hoc Ranking Pipeline

This work presents OpenNIR, a complete ad-hoc neural ranking pipeline that addresses shortcomings and includes several bells and whistles that make use of components of the pipeline, such as performance benchmarking and tuning of unsupervised ranker parameters for fair comparisons against traditional baselines.

Document Ranking with a Pretrained Sequence-to-Sequence Model

Surprisingly, it is found that the choice of target tokens impacts effectiveness, even for words that are closely related semantically, which sheds some light on why the sequence-to-sequence formulation for document ranking is effective.

SciBERT: A Pretrained Language Model for Scientific Text

SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.

Information Retrieval and Extraction on COVID-19 Clinical Articles Using Graph Community Detection and Bio-BERT Embeddings

An information retrieval system on a corpus of scientific articles related to COVID-19 is presented, where similarity is determined via shared citations and biological domain-specific sentence embeddings, and ego-splitting community detection is employed on the article network.

Passage Re-ranking with BERT

A simple re-implementation of BERT for query-based passage re-ranking is the state of the art on the TREC-CAR dataset and the top entry in the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% in MRR@10.