Corpus ID: 218502142

SLEDGE: A Simple Yet Effective Baseline for Coronavirus Scientific Knowledge Search

@article{MacAvaney2020SLEDGEAS,
  title={SLEDGE: A Simple Yet Effective Baseline for Coronavirus Scientific Knowledge Search},
  author={Sean MacAvaney and Arman Cohan and Nazli Goharian},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.02365}
}
With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of literature on the virus. Clinicians, researchers, and policy-makers need a way to effectively search these articles. In this work, we present a search system called SLEDGE, which utilizes SciBERT to effectively re-rank articles. We train the model on a general-domain answer ranking dataset, and transfer the relevance signals to SARS-CoV-2 for evaluation. We…
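The abstract describes a two-stage retrieve-then-rerank design: a cheap first-stage ranker selects candidates, and a neural model (SciBERT in the paper) re-scores them. A minimal sketch in plain Python, with stub scoring functions standing in for the lexical ranker and the SciBERT cross-encoder (all names here are illustrative, not the paper's code):

```python
from typing import Callable, List

def rerank(query: str,
           candidates: List[str],
           first_stage: Callable[[str, str], float],
           neural_score: Callable[[str, str], float],
           top_k: int = 100) -> List[str]:
    """Two-stage pipeline: a cheap first-stage ranker orders all
    documents, then an expensive neural scorer re-orders the top_k."""
    # Stage 1: rank every candidate with the cheap scorer.
    ranked = sorted(candidates, key=lambda d: first_stage(query, d), reverse=True)
    head, tail = ranked[:top_k], ranked[top_k:]
    # Stage 2: re-order only the head with the neural scorer.
    head = sorted(head, key=lambda d: neural_score(query, d), reverse=True)
    return head + tail

# Stand-ins for illustration: term overlap as the first stage,
# and a toy "neural" scorer that favors exact phrase matches.
def overlap(q: str, d: str) -> float:
    return float(len(set(q.lower().split()) & set(d.lower().split())))

def phrase_match(q: str, d: str) -> float:
    return float(q.lower() in d.lower())
```

Because only `top_k` documents reach the expensive scorer, the pipeline stays tractable over a large collection; the real system swaps `neural_score` for a fine-tuned SciBERT cross-encoder.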
Citations

SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search
This work presents a zero-shot ranking algorithm for COVID-related scientific literature that uses a neural re-ranking model pre-trained on scientific text (SciBERT) and filters the target document collection.
Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset
We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI. Our…
Searching for scientific evidence in a pandemic: An overview of TREC-COVID
This paper provides a comprehensive overview of the structure and results of TREC-COVID, an information retrieval (IR) shared task to evaluate search on scientific literature related to COVID-19.
COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization
CO-Search is presented, a semantic, multi-stage search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers and avoiding misinformation during a time of crisis.
CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization
CO-Search is presented, a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers during a time of crisis.
A Comparative Analysis of System Features Used in the TREC-COVID Information Retrieval Challenge
It is observed that fine-tuning datasets with relevance judgments, MS-MARCO, and CORD-19 document vectors was associated with improved performance in Round 2 but not in Round 5, and that term expansion and the use of the narrative field in the TREC-COVID topics were associated with decreased system performance in both rounds.
Frugal neural reranking: evaluation on the Covid-19 literature
Results on this dataset show that, when starting from a strong baseline, a light neural ranking model can achieve results comparable to model architectures that use a very large number of parameters.
Information retrieval in an infodemic: the case of COVID-19 publications
A multi-stage information retrieval architecture combines probabilistic weighting models and re-ranking algorithms based on neural masked language models, supporting the effective search and discovery of relevant information in the case of an infodemic.
RRF102: Meeting the TREC-COVID Challenge with a 100+ Runs Ensemble
A simple yet effective weighted hierarchical rank fusion approach ensembles 102 runs from lexical and semantic retrieval systems, pre-trained and fine-tuned BERT rankers, and relevance feedback runs to meet the challenge of building a search engine for a rapidly evolving biomedical collection.
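Rank fusion of this kind is commonly built on (weighted) reciprocal rank fusion, where each run contributes a score inversely proportional to a document's rank. A minimal sketch of that standard technique, not the paper's exact hierarchical scheme:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def weighted_rrf(runs: List[Tuple[float, List[str]]], k: int = 60) -> List[str]:
    """Weighted reciprocal rank fusion: each (weight, ranking) run
    contributes weight / (k + rank) for every document it retrieves;
    documents are returned sorted by fused score, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for weight, ranking in runs:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The constant `k` (60 is the value commonly used in the RRF literature) damps the influence of top ranks so that broad agreement across runs can outweigh a single run's top pick.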

References

Showing 1-10 of 42 references
Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset
The Neural Covidex is a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset (CORD-19) curated by the Allen…
TREC-COVID: rationale and structure of an information retrieval shared task for COVID-19
TREC-COVID differs from traditional IR shared task evaluations with special considerations for the expected users, IR modality considerations, topic development, participant requirements, assessment process, relevance criteria, evaluation metrics, iteration process, projected timeline, and the implications of data use as a post-task test collection.
Rapidly Bootstrapping a Question Answering Dataset for COVID-19
CovidQA is presented, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge, the first publicly available resource of its type.
Learning to reformulate long queries for clinical decision support
This work introduces two systems designed to help retrieve medical literature; both receive a long, discursive clinical note as the input query and return highly relevant literature that could be used in support of clinical practice.
TREC genomics special issue overview
This special issue is devoted to the TREC Genomics Track, which ran from 2003 to 2007, and has expanded in recent years with the growth of new information needs.
OpenNIR: A Complete Neural Ad-Hoc Ranking Pipeline
This work presents OpenNIR, a complete ad-hoc neural ranking pipeline that addresses these shortcomings and includes several bells and whistles built on components of the pipeline, such as performance benchmarking and tuning of unsupervised ranker parameters for fair comparisons against traditional baselines.
SciBERT: A Pretrained Language Model for Scientific Text
SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.
CEDR: Contextualized Embeddings for Document Ranking
This work investigates how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking and proposes a joint approach that incorporates BERT's classification vector into existing neural models, showing that it outperforms state-of-the-art ad-hoc ranking baselines.
Passage Re-ranking with BERT
A simple re-implementation of BERT for query-based passage re-ranking achieves strong results on the TREC-CAR dataset and the top entry on the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% in MRR@10.
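MRR@10, the metric cited above, averages the reciprocal rank of the first relevant document found within each query's top 10 results (0 if none appears). A quick sketch of the standard metric:

```python
from typing import List, Set

def mrr_at_k(rankings: List[List[str]], relevant: List[Set[str]], k: int = 10) -> float:
    """Mean reciprocal rank at cutoff k: for each query, score 1/rank
    of the first relevant document in the top k (0 if absent), then
    average over all queries."""
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for rank, doc in enumerate(ranking[:k], start=1):
            if doc in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)
```

For example, a query whose first relevant passage sits at rank 2 contributes 0.5, and a query with no relevant passage in the top 10 contributes 0.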
Document Ranking with a Pretrained Sequence-to-Sequence Model
Surprisingly, it is found that the choice of target tokens impacts effectiveness, even for words that are closely related semantically, which sheds some light on why the sequence-to-sequence formulation for document ranking is effective.