Corpus ID: 218502142

SLEDGE: A Simple Yet Effective Baseline for Coronavirus Scientific Knowledge Search

@article{MacAvaney2020SLEDGEAS,
  title={SLEDGE: A Simple Yet Effective Baseline for Coronavirus Scientific Knowledge Search},
  author={Sean MacAvaney and Arman Cohan and Nazli Goharian},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.02365}
}
With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of literature on the virus. Clinicians, researchers, and policy-makers need a way to effectively search these articles. In this work, we present a search system called SLEDGE, which utilizes SciBERT to effectively re-rank articles. We train the model on a general-domain answer ranking dataset, and transfer the relevance signals to SARS-CoV-2 for evaluation. We… 
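
The re-ranking step described in the abstract can be illustrated with a minimal sketch. It assumes a first-stage retriever has already produced a candidate list, and it uses a toy term-overlap scorer as a stand-in for the SciBERT relevance model; the names `toy_overlap_score` and `rerank` are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Optional, Tuple


def toy_overlap_score(query: str, document: str) -> float:
    """Placeholder relevance score: fraction of query terms present in the document.

    Assumption: this is NOT SciBERT. It only stands in for a learned
    (query, document) scorer so that the sketch runs without model weights.
    """
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = toy_overlap_score,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and sort by descending score."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # sorted() is stable, so ties keep their first-stage order.
    scored = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


if __name__ == "__main__":
    first_stage = [
        "bats and viruses",
        "coronavirus incubation period estimates",
        "incubation of eggs",
    ]
    for doc, score in rerank("coronavirus incubation period", first_stage):
        print(f"{score:.2f}  {doc}")
```

In a system like the one described, `score_fn` would be replaced by a neural model trained on a general-domain ranking dataset and then applied unchanged to the target collection.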

SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search

This work presents a zero-shot ranking algorithm that adapts to COVID-related scientific literature, uses a neural re-ranking model pre-trained on scientific text (SciBERT), and filters the target document collection.

Searching for scientific evidence in a pandemic: An overview of TREC-COVID

COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization

CO-Search is presented, a semantic, multi-stage, search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers and avoiding misinformation during a time of crisis.

CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization

CO-Search is presented, a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers during a time of crisis.

Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature

This work proposes a general approach for vertical search based on domain-specific pretraining and presents a case study for the biomedical domain, which performs comparably or better than the best systems in the official TREC-COVID evaluation, a COVID-related biomedical search competition.

Frugal neural reranking: evaluation on the Covid-19 literature

Results on this dataset show that, when starting with a strong baseline, the light neural ranking model can achieve results that are comparable to other model architectures that use a very large number of parameters.

Information Retrieval in an Infodemic: The Case of COVID-19 Publications

This work presents an information retrieval methodology for effectively finding relevant publications for different information needs using traditional information retrieval models, as well as modern neural natural language processing algorithms for an infodemic.

AUEB-NLP at BioASQ 8: Biomedical Document and Snippet Retrieval

The submissions of AUEB’s NLP group to the BioASQ 8 document and snippet retrieval tasks are presented, and neural methods to encode, index, and directly retrieve snippets (sentences), and indirectly the documents containing the retrieved snippets, are tested.

Denmark's Participation in the Search Engine TREC COVID-19 Challenge: Lessons Learned about Searching for Precise Biomedical Scientific Information on COVID-19

The TREC-COVID competition setup and participation are described, along with the resulting reflections and lessons learned about state-of-the-art technology when faced with the acute task of retrieving precise scientific information from a rapidly growing corpus of literature, in response to highly specialised queries, in the middle of a pandemic.

References

Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset

The Neural Covidex is a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset (CORD-19) curated by the Allen

CORD-19: The COVID-19 Open Research Dataset

The mechanics of dataset construction are described, highlighting challenges and key design decisions, along with an overview of how CORD-19 has been used and several shared tasks built around the dataset.

TREC-COVID: rationale and structure of an information retrieval shared task for COVID-19

TREC-COVID differs from traditional IR shared task evaluations with special considerations for the expected users, IR modality considerations, topic development, participant requirements, assessment process, relevance criteria, evaluation metrics, iteration process, projected timeline, and the implications of data use as a post-task test collection.

Rapidly Bootstrapping a Question Answering Dataset for COVID-19

CovidQA is presented, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge, the first publicly available resource of its type.

Learning to reformulate long queries for clinical decision support

This work introduces two systems designed to help retrieve medical literature, both of which receive a long, discursive clinical note as the input query and return highly relevant literature that could be used in support of clinical practice.

TREC genomics special issue overview

This special issue is devoted to the TREC Genomics Track, which ran from 2003 to 2007, and has expanded in recent years with the growth of new information needs.

OpenNIR: A Complete Neural Ad-Hoc Ranking Pipeline

This work presents OpenNIR, a complete ad-hoc neural ranking pipeline that addresses shortcomings and includes several bells and whistles that make use of components of the pipeline, such as performance benchmarking and tuning of unsupervised ranker parameters for fair comparisons against traditional baselines.

SciBERT: A Pretrained Language Model for Scientific Text

SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.

CEDR: Contextualized Embeddings for Document Ranking

This work investigates how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking, proposes a joint approach that incorporates BERT's classification vector into existing neural models, and shows that it outperforms state-of-the-art ad-hoc ranking baselines.

Passage Re-ranking with BERT

A simple re-implementation of BERT for query-based passage re-ranking is the state of the art on the TREC-CAR dataset and the top entry in the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% in MRR@10.
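
For readers unfamiliar with the MRR@10 metric mentioned in the last entry, the following is a minimal sketch of how mean reciprocal rank at a cutoff is commonly computed. The function name `mrr_at_k` and the dictionary-based input layout are assumptions made for illustration, not the official MS MARCO evaluation script.

```python
from typing import Dict, List, Set


def mrr_at_k(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank of the first relevant document within the top k.

    rankings: query id -> ranked list of document ids (best first).
    relevant: query id -> set of relevant document ids.
    A query with no relevant document in its top k contributes 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for query_id, ranked_ids in rankings.items():
        relevant_ids = relevant.get(query_id, set())
        for rank, doc_id in enumerate(ranked_ids[:k], start=1):
            if doc_id in relevant_ids:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


if __name__ == "__main__":
    rankings = {"q1": ["d3", "d1", "d2"], "q2": ["d9", "d8"], "q3": ["d5"]}
    relevant = {"q1": {"d1"}, "q2": {"d7"}, "q3": {"d5"}}
    # q1 contributes 1/2, q2 contributes 0, q3 contributes 1.
    print(mrr_at_k(rankings, relevant, k=10))
```

The cutoff matters: with `k=1`, only queries whose top-ranked document is relevant contribute to the score.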