BioBERT: a pre-trained biomedical language representation model for biomedical text mining
TLDR
This article introduces BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora that largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks.
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index
TLDR
This paper introduces query-agnostic indexable representations of document phrases that can drastically speed up open-domain QA, and introduces dense-sparse phrase encoding, which effectively captures syntactic, semantic, and lexical information of the phrases and eliminates the pipeline filtering of context documents.
Biomedical Entity Representations with Synonym Marginalization
TLDR
To learn from the incomplete synonyms, this paper uses model-based candidate selection and maximizes the marginal likelihood of the synonyms present in the top candidates, avoiding the explicit pre-selection of negative samples from more than 400K candidates.
Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering
TLDR
Ranking paragraphs and aggregating answers using Paragraph Ranker is shown to improve the performance of an open-domain QA pipeline on four open-domain QA datasets by 7.8% on average.
CollaboNet: collaboration of deep neural networks for biomedical named entity recognition
TLDR
The experimental results show that CollaboNet can be used to greatly reduce the number of false positives and misclassified entities including polysemous words and improve the accuracy of downstream biomedical text mining applications such as bio-entity relation extraction.
Pre-trained Language Model for Biomedical Question Answering
TLDR
This paper investigates the performance of BioBERT, a pre-trained biomedical language model, in answering biomedical questions including factoid, list, and yes/no type questions.
Simple Entity-Centric Questions Challenge Dense Retrievers
TLDR
This paper investigates the issue and uncovers that dense retrievers can only generalize to common entities unless the question pattern is explicitly observed during training, and demonstrates that data augmentation is unable to fix the generalization problem.
Learning Dense Representations of Phrases at Scale
TLDR
This work shows for the first time that one can learn dense representations of phrases alone that achieve much stronger performance in open-domain QA, and proposes a query-side fine-tuning strategy, which can support transfer learning and reduce the discrepancy between training and inference.
A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining
TLDR
BERN uses high-performance BioBERT named entity recognition models, which recognize known entities and discover new entities; various named entity normalization models are integrated into BERN to assign a distinct identifier to each recognized entity.
Answering Questions on COVID-19 in Real-Time
TLDR
This paper outlines CovidAsk, a question answering (QA) system that combines biomedical text mining and QA techniques to answer questions in real time, leveraging both supervised and unsupervised approaches to provide informative answers.