Princeton University, Korea University
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
This article introduces BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora that largely outperforms BERT and previous state-of-the-art models on a variety of biomedical text mining tasks.
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index
- Minjoon Seo, Jinhyuk Lee, T. Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi
- Computer ScienceACL
- 13 June 2019
This paper introduces query-agnostic indexable representations of document phrases that can drastically speed up open-domain QA, and introduces dense-sparse phrase encoding, which effectively captures syntactic, semantic, and lexical information of the phrases and eliminates the pipeline filtering of context documents.
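The dense-sparse encoding described above can be sketched as a simple scoring rule: each question and each indexed phrase is represented by a dense vector concatenated with a sparse lexical vector, and phrases are ranked by inner product. The function below is a minimal illustration under that assumption; the names, shapes, and sparse weighting are hypothetical, not the paper's actual implementation.

```python
def phrase_score(q_dense, q_sparse, p_dense, p_sparse):
    """Inner-product score of a question against one indexed phrase.

    q_dense / p_dense: dense vectors as lists of floats.
    q_sparse / p_sparse: sparse lexical vectors as {term: weight} dicts
    (e.g. tf-idf-style weights). The total score is the inner product of
    the concatenated dense+sparse representations.
    """
    dense = sum(a * b for a, b in zip(q_dense, p_dense))
    sparse = sum(q_sparse.get(term, 0.0) * w for term, w in p_sparse.items())
    return dense + sparse
```

Because the phrase side is query-agnostic, all phrase vectors can be built and indexed once offline, and answering a question reduces to a maximum-inner-product search over that index.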
Biomedical Entity Representations with Synonym Marginalization
To learn from incomplete synonym sets, this paper uses model-based candidate selection and maximizes the marginal likelihood of the synonyms present among the top candidates, avoiding explicit pre-selection of negative samples from more than 400K candidates.
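The marginalization idea can be sketched as a loss function: rather than scoring a single gold synonym, the model sums the probability mass assigned to every true synonym among the top-k candidates and minimizes the negative log of that marginal. This is a minimal sketch assuming plain list inputs; the function name and clamping constant are illustrative, not taken from the paper.

```python
import math

def marginal_nll(scores, synonym_mask):
    """Marginal negative log-likelihood over top-k candidates.

    scores: similarity scores between a mention and its top-k candidate
        names (list of floats).
    synonym_mask: parallel list of 0/1 flags, 1 where the candidate is a
        true synonym of the mention's concept.
    """
    # Softmax over candidate scores (shifted by the max for stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Sum the probability mass on all candidates marked as synonyms.
    marginal = sum(p for p, is_syn in zip(probs, synonym_mask) if is_syn)
    # Clamp so the loss stays finite when no synonym reached the top-k.
    return -math.log(max(marginal, 1e-9))
```

With three equally scored candidates of which two are synonyms, the marginal is 2/3 and the loss is -log(2/3).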
Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering
It is shown that ranking paragraphs and aggregating answers using Paragraph Ranker improves the performance of the open-domain QA pipeline on four open-domain QA datasets by 7.8% on average.
CollaboNet: collaboration of deep neural networks for biomedical named entity recognition
- Wonjin Yoon, Chan Ho So, Jinhyuk Lee, Jaewoo Kang
- Computer ScienceBMC Bioinformatics
- 21 September 2018
The experimental results show that CollaboNet can greatly reduce the number of false positives and misclassified entities, including polysemous words, and improve the accuracy of downstream biomedical text mining applications such as bio-entity relation extraction.
Pre-trained Language Model for Biomedical Question Answering
- Wonjin Yoon, Jinhyuk Lee, Donghyeon Kim, Minbyul Jeong, Jaewoo Kang
- Computer SciencePKDD/ECML Workshops
- 16 September 2019
This paper investigates the performance of BioBERT, a pre-trained biomedical language model, in answering biomedical questions including factoid, list, and yes/no type questions.
Simple Entity-Centric Questions Challenge Dense Retrievers
- Christopher Sciavolino, Zexuan Zhong, Jinhyuk Lee, Danqi Chen
- Computer ScienceEMNLP
- 17 September 2021
This paper investigates this issue and uncovers that dense retrievers can only generalize to common entities unless the question pattern is explicitly observed during training, and demonstrates that data augmentation is unable to fix the generalization problem.
Learning Dense Representations of Phrases at Scale
This work shows for the first time that dense representations of phrases alone can achieve much stronger performance in open-domain QA, and proposes a query-side fine-tuning strategy that supports transfer learning and reduces the discrepancy between training and inference.
A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining
BERN uses high-performance BioBERT named entity recognition models that recognize known entities and discover new ones; various named entity normalization models are integrated into BERN to assign a distinct identifier to each recognized entity.
Answering Questions on COVID-19 in Real-Time
This paper outlines CovidAsk, a question answering (QA) system that combines biomedical text mining and QA techniques to answer questions in real time, leveraging both supervised and unsupervised approaches to provide informative answers.