ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
A new chest X-ray database, ChestX-ray8, is presented. It comprises 108,948 frontal-view X-ray images of 32,717 unique patients, with image labels for eight diseases text-mined from the associated radiological reports using natural language processing. Weakly-supervised classification and localization of these diseases is validated using the proposed dataset.
Database resources of the National Center for Biotechnology Information
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database.
PubTator: a web-based text mining tool for assisting biocuration
PubTator, a web-based system for assisting biocuration, is described. It features a PubMed-like interface and is equipped with multiple challenge-winning text-mining algorithms to ensure the quality of its automatic results.
BioCreative V CDR task corpus: a resource for chemical disease relation extraction
The BC5CDR corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community.
DNorm: disease name normalization with pairwise learning to rank
This article introduces the first machine learning approach for disease name normalization (DNorm), using the NCBI disease corpus and the MEDIC vocabulary, which combines MeSH® and OMIM. The method is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data.
NCBI disease corpus: A resource for disease name recognition and concept normalization
The results show that the NCBI disease corpus has the potential to significantly improve the state of the art in disease name recognition and normalization research by providing a high-quality gold standard, thus enabling the development of machine-learning-based approaches for such tasks.
PubMed and beyond: a survey of web tools for searching biomedical literature
  • Zhiyong Lu
  • Computer Science, Medicine
  • Database J. Biol. Databases Curation
  • 17 January 2011
This study reviews 28 Web tools that provide literature search services comparable to PubMed, highlights their respective innovations, compares them to the PubMed system and to one another, and discusses directions for future development.
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets
The Biomedical Language Understanding Evaluation (BLUE) benchmark is introduced to facilitate research in the development of pre-training language representations in the biomedicine domain. The BERT model pre-trained on PubMed abstracts and MIMIC-III clinical notes is found to achieve the best results.
Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task
This task was found to be successful in engaging the text-mining research community, producing a large annotated corpus and improving the results of automatic disease recognition and CDR extraction.
Predicting subcellular localization of proteins using machine-learned classifiers
Five machine-learning classifiers are constructed for predicting the subcellular localization of proteins from animals, plants, fungi, Gram-negative bacteria and Gram-positive bacteria; they are the most accurate subcellular predictors across the widest set of organisms ever published.