ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
- Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, M. Bagheri, R. Summers
- Medicine · Computer Vision and Pattern Recognition
- 5 May 2017
A new chest X-ray database, ChestX-ray8, is presented, comprising 108,948 frontal-view X-ray images of 32,717 unique patients with eight disease image labels text-mined from the associated radiological reports using natural language processing; the weakly-supervised classification and localization approach is validated using the proposed dataset.
Database resources of the National Center for Biotechnology Information
- Richa Agarwala, Tanya Barrett, Jeff Beck, Dennis A. Benson, Colleen Bollin, Evan Bolton, …, Kerry Zbicz
- Computer Science · Nucleic Acids Res.
- 13 November 2017
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database…
BioCreative V CDR task corpus: a resource for chemical disease relation extraction
- Jiao Li, Yueping Sun, Zhiyong Lu
- Biology · Database J. Biol. Databases Curation
- 8 May 2016
The BC5CDR corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community.
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets
- Yifan Peng, Shankai Yan, Zhiyong Lu
- Computer Science · BioNLP@ACL
- 13 June 2019
The Biomedical Language Understanding Evaluation (BLUE) benchmark is introduced to facilitate research on pre-trained language representations in the biomedical domain, and the BERT model pre-trained on PubMed abstracts and MIMIC-III clinical notes is found to achieve the best results.
PubTator: a web-based text mining tool for assisting biocuration
- Chih-Hsuan Wei, Hung-Yu Kao, Zhiyong Lu
- Computer Science, Biology · Nucleic Acids Res.
- 22 May 2013
PubTator is described, a web-based system for assisting biocuration that features a PubMed-like interface and is equipped with multiple challenge-winning text mining algorithms to ensure the quality of its automatic results.
NCBI disease corpus: A resource for disease name recognition and concept normalization
- R. Dogan, Robert Leaman, Zhiyong Lu
- Computer Science · Journal of Biomedical Informatics
- 1 February 2014
DNorm: disease name normalization with pairwise learning to rank
- Robert Leaman, R. Dogan, Zhiyong Lu
- Computer Science · Bioinform.
- 21 August 2013
This article introduces DNorm, the first machine learning approach for disease name normalization, using the NCBI disease corpus and the MEDIC vocabulary, which combines MeSH® and OMIM. The method is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data.
PubMed and beyond: a survey of web tools for searching biomedical literature
- Zhiyong Lu
- Computer Science · Database J. Biol. Databases Curation
- 17 January 2011
This study reviews 28 Web tools that provide literature search services comparable to PubMed, highlights their respective innovations, compares them to the PubMed system and to one another, and discusses directions for future development.
Database resources of the National Center for Biotechnology Information.
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the…
TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-Rays
- Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, R. Summers
- Computer Science · IEEE/CVF Conference on Computer Vision and…
- 12 January 2018
A novel Text-Image Embedding network (TieNet) is proposed for extracting the distinctive image and text representations of chest X-rays, and multi-level attention models are integrated into an end-to-end trainable CNN-RNN architecture for highlighting the meaningful text words and image regions.