BANNER: An Executable Survey of Advances in Biomedical Named Entity Recognition
TLDR
BANNER is an open-source, executable survey of advances in biomedical named entity recognition, intended to serve as a benchmark for the field. It is designed to maximize domain independence by avoiding brittle semantic features and rule-based processing steps.
BioCreative V CDR task corpus: a resource for chemical disease relation extraction
TLDR
The BC5CDR corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community.
DNorm: disease name normalization with pairwise learning to rank
TLDR
This article introduces the first machine learning approach to disease name normalization (DNorm), using the NCBI disease corpus and the MEDIC vocabulary, which combines MeSH® and OMIM. The method is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data.
Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task
TLDR
This task was found to be successful in engaging the text-mining research community, producing a large annotated corpus and improving the results of automatic disease recognition and CDR extraction.
TaggerOne: joint named entity recognition and normalization with semi-Markov Models
TLDR
This work proposes the first machine learning model for joint NER and normalization during both training and prediction. The model is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization.
Inter-species normalization of gene mentions with GNAT
TLDR
GNAT, the first publicly available system reported to handle inter-species gene normalization (GN), uses extensive background knowledge on genes to resolve ambiguous names to EntrezGene identifiers and performs comparably to single-species approaches proposed by us and others.
Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts in Health-Related Social Networks
TLDR
The approach is evaluated on a manually annotated set of user comments with promising performance. It is concluded that user comments pose a significant natural language processing challenge but do contain useful extractable information that merits further exploration.
Overview of BioCreative II gene normalization
TLDR
Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement. The participating systems show promise as tools to link the literature with biological databases.
PubTator central: automated concept annotation for biomedical full text articles
TLDR
The full-text results in PubTator Central (PTC) significantly increase biomedical concept coverage, and it is anticipated that this expansion will both enhance existing downstream applications and enable new use cases.