Corpus ID: 9183838

NEMO: Extraction and normalization of organization names from PubMed affiliation strings

Authors: Siddhartha R. Jonnalagadda and Philip Topham
Journal: Journal of Biomedical Discovery and Collaboration
Pages: 50-75
Background. We are witnessing an exponential increase in biomedical research citations in PubMed. However, translating biomedical discoveries into practical treatments is estimated to take around 17 years, according to the 2000 Yearbook of Medical Informatics, and much information is lost during this transition. Pharmaceutical companies spend huge sums to identify opinion leaders and centers of excellence. Conventional methods such as literature search, survey, observation, self-identification… 
Relationship extraction for knowledge graph creation from biomedical literature
This paper presents and compares a few rule-based and machine learning-based methods for scalable relationship extraction from biomedical literature and for their integration into knowledge graphs, and examines how resilient these methods are to unbalanced and fairly small datasets.
ELAD: An Entity Linking Based Affiliation Disambiguation Framework
An automatic learning framework based on entity linking, entity type recognition, candidate generation, and result selection is proposed, which solves many problems that traditional methods cannot: the connection between institution entities, mistake correction, and the reduction of manual and pre-prepared knowledge.
New Methods for Metadata Extraction from Scientific Literature
Within the past few decades we have witnessed a digital revolution, which moved scholarly communication to electronic media and also resulted in a substantial increase in its volume. Nowadays keeping
Named Entity Matching in Publication Databases - A Case Study of PubMed in SONCA
A case study in approximate data matching for a database system that contains information about scientific publications, concerned with matching instances of objects such as XML documents, persons’ names, affiliations, journal names, and so on.
CompanyDepot: Employer Name Normalization in the Online Recruitment Industry
This paper focuses on the employer name normalization task, which has several unique challenges: handling employer names from both job postings and resumes, leveraging the corresponding location context, and handling name variations, irrelevant input data, and noise in the KB.
Knowledge Graph: Semantic Representation and Assessment of Innovation Ecosystems
This work introduces a framework that assists with performing competence occupants tasks, proves the general applicability of the framework, and illustrates how to solve concrete business cases from the automotive domain.
Comparing institutional-level bibliometric research performance indicator values based on different affiliation disambiguation systems
The key finding is that for the sample institutions, the studied systems provide bibliometric indicator values that have only a limited accuracy, and additional data cleaning for disambiguating affiliation data is recommended.
A pipeline for extracting and deduplicating domain-specific knowledge bases
A pipeline developed at CareerBuilder LLC for building a KB describing employers is described, by first extracting entities from both global, publicly available data sources and a proprietary source, and then deduplicating the instances to yield an employer-specific KB.
SKILL: A System for Skill Identification and Normalization
An automated approach for skill entity recognition and optimal normalization is proposed and the beta version of the system is currently applied in various big data and business intelligence applications for workforce analytics and career track projections at CareerBuilder.


ONER: Tool for Organization Named Entity Recognition from Affiliation Strings in PubMed Abstracts
The process for extracting organization names from the affiliation sentences of biomedical articles involves multi-layered rule matching with multiple dictionaries and achieves 99.6% measure in extracting organization names.
The strength of co-authorship in gene name disambiguation
It is suggested that the co-authorship information and the circumstances of the articles' release can be a crucial building block of any sophisticated similarity measure among biological articles and hence the methods introduced here should be useful for other biomedical natural language processing tasks (like organism or target disease detection) as well.
Automated recognition of malignancy mentions in biomedical literature
Together, these results suggest that the identification of disparate biomedical entity classes in free text may be achievable with high accuracy and only moderate additional effort for each new application domain.
Disambiguating authors in academic publications using random forests
This paper describes an algorithm for pair-wise disambiguation of author names based on a machine learning classification algorithm, random forests, and defines a set of similarity profile features to assist in author disambiguation.
Using clustering strategies for creating authority files
The notion of approximate word matching is introduced and it is shown how it can be used to improve detection and categorization of variant forms in bibliographic entries and reduce the human effort involved in the creation of authority files.
Sequence Alignment Algorithms
This work is concerned with efficient methods for practical biomolecular sequence comparison, focusing on global and local alignment algorithms and analyses the classical approaches of Needleman & Wunsch and Smith & Waterman as well as efficient alternatives; in particular, the algorithms recently designed by Crochemore, Landau and Ziv-Ukelson that use compression techniques to achieve sub-quadratic time complexity.
A probabilistic similarity metric for Medline records: A model for author name disambiguation
We present a model for automatically generating training sets and estimating the probability that a pair of Medline records sharing a last and first name initial are authored by the same individual,
Biology Based Alignments of Paraphrases for Sentence Compression
It is seen, through classical visualization methodologies and exhaustive experiments, that clustering may not be the best approach for automatic pattern identification.
The Impact of Named Entity Normalization on Information Retrieval for Question Answering
It is found that even a simple normalization method leads to improvements of early precision, both for document and passage retrieval, and better normalization results in better retrieval performance.
The unification of institutional addresses applying parametrized finite-state graphs (P-FSG)
A semi-automatic method based on finite-state techniques for the unification of corporate source data, with potential applications for bibliometric purposes is proposed, though it requires some human processing.