NSEEN: Neural Semantic Embedding for Entity Normalization

@article{Fakhraei2019NSEENNS,
  title={NSEEN: Neural Semantic Embedding for Entity Normalization},
  author={Shobeir Fakhraei and J. Ambite},
  journal={ArXiv},
  year={2019},
  volume={abs/1811.07514}
}
Much of human knowledge is encoded in text, available in scientific publications, books, and the web. Given the rapid growth of these resources, we need automated methods to extract such knowledge into machine-processable structures, such as knowledge graphs. An important task in this process is entity normalization, which consists of mapping noisy entity mentions in text to canonical entities in well-known reference sets. However, entity normalization is a challenging problem; there often are… 
Citations

Biomedical Named Entity Recognition via Reference-Set Augmented Bootstrapping
TLDR
A weakly-supervised data augmentation approach to improve Named Entity Recognition (NER) in a challenging domain: extracting biomedical entities from the scientific literature.
Biomedical Entity Representations with Synonym Marginalization
TLDR
To learn from incomplete synonyms, this paper uses model-based candidate selection and maximizes the marginal likelihood of the synonyms present in the top candidates, avoiding the explicit pre-selection of negative samples from more than 400K candidates.
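As a rough illustration of this marginalization idea, the sketch below (PyTorch; all names are hypothetical stand-ins, not the paper's actual model) scores k retrieved candidates for one mention and maximizes the summed probability of every candidate marked as a synonym, rather than training against a single positive:

```python
import torch
import torch.nn.functional as F

def marginal_synonym_loss(scores: torch.Tensor, is_synonym: torch.Tensor) -> torch.Tensor:
    """Marginal likelihood over the synonyms that appear among the
    top-k candidates retrieved for one mention.

    scores     -- (k,) model scores for the k candidates
    is_synonym -- (k,) boolean mask marking which candidates are
                  synonyms of the gold concept
    """
    log_probs = F.log_softmax(scores, dim=0)  # normalize over the candidate list
    # log sum_{i in synonyms} p(candidate_i | mention): marginalize over all
    # present synonyms instead of hand-selecting positives and negatives.
    return -torch.logsumexp(log_probs[is_synonym], dim=0)

# toy usage: 5 candidates, of which candidates 0 and 3 are gold synonyms
scores = torch.tensor([2.1, 0.3, -1.0, 1.7, 0.2])
mask = torch.tensor([True, False, False, True, False])
loss = marginal_synonym_loss(scores, mask)
```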
SmallER: Scaling Neural Entity Resolution for Edge Devices
TLDR
This paper introduces SmallER, a scalable neural entity resolution system capable of running directly on edge devices; it uses compressed tries to reduce the space required to store catalogs, and a novel implementation of spatial partitioning trees to balance runtime latency against recall relative to full catalog search.
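The trie idea is easiest to see in a toy form. The sketch below is a plain (uncompressed) character trie for a name catalog, assuming string names mapped to integer entity ids; shared prefixes are stored once, which is the space-saving principle, and a real compressed/radix trie would additionally merge single-child chains:

```python
class TrieNode:
    __slots__ = ("children", "entity_id")
    def __init__(self):
        self.children = {}     # char -> TrieNode
        self.entity_id = None  # set only on terminal nodes

class CatalogTrie:
    """Toy character trie for a catalog of entity names."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, name: str, entity_id: int) -> None:
        node = self.root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        node.entity_id = entity_id

    def lookup(self, name: str):
        node = self.root
        for ch in name:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.entity_id

catalog = CatalogTrie()
catalog.insert("play jazz radio", 1)
catalog.insert("play jazz music", 2)  # shares the "play jazz " prefix with id 1
assert catalog.lookup("play jazz music") == 2
```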
Unsupervised Construction of Knowledge Graphs From Text and Code
TLDR
A novel process for joint clustering text concepts that combines word-embeddings, nonlinear dimensionality reduction, and clustering techniques to assist in understanding, organizing, and comparing software in the open science ecosystem is presented.
Biomedical Concept Normalization by Leveraging Hypernyms
TLDR
This paper proposes Biomedical Concept Normalizer with Hypernyms (BCNH), a novel framework that adopts list-wise training to make use of both hypernyms and synonyms, and also employs a norm constraint on the representations of hypernym-hyponym entity pairs.
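One plausible reading of these two ingredients, sketched below in PyTorch with hypothetical names, is a list-wise cross-entropy over the whole candidate list plus a margin penalty pushing the broader concept (the hypernym) toward a smaller embedding norm than its hyponym; BCNH's exact formulation may differ:

```python
import torch
import torch.nn.functional as F

def listwise_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """List-wise training: cross-entropy of a softmax over the full candidate
    list against a target distribution spread over the correct entries
    (synonyms and hypernyms). `labels` is a float 0/1 relevance vector."""
    target = labels / labels.sum()
    return -(target * F.log_softmax(scores, dim=0)).sum()

def norm_constraint(hyponym_vec: torch.Tensor, hypernym_vec: torch.Tensor,
                    margin: float = 0.1) -> torch.Tensor:
    """Illustrative norm constraint on a hypernym-hyponym pair: penalize the
    hypernym's norm exceeding the hyponym's by more than a margin."""
    return F.relu(hypernym_vec.norm() - hyponym_vec.norm() + margin)
```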
Siamese Graph Neural Networks for Data Integration
TLDR
This work proposes a general approach to modeling and integrating entities from structured data, such as relational databases, as well as unstructured sources, such as free text from news articles, by combining Siamese and graph neural networks to propagate information between connected entities and support high scalability.

References

Showing 1–10 of 43 references
Deep Learning for Entity Matching: A Design Space Exploration
TLDR
The results show that DL does not outperform current solutions on structured EM, but it can significantly outperform them on textual and dirty EM, which suggests that practitioners should seriously consider using DL for textual and dirty EM problems.
Biomedical Named Entity Recognition via Reference-Set Augmented Bootstrapping
TLDR
A weakly-supervised data augmentation approach to improve Named Entity Recognition (NER) in a challenging domain: extracting biomedical entities from the scientific literature.
TaggerOne: joint named entity recognition and normalization with semi-Markov Models
TLDR
This work proposes the first machine learning model for joint NER and normalization during both training and prediction; it is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization.
A Survey on Recent Advances in Named Entity Recognition from Deep Learning models
TLDR
This work presents a comprehensive survey of deep neural network architectures for NER and contrasts them with previous approaches to NER based on feature engineering and other supervised or semi-supervised learning algorithms.
Analysis of the Impact of Negative Sampling on Link Prediction in Knowledge Graphs
TLDR
This paper uses state-of-the-art knowledge graph embeddings (RESCAL, TransE, DistMult, and ComplEx), evaluates on the benchmark datasets FB15k and WN18, and proposes embedding-based negative sampling methods.
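For concreteness, the sketch below shows the baseline setup such analyses start from: a TransE margin loss with uniform negative sampling (corrupting the tail entity at random). Dimensions and entity counts are toy values:

```python
import torch

def transe_margin_loss(h, r, t, t_neg, margin=1.0):
    """TransE scores a triple (h, r, t) by -||h + r - t||; training contrasts
    each true triple with a corrupted one whose tail came from the sampler.

    h, r, t, t_neg -- (batch, dim) embeddings of heads, relations,
                      true tails, and sampled negative tails
    """
    pos = (h + r - t).norm(p=2, dim=1)
    neg = (h + r - t_neg).norm(p=2, dim=1)
    return torch.relu(margin + pos - neg).mean()

# toy usage with uniform corruption over a 100-entity vocabulary
dim, batch, n_entities = 16, 32, 100
entities = torch.randn(n_entities, dim, requires_grad=True)
relations = torch.randn(10, dim, requires_grad=True)
h_idx, t_idx = torch.randint(0, n_entities, (2, batch))
r_idx = torch.randint(0, 10, (batch,))
neg_idx = torch.randint(0, n_entities, (batch,))  # uniform negative sampling
loss = transe_margin_loss(entities[h_idx], relations[r_idx],
                          entities[t_idx], entities[neg_idx])
```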
Learning Text Similarity with Siamese Recurrent Networks
TLDR
A deep architecture for learning a similarity metric on variable-length character sequences is presented, combining a stack of character-level bidirectional LSTMs with a Siamese architecture.
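A minimal version of that architecture, sketched below in PyTorch with illustrative dimensions: both strings pass through the same character-level BiLSTM stack, and the pair is scored by cosine similarity (a contrastive loss would then be applied during training):

```python
import torch
import torch.nn as nn

class SiameseCharBiLSTM(nn.Module):
    """Minimal Siamese encoder: one character-level BiLSTM stack embeds both
    strings; vocabulary size, dimensions, and mean-pooling are toy choices."""
    def __init__(self, n_chars=128, emb_dim=32, hidden=64, layers=2):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=layers,
                            bidirectional=True, batch_first=True)

    def encode(self, char_ids):                   # (batch, seq_len) int64
        out, _ = self.lstm(self.embed(char_ids))  # (batch, seq_len, 2*hidden)
        return out.mean(dim=1)                    # mean-pool over characters

    def forward(self, a, b):
        # identical weights on both branches -- the "Siamese" part
        return nn.functional.cosine_similarity(self.encode(a), self.encode(b))

model = SiameseCharBiLSTM()
a = torch.randint(0, 128, (4, 20))  # 4 string pairs, padded to length 20
b = torch.randint(0, 128, (4, 20))
sim = model(a, b)                   # (4,) similarities in [-1, 1]
```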
GloVe: Global Vectors for Word Representation
TLDR
A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
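The core of that model is a weighted least-squares objective over the global word co-occurrence matrix. A short PyTorch sketch with the standard weighting function and illustrative hyperparameters:

```python
import torch

def glove_loss(w_i, w_j, b_i, b_j, x_ij, x_max=100.0, alpha=0.75):
    """GloVe objective for one batch of co-occurring word pairs:
    f(x_ij) * (w_i . w_j + b_i + b_j - log x_ij)^2.

    w_i, w_j -- (batch, dim) word and context-word vectors
    b_i, b_j -- (batch,) scalar biases
    x_ij     -- (batch,) co-occurrence counts from the global matrix
    """
    weight = torch.clamp(x_ij / x_max, max=1.0) ** alpha  # f(x): caps frequent pairs
    err = (w_i * w_j).sum(dim=1) + b_i + b_j - torch.log(x_ij)
    return (weight * err ** 2).mean()
```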
Comparative Analysis of Approximate Blocking Techniques for Entity Resolution
TLDR
This work considers 17 state-of-the-art blocking methods and uses 6 popular real datasets to examine the robustness of their internal configurations and their relative balance between effectiveness and time efficiency; it also investigates their scalability over a corpus of 7 established synthetic datasets.
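The simplest member of that family, standard blocking, fits in a few lines of plain Python and shows the core trade-off: a coarser key yields bigger blocks (higher recall, more comparisons), a finer key the reverse. The key function here is a toy choice; real systems use phonetic or q-gram keys:

```python
from collections import defaultdict
from itertools import combinations

def standard_blocking(records, key):
    """Standard blocking: records sharing a blocking key land in the same
    block, and only within-block pairs are compared instead of all
    O(n^2) record pairs."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)  # candidate pairs only

names = ["acetaminophen", "acetominophen", "ibuprofen", "ibuprophen"]
pairs = list(standard_blocking(names, key=lambda s: s[:3]))
# -> [('acetaminophen', 'acetominophen'), ('ibuprofen', 'ibuprophen')]
```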
Deep Contextualized Word Representations
TLDR
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
TLDR
This article introduces BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora that largely outperforms BERT and previous state-of-the-art models on a variety of biomedical text mining tasks.
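A hedged sketch of using BioBERT as a sentence encoder through the Hugging Face transformers API; the checkpoint name is the one published by the DMIS lab and is assumed to be available on the model hub:

```python
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint: the DMIS lab's BioBERT release on the Hugging Face hub.
tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
model = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

inputs = tokenizer("EGFR mutations confer gefitinib sensitivity.",
                   return_tensors="pt")
outputs = model(**inputs)
cls_vec = outputs.last_hidden_state[:, 0]  # [CLS] embedding for the sentence
```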