NSEEN: Neural Semantic Embedding for Entity Normalization
@article{Fakhraei2019NSEENNS, title={NSEEN: Neural Semantic Embedding for Entity Normalization}, author={Shobeir Fakhraei and J. Ambite}, journal={ArXiv}, year={2019}, volume={abs/1811.07514} }
Much of human knowledge is encoded in text, available in scientific publications, books, and the web. Given the rapid growth of these resources, we need automated methods to extract such knowledge into machine-processable structures, such as knowledge graphs. An important task in this process is entity normalization, which consists of mapping noisy entity mentions in text to canonical entities in well-known reference sets. However, entity normalization is a challenging problem; there often are…
9 Citations
Biomedical Named Entity Recognition via Reference-Set Augmented Bootstrapping
- Computer Science
- ArXiv
- 2019
A weakly-supervised data augmentation approach to improve Named Entity Recognition (NER) in a challenging domain: extracting biomedical entities from the scientific literature.
Biomedical Entity Representations with Synonym Marginalization
- Computer Science, Biology
- ACL
- 2020
To learn from incomplete synonyms, this paper uses model-based candidate selection and maximizes the marginal likelihood of the synonyms present in the top candidates, avoiding the explicit pre-selection of negative samples from more than 400K candidates.
CODER: Knowledge infused cross-lingual medical term embedding for term normalization
- Computer Science
- J. Biomed. Informatics
- 2022
SmallER: Scaling Neural Entity Resolution for Edge Devices
- Computer Science
- Interspeech
- 2021
This paper introduces SmallER, a scalable neural entity resolution system capable of running directly on edge devices. It uses compressed tries to reduce the space required to store catalogs, and a novel implementation of spatial partitioning trees to balance reduced runtime latency against preserving recall relative to full catalog search.
Unsupervised Construction of Knowledge Graphs From Text and Code
- Computer Science
- 2019
A novel process for jointly clustering text concepts is presented; it combines word embeddings, nonlinear dimensionality reduction, and clustering techniques to assist in understanding, organizing, and comparing software in the open science ecosystem.
Biomedical Concept Normalization by Leveraging Hypernyms
- Computer Science
- EMNLP
- 2021
This paper proposes Biomedical Concept Normalizer with Hypernyms (BCNH), a novel framework that adopts list-wise training to make use of both hypernyms and synonyms, and also employs a norm constraint on the representations of hypernym-hyponym entity pairs.
Siamese Graph Neural Networks for Data Integration
- Computer Science
- ArXiv
- 2020
This work proposes a general approach to modeling and integrating entities from structured data, such as relational databases, as well as unstructured sources, such as free text from news articles, by combining siamese and graph neural networks to propagate information between connected entities and support high scalability.
Unsupervised Construction of Knowledge Graphs From Text and Code
- Computer Science
- ArXiv
- 2019
A novel process for jointly clustering text concepts is presented; it combines word embeddings, nonlinear dimensionality reduction, and clustering techniques to assist in understanding, organizing, and comparing software in the open science ecosystem.
References
SHOWING 1-10 OF 43 REFERENCES
Deep Learning for Entity Matching: A Design Space Exploration
- Computer Science
- SIGMOD Conference
- 2018
The results show that DL does not outperform current solutions on structured EM, but it can significantly outperform them on textual and dirty EM, which suggests that practitioners should seriously consider using DL for textual and dirty EM problems.
Biomedical Named Entity Recognition via Reference-Set Augmented Bootstrapping
- Computer Science
- ArXiv
- 2019
A weakly-supervised data augmentation approach to improve Named Entity Recognition (NER) in a challenging domain: extracting biomedical entities from the scientific literature.
TaggerOne: joint named entity recognition and normalization with semi-Markov Models
- Computer Science
- Bioinform.
- 2016
This work proposes the first machine learning model for joint NER and normalization during both training and prediction, which is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization.
A Survey on Recent Advances in Named Entity Recognition from Deep Learning models
- Computer Science
- COLING
- 2018
This work presents a comprehensive survey of deep neural network architectures for NER, and contrasts them with previous approaches to NER based on feature engineering and other supervised or semi-supervised learning algorithms.
Analysis of the Impact of Negative Sampling on Link Prediction in Knowledge Graphs
- Computer Science
- ArXiv
- 2017
This paper uses state-of-the-art knowledge graph embeddings (RESCAL, TransE, DistMult, and ComplEx), evaluates on benchmark datasets (FB15k and WN18), and proposes embedding-based sampling methods.
Learning Text Similarity with Siamese Recurrent Networks
- Computer Science
- Rep4NLP@ACL
- 2016
A deep architecture for learning a similarity metric on variable-length character sequences that combines a stack of character-level bidirectional LSTMs with a Siamese architecture is presented.
GloVe: Global Vectors for Word Representation
- Computer Science
- EMNLP
- 2014
A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Comparative Analysis of Approximate Blocking Techniques for Entity Resolution
- Computer Science
- Proc. VLDB Endow.
- 2016
This work considers 17 state-of-the-art blocking methods and uses 6 popular real datasets to examine the robustness of their internal configurations and their relative balance between effectiveness and time efficiency, and investigates their scalability over a corpus of 7 established synthetic datasets.
Deep Contextualized Word Representations
- Computer Science
- NAACL
- 2018
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
- Computer Science, Biology
- Bioinform.
- 2020
This article introduces BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora that largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks.