KB-LDA: Jointly Learning a Knowledge Base of Hierarchy, Relations, and Facts

@inproceedings{MovshovitzAttias2015KBLDAJL,
  title={KB-LDA: Jointly Learning a Knowledge Base of Hierarchy, Relations, and Facts},
  author={Dana Movshovitz-Attias and William W. Cohen},
  booktitle={ACL},
  year={2015}
}
Many existing knowledge bases (KBs), including Freebase, Yago, and NELL, rely on a fixed ontology, given as an input to the system, which defines the data to be cataloged in the KB, i.e., a hierarchy of categories and relations between them. The system then extracts facts that match the predefined ontology. We propose an unsupervised model that jointly learns a latent ontological structure of an input corpus, and identifies facts from the corpus that match the learned structure. Our approach… 
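KB-LDA builds on LDA-style latent-variable inference. As background only (this is a sketch of plain LDA, not the authors' joint model; the topic count, hyperparameters, and iteration budget are illustrative assumptions), a minimal collapsed Gibbs sampler:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.1, iters=300, seed=0):
    """Collapsed Gibbs sampler for plain LDA (background sketch, not KB-LDA).

    docs: list of token lists. Returns per-token topic assignments z and
    per-topic word counts n_kw."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})                # vocabulary size
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]  # token topics
    n_dk = [[0] * n_topics for _ in docs]                # doc-topic counts
    n_kw = [defaultdict(int) for _ in range(n_topics)]   # topic-word counts
    n_k = [0] * n_topics                                 # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove this token's current assignment
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                # collapsed conditional: p(z=t) ∝ (n_dk+α)·(n_kw+β)/(n_k+Vβ)
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    return z, n_kw
```

KB-LDA couples several such latent structures (topics over a hierarchy, relations, and facts) in one joint model; this sketch shows only the sampling machinery such models share.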

Citations

Unsupervised Terminological Ontology Learning Based on Hierarchical Topic Modeling
TLDR
This paper presents hierarchical relation-based latent Dirichlet allocation (hrLDA), a data-driven hierarchical topic model for extracting terminological ontologies from a large number of heterogeneous documents, and demonstrates the superiority of hrLDA over existing topic models, especially for building hierarchies.
Grounding Topic Models with Knowledge Bases
TLDR
Experiments show that the approach is significantly better in topic perplexity and key entity identification, indicating the potential of grounded modeling for semantic extraction and language understanding applications.
Enriching Topic Models with DBpedia
TLDR
This work first reduces the text documents to a set of entities and then enriches this set with background knowledge from DBpedia, producing semantic topic models that can be represented in a knowledge base.
Knowledge Base Enhanced Topic Modeling
TLDR
This paper treats knowledge bases, with their large collections of entities and relations, as structured representations of human knowledge, and proposes a knowledge base enhanced topic model that improves on LDA for document classification while requiring no supervision.
Relation Schema Induction using Tensor Factorization with Side Information
TLDR
This paper proposes Schema Induction using Coupled Tensor Factorization (SICTF), a novel tensor factorization method for relation schema induction that factorizes Open Information Extraction triples extracted from a domain corpus along with additional side information in a principled way to induce relation schemas.
Machine Learning with World Knowledge: The Position and Survey
TLDR
This paper starts from a comparison of world knowledge with domain-specific knowledge, and introduces three key problems in using world knowledge in learning processes: explicit and implicit feature representation, inference for knowledge linking and disambiguation, and learning with direct or indirect supervision.
Large-Scale Information Extraction from Textual Definitions through Deep Syntactic and Semantic Analysis
TLDR
DefIE is an approach to large-scale Information Extraction based on a syntactic-semantic analysis of textual definitions: it leverages syntactic dependencies to reduce data sparsity, then disambiguates the arguments and content words of the relation strings, and finally exploits the resulting information to organize the acquired relations hierarchically.
Weakly-Supervised Learning of Visual Relations
TLDR
A novel approach for modeling visual relations between pairs of objects, where the predicate is typically a preposition or verb linking the two objects, together with a weakly-supervised discriminative clustering model that learns relations from image-level labels only.
Code and Named Entity Recognition in StackOverflow
TLDR
A new named entity recognition (NER) corpus for the computer programming domain is introduced, consisting of 15,372 sentences annotated with 20 fine-grained entity types, and the SoftNER model is presented, which incorporates a context-independent code token classifier with corpus-level features to improve the BERT-based tagging model.
Higher-order Relation Schema Induction using Tensor Factorization with Back-off and Aggregation
TLDR
This paper proposes Tensor Factorization with Back-off and Aggregation (TFBA), a novel framework for higher-order relation schema induction (HRSI), and is the first attempt at inducing higher-order relation schemata from unlabeled text.

References

Showing 1–10 of 32 references
Yago: a core of semantic knowledge
TLDR
YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts, which include the Is-A hierarchy as well as non-taxonomic relations between entities (such as HASWONPRIZE).
Learning New Facts From Knowledge Bases With Neural Tensor Networks and Semantic Word Vectors
TLDR
A neural tensor network (NTN) model is introduced which predicts new relationship entries that can be added to the database and can classify unseen relationships in WordNet with an accuracy of 75.8%.
Reasoning With Neural Tensor Networks for Knowledge Base Completion
TLDR
An expressive neural tensor network suitable for reasoning over relationships between two entities given a subset of the knowledge base is introduced and performance can be improved when entities are represented as an average of their constituting word vectors.
Acquiring temporal constraints between relations
TLDR
The proposed algorithm, GraphOrder, is a novel and scalable graph-based label propagation algorithm that takes the transitivity of temporal order into account, as well as corpus statistics on the narrative order of verb mentions, and achieves as much as a 38.4% absolute improvement in F1 over a random baseline.
Grounded Discovery of Coordinate Term Relationships between Software Entities
TLDR
An approach for detecting coordinate-term relationships between software-domain entities that refer to Java classes; it uses a similarity measure based on distributional information about how the classes are used in software, combined with corpus statistics on the distribution of contexts in which the classes appear in text.
Block-LDA: Jointly modeling entity-annotated text and entity-entity links
TLDR
A model that combines aspects of mixed membership stochastic block models and topic models to improve entity-entity link modeling by jointly modeling links and text about the entities that are linked is presented.
Semantic Taxonomy Induction from Heterogenous Evidence
TLDR
This work proposes a novel algorithm for inducing semantic taxonomies that flexibly incorporates evidence from multiple classifiers over heterogenous relationships to optimize the entire structure of the taxonomy, using knowledge of a word's coordinate terms to help in determining its hypernyms, and vice versa.
Automatic Acquisition of Hyponyms from Large Text Corpora
TLDR
A set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest are identified.
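Patterns of this kind (Hearst patterns) can be sketched with a simple regular expression; the pattern set and tokenization below are simplified assumptions, covering only the classic "such as" construction:

```python
import re

# One classic Hearst pattern: "NP such as NP, NP(,)? (and|or) NP".
SUCH_AS = re.compile(
    r"(?P<hyper>\w+(?: \w+)?),? such as "
    r"(?P<hypos>\w+(?:, \w+)*(?:,? (?:and|or) \w+)?)"
)

def extract_hyponyms(sentence):
    """Return (hypernym phrase, hyponym list) pairs for 'such as' matches."""
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        # split the enumeration on commas and conjunctions
        parts = re.split(r",\s*|\s+and\s+|\s+or\s+", m.group("hypos"))
        pairs.append((m.group("hyper"),
                      [p for p in parts if p and p not in ("and", "or")]))
    return pairs
```

A full implementation would operate over parsed noun phrases rather than raw word sequences, and would include the other patterns from the paper ("including", "especially", "and other", "or other").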
Reading Tea Leaves: How Humans Interpret Topic Models
TLDR
New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.
Identifying Relations for Open Information Extraction
TLDR
Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE^pos.
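ReVerb's syntactic constraint requires a relation phrase to match V | V P | V W* P over part-of-speech tags. A minimal sketch of that matching step, assuming pre-tagged input with a simplified tag set (a real system would run a POS tagger and add the lexical constraint):

```python
import re

# ReVerb's syntactic constraint (simplified): a relation phrase is
# V | V P | V W* P, where V=verb, P=preposition, W=noun/adj/adv/det,
# and O marks any other tag.
REL_TAGS = re.compile(r"V(?:W*P)?")

def relation_phrases(tagged):
    """tagged: list of (word, tag) pairs with tags in {'V','P','W','O'}.
    Returns the relation phrases whose tag sequence matches the pattern."""
    tags = "".join(t for _, t in tagged)
    # re's greedy matching yields the longest phrase at each start position
    return [" ".join(w for w, _ in tagged[m.start():m.end()])
            for m in REL_TAGS.finditer(tags)]
```

The greedy match captures multi-word relations like "made a deal with" as a single phrase instead of just the bare verb, which is the behavior the constraint is designed to enforce.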