Corpus ID: 9400086

Bootstrapping Distantly Supervised IE Using Joint Learning and Small Well-Structured Corpora

Lidong Bing, Bhuwan Dhingra, Kathryn Mazaitis, Jong Hyuk Park, William W. Cohen
We propose a framework to improve performance of distantly-supervised relation extraction, by jointly learning to solve two related tasks: concept-instance extraction and relation extraction. We combine this with a novel use of document structure: in some small, well-structured corpora, sections can be identified that correspond to relation arguments, and distantly-labeled examples from such sections tend to have good precision. Using these as seeds we extract additional relation examples by…
Tackling Long-Tailed Relations and Uncommon Entities in Knowledge Graph Completion
A meta-learning framework is proposed that handles infrequent relations with few-shot learning and uncommon entities by using textual descriptions, and a novel model is designed to better extract key information from textual descriptions.
Using Graphs of Classifiers to Impose Declarative Constraints on Semi-supervised Learning
This work presents a declarative language for modeling both traditional supervised classification tasks and many SSL heuristics, including both well-known heuristics such as co-training and novel domain-specific heuristics.
Semi-Supervised Learning with Declaratively Specified Entropy Constraints
The proposed method can be used to specify ensembles of semi-supervised learners, as well as agreement constraints and entropic regularization constraints between these learners, and can be used to model both well-known heuristics such as co-training and novel domain-specific heuristics.


Distant IE by Bootstrapping Using Lists and Document Structure
It is shown that augmenting a large corpus with coupling constraints from even a small, well-structured corpus can improve performance substantially, doubling F1 on one task.
Multi-instance Multi-label Learning for Relation Extraction
This work proposes a novel approach to multi-instance multi-label learning for RE, which jointly models all the instances of a pair of entities in text and all their labels using a graphical model with latent variables, and which performs competitively on two difficult domains.
Distant supervision for relation extraction without labeled data
This work investigates an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size.
Modeling Relations and Their Mentions without Labeled Text
A novel approach to distant supervision that can alleviate the problem of noisy patterns that hurt precision by using a factor graph and applying constraint-driven semi-supervision to train this model without any knowledge about which sentences express the relations in the authors' training KB.
Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
A novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts is presented.
Learning to Extract Relations from the Web using Minimal Supervision
An existing relation extraction method is extended to handle this weaker form of supervision, and experimental results demonstrate that the approach can reliably extract relations from web documents.
Improving Distant Supervision for Information Extraction Using Label Propagation Through Lists
This work describes a procedure for using label propagation on a graph in which the nodes are entity mentions and mentions are coupled when they occur in coordinate list structures, and shows that this labeling approach leads to good performance even when off-the-shelf classifiers are used on the distantly-labeled data.
Open Information Extraction Using Wikipedia
WOE is presented, an open IE system which improves dramatically on TextRunner's precision and recall. It uses a novel form of self-supervised learning for open extractors: heuristic matches between Wikipedia infobox attribute values and corresponding sentences are used to construct training data.
Dependency Parsing and Domain Adaptation with LR Models and Parser Ensembles
A data-driven variant of the LR algorithm for dependency parsing is presented, and extended with a best-first search for probabilistic generalized LR dependency parsing, and applied to both tracks of the CoNLL 2007 shared task.
Snowball: extracting relations from large plain-text collections
This paper develops a scalable evaluation methodology and metrics for the task, and presents a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents.