• Corpus ID: 16483125

Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld

Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web's natural language text. [...] We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.

Interactive Learning of Relation Extractors with Weak Supervision

This dissertation shows that the amount of human effort necessary to create relation extractors can be greatly reduced by leveraging a richer set of user interactions, some of which use more accurate models of weak supervision from a database.

Knowledge Base Population through Distant Supervision: Analysis and Improvements

The sources of noise in the mentions are analyzed, and methods to filter out noisy mentions are explored, showing that a combination of heuristics is able to significantly outperform two strong baselines.

Relation Extraction with Weak Supervision and Distributional Semantics

An effective yet efficient algorithm is presented that combines various semantic resources, automatically mined from a corpus using distributional semantics, and is able to extract a very large set of relations from the web at high precision.

Indirect Supervision for Relation Extraction using Question-Answer Pairs

A novel framework is proposed that leverages question-answer pairs as an indirect source of supervision for relation extraction and adopts a margin-based QA loss to reduce noise in distant supervision (DS) by exploiting semantic evidence from the QA dataset.

Knowledge base population using semantic label propagation

Neural Relation Extraction via Inner-Sentence Noise Reduction and Transfer Learning

This work proposes a novel word-level distantly supervised approach to relation extraction that is effective and improves the area under the Precision/Recall (PR) curve from 0.35 to 0.39 over the state-of-the-art work.

Feature-based models for improving the quality of noisy training data for relation extraction

Two feature-based models for increasing the quality of distant supervision extraction patterns are proposed and evaluated: an extension of a hierarchical topic model that induces background, relation-specific, and argument-pair-specific feature distributions, and a perceptron trained to match an objective function.

Web relation extraction with distant supervision

This thesis explores what can cause NERC methods to fail in diverse genres and quantifies different reasons for NERC failure, and proposes solutions for issues arising for information extraction for not traditionally studied domains.

Weakly Supervised, Data-Driven Acquisition of Rules for Open Information Extraction

A way to acquire rules for Open Information Extraction, based on lemma sequence patterns (including potential typographical symbols) linking two named entities in a sentence, is proposed, which does not necessitate expensive resources or time-consuming handcrafted resources, but does require a large amount of text.

Extreme Extraction: Only One Hour per Relation

A novel system is presented, InstaRead, that streamlines authoring with an ensemble of methods: encoding extraction rules in an expressive and compositional representation, guiding the user to promising rules based on corpus statistics and mined resources, and introducing a new interactive development cycle that provides immediate feedback --- even on large datasets.



Distant supervision for relation extraction without labeled data

This work investigates an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size.

Open Information Extraction Using Wikipedia

WOE is presented, an open IE system which improves dramatically on TextRunner's precision and recall and is a novel form of self-supervised learning for open extractors -- using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data.

Learning 5000 Relational Extractors

LUCHS is presented, a self-supervised, relation-specific IE system which learns 5025 relations --- more than an order of magnitude greater than any previous approach --- with an average F1 score of 61%.

Modeling Relations and Their Mentions without Labeled Text

A novel approach to distant supervision is presented that alleviates the problem of noisy patterns that hurt precision by using a factor graph and applying constraint-driven semi-supervision to train the model without any knowledge of which sentences express the relations in the authors' training KB.

Collective Cross-Document Relation Extraction Without Labelled Data

A novel approach to relation extraction is presented that integrates information across documents, performs global inference and requires no labelled text, and tackles relation extraction and entity identification jointly.

Automatically refining the Wikipedia infobox ontology

KOG, an autonomous system for refining Wikipedia's infobox-class ontology, is introduced, using both SVMs and a more powerful joint-inference approach expressed in Markov Logic Networks to build a rich ontology.

Open Information Extraction from the Web

Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input, is introduced.

The Tradeoffs Between Open and Traditional Relation Extraction

A new model for Open IE called O-CRF is presented, and it is shown to achieve increased precision and nearly double the recall of the model employed by TEXTRUNNER, the previous state-of-the-art Open IE system.

Constructing Biological Knowledge Bases by Extracting Information from Text Sources

A research effort is begun that aims to automatically map information from text sources into structured representations, such as knowledge bases, using machine-learning methods to induce routines for extracting facts from text.

Learning to Extract Relations from the Web using Minimal Supervision

An existing relation extraction method is extended to handle this weaker form of supervision, and experimental results demonstrate that the approach can reliably extract relations from web documents.