A Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts

Michael Thelen and Ellen Riloff. Conference on Empirical Methods in Natural Language Processing (EMNLP).
This paper describes a bootstrapping algorithm called Basilisk that learns high-quality semantic lexicons for multiple categories. Basilisk begins with an unannotated corpus and seed words for each semantic category, which are then bootstrapped to learn new words for each category. Basilisk hypothesizes the semantic class of a word based on collective information over a large body of extraction pattern contexts. We evaluate Basilisk on six semantic categories. The semantic lexicons produced by… 
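The bootstrapping loop the abstract describes can be sketched in Python. This is a minimal, single-category sketch, not the authors' implementation. Its scoring functions are modeled on Basilisk's RlogF pattern score, (F/N)·log2(F), and AvgLog word score, the mean of log2(F+1) over the patterns that extract a word, as they are understood here. The input format (a hypothetical `extractions` dict from pattern to extracted head nouns), the pool sizes, the tie-breaking, and the stopping rule are all assumptions. The conflict handling between multiple categories that the abstract mentions is omitted.

```python
import math


def basilisk_sketch(extractions, seeds, iterations=5, pool_base=20, words_per_iter=5):
    """Single-category bootstrapping loop in the spirit of Basilisk.

    extractions: hypothetical input mapping each extraction pattern (a string)
                 to the set of head nouns it extracts from the corpus.
    """
    lexicon = set(seeds)
    for i in range(iterations):
        # F_j: number of current lexicon members that pattern j extracts.
        hits = {p: len(nouns & lexicon) for p, nouns in extractions.items()}

        def rlogf(p):
            # RlogF = (F_j / N_j) * log2(F_j), where N_j = all nouns pattern j extracts.
            return (hits[p] / len(extractions[p])) * math.log2(hits[p])

        # Pattern pool: the top (pool_base + i) patterns that extract at least one member.
        ranked = sorted((p for p in extractions if hits[p] > 0),
                        key=lambda p: (-rlogf(p), p))
        pool = ranked[:pool_base + i]

        # Candidate words: every noun the pool extracts that is not yet in the lexicon.
        candidates = set().union(*(extractions[p] for p in pool)) - lexicon

        def avglog(word):
            # AvgLog: mean of log2(F_j + 1) over all patterns that extract the word.
            pats = [p for p, nouns in extractions.items() if word in nouns]
            return sum(math.log2(hits[p] + 1) for p in pats) / len(pats)

        # Add the highest-scoring candidates (ties broken alphabetically).
        best = sorted(candidates, key=lambda w: (-avglog(w), w))[:words_per_iter]
        if not best:
            break  # nothing new to learn
        lexicon.update(best)
    return lexicon
```

With toy data where `"<subj> was arrested"` and `"killed <dobj>"` extract seed person names plus new words, and `"<subj> exploded"` extracts only weapons, the weapon nouns never enter the pool. That pattern extracts no current lexicon member, so it is excluded from the pool.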


Bootstrapping a Semantic Lexicon on Verb Similarities

This work presents a bootstrapping algorithm that creates a semantic lexicon from a list of seed words and a corpus mined from the web, and finds that highly domain-related verbs achieve the highest accuracy.

Mutual Screening Graph Algorithm: A New Bootstrapping Algorithm for Lexical Acquisition

A new bootstrapping algorithm, the Mutual Screening Graph Algorithm (MSGA), is proposed for learning semantic lexicons. It uses only an unannotated corpus and a few seed words to learn new words for each semantic category, and it changes both the format of extracted patterns and the method for scoring patterns and words.

Building a Semantic Lexicon of English Nouns via Bootstrapping

The use of a weakly supervised bootstrapping algorithm in discovering contrasting semantic categories from a source lexicon with little training data is described, showing that such automatically categorized terms tend to agree with human judgements.

Unsupervised Discovery of Negative Categories in Lexicon Bootstrapping

NEG-FINDER is presented as the first approach for discovering negative categories automatically. It removes the need for manual intervention and formulation of negative categories, with performance closely approaching that obtained using negative categories defined by a domain expert.

AutoEncoder Guided Bootstrapping of Semantic Lexicon

This work improves Basilisk by modifying its two scoring functions, incorporating an AutoEncoder into the scoring functions for patterns and candidates to reduce bias and obtain more balanced results.

Ensemble-based Semantic Lexicon Induction for Semantic Tagging

An ensemble-based framework for semantic lexicon induction is presented. It incorporates three diverse approaches to semantic class identification and outperforms the individual methods in both lexicon quality and instance-based semantic tagging.

Corpus-based Semantic Lexicon Induction with Web-based Corroboration

This research uses a weakly supervised bootstrapping algorithm to induce a semantic lexicon from a text corpus, and then issues Web queries to generate co-occurrence statistics between each lexicon entry and semantically related terms.
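The summary does not say which co-occurrence statistic is computed from the Web queries. Pointwise mutual information (PMI) over search hit counts is one common choice and is assumed in the sketch below. The hit counts are hypothetical inputs, no search engine API is called, and averaging over related terms is likewise an illustrative assumption.

```python
import math


def pmi(joint_hits, hits_x, hits_y, total_pages):
    """Pointwise mutual information estimated from search hit counts.

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with each probability
    approximated as hits / total_pages.
    """
    if joint_hits == 0:
        return float("-inf")  # the terms never co-occur: no corroboration
    return math.log2(joint_hits * total_pages / (hits_x * hits_y))


def corroboration_score(entry_hits, related, total_pages):
    """Average PMI between one lexicon entry and several related terms.

    related: list of (term_hits, joint_hits) pairs, one per related term.
    """
    scores = [pmi(joint, entry_hits, term, total_pages) for term, joint in related]
    return sum(scores) / len(scores)
```

An entry whose average PMI with the category's related terms falls below some threshold would be a candidate for removal from the induced lexicon. The threshold itself is not given in the summary.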

An unsupervised method for lexical acquisition based on Bootstrapping

  • Yuhan Zhang, Yanquan Zhou
  • 2009 International Conference on Natural Language Processing and Knowledge Engineering
  • 2009
This paper presents an unsupervised method for lexical acquisition called Mutual Screening Graph Algorithm based on Bootstrapping (MSGA-Bootstrapping), and shows that MSGA can outperform the earlier bootstrapping algorithms Basilisk and GMR (Graph Mutual Reinforcement based Bootstrapped).

Combining Contexts in Lexicon Learning for Semantic Parsing

A method for automatically constructing noun entries in a semantic lexicon, which combines three contexts (modifying adjective, verb deep subject, and verb deep object), yields very high precision for most semantic features, allowing the entries to be incorporated into the lexicon fully automatically.



A Corpus-Based Approach for Building Semantic Lexicons

This paper presents a corpus-based method that can be used to build semantic lexicons for specific categories using a small set of seed words for a category and a representative text corpus.
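The seed-word idea can be illustrated by ranking candidate nouns according to how often they appear close to a category's seed words. This is a minimal sketch under stated assumptions, not the paper's scoring: the fixed token window, the hypothetical `nouns` set (which a POS tagger would normally supply), and the ratio score are all assumptions.

```python
from collections import Counter


def rank_by_seed_context(sentences, seeds, nouns, window=2, top_n=5):
    """Rank candidate nouns by how often they occur near a category seed word.

    sentences: list of token lists (assumed lowercased)
    nouns:     set of tokens treated as candidate nouns (hypothetical input)
    score(w) = occurrences of w within `window` tokens of a seed / all occurrences of w
    """
    near, total = Counter(), Counter()
    for tokens in sentences:
        seed_positions = [i for i, tok in enumerate(tokens) if tok in seeds]
        for i, tok in enumerate(tokens):
            if tok in seeds or tok not in nouns:
                continue  # only score non-seed candidate nouns
            total[tok] += 1
            if any(abs(i - j) <= window for j in seed_positions):
                near[tok] += 1
    # Only words seen near a seed at least once get a score.
    scores = {w: near[w] / total[w] for w in near}
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

In a bootstrapping setting, the top-ranked words would be added to the seed set and the ranking repeated.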

Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping

A multilevel bootstrapping algorithm is presented that generates both the semantic lexicon and extraction patterns simultaneously and produces high-quality dictionaries for several semantic categories.

Noun-Phrase Co-Occurrence Statistics for Semi-Automatic Semantic Lexicon Construction

This paper presents an algorithm for extracting potential entries for a category from an on-line corpus, based upon a small set of exemplars, that could be viewed as an "enhancer" of existing broad-coverage resources.

An Empirical Approach to Conceptual Case Frame Acquisition

A corpus-based algorithm for acquiring conceptual case frames empirically from unannotated text that learns semantic preferences for each extraction pattern and merges the syntactically compatible patterns to produce multi-slot case frames with selectional restrictions.

Automatic Acquisition of Hyponyms from Large Text Corpora

A set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest are identified.
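One such lexico-syntactic pattern is "NP0 such as NP1, NP2, and NP3", which signals that each NPi is a hyponym of NP0. The regex sketch below handles only that single pattern and assumes, for simplicity, that every noun phrase is one word; a realistic system would match full noun phrases produced by a chunker or parser.

```python
import re

# Simplifying assumption: every noun phrase is a single word.
NP_LIST = r"\w+(?:(?:, (?:and |or )?| (?:and|or) )\w+)*"
SUCH_AS = re.compile(r"(\w+) such as (" + NP_LIST + r")")
SEPARATOR = re.compile(r", (?:and |or )?| (?:and|or) ")


def hearst_such_as(text):
    """Extract (hyponym, hypernym) pairs from 'NP0 such as NP1, NP2, and NP3'."""
    pairs = []
    for match in SUCH_AS.finditer(text.lower()):
        hypernym = match.group(1)
        # Split the conjoined list on ", ", ", and ", " and ", etc.
        for hyponym in SEPARATOR.split(match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs
```

For example, `hearst_such_as("The shop sells fruits such as apples, pears, and plums.")` yields `("apples", "fruits")`, `("pears", "fruits")`, and `("plums", "fruits")`. The single-word assumption means a list item followed by a relative clause (", who ...") would over-match.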

CRYSTAL: Inducing a Conceptual Dictionary

CRYSTAL is described: a system that automatically induces, from a training corpus, a dictionary of "concept-node definitions" sufficient to identify relevant information, and that can often surpass human intuitions in creating reliable extraction rules.

Automatic construction of a hypernym-labeled noun hierarchy from text

This work goes beyond clustering related words by automatically creating a hierarchy of nouns and their hypernyms, akin to the hand-built hierarchy in WordNet.

Automatically Generating Extraction Patterns from Untagged Text

  • E. Riloff
  • AAAI/IAAI, Vol. 2
  • 1996
This work has developed a system called AutoSlog-TS that creates dictionaries of extraction patterns using only untagged text. In experiments with the MUC-4 terrorism domain, AutoSlog-TS, using only preclassified texts as input, created a dictionary of extraction patterns that performed comparably to a dictionary created by AutoSlog.
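Because AutoSlog-TS only knows whether each text is relevant or irrelevant, it has to rank the candidate patterns it generates. The sketch below assumes the ranking is relevance rate multiplied by log2 of frequency, as understood from the AutoSlog-TS work; the pattern names and counts are hypothetical.

```python
import math


def rank_patterns(relevant_freq, total_freq):
    """Order patterns by RelevanceRate * log2(Frequency), highest first.

    relevant_freq[p]: activations of pattern p in texts labeled relevant
    total_freq[p]:    activations of p in the whole corpus (assumed >= 1)
    """
    def score(p):
        freq = total_freq[p]
        relevance_rate = relevant_freq.get(p, 0) / freq
        return relevance_rate * math.log2(freq)

    return sorted(total_freq, key=lambda p: (-score(p), p))
```

The log2 factor rewards frequent patterns, so a pattern seen only once scores 0 even if that one occurrence was in a relevant text. A frequent but topic-neutral pattern such as `<subj> said` is held back by its low relevance rate.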

A method for disambiguating word senses in a large corpus

The proposed method was designed to disambiguate senses that are usually associated with different topics using a Bayesian argument that has been applied successfully in related tasks such as author identification and information retrieval.
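The Bayesian idea can be illustrated with a naive Bayes classifier that scores each sense by its log prior plus the log-probabilities of the surrounding context words, then picks the highest-scoring sense. The tiny training set, the sense labels, and the add-one smoothing are assumptions made for the sketch; they are not the paper's estimation procedure.

```python
import math
from collections import Counter


def train_senses(examples):
    """examples: list of (sense, context_words). Returns priors, per-sense counts, vocab."""
    priors = Counter()
    word_counts = {}
    vocab = set()
    for sense, words in examples:
        priors[sense] += 1
        word_counts.setdefault(sense, Counter()).update(words)
        vocab.update(words)
    return priors, word_counts, vocab


def disambiguate(context, priors, word_counts, vocab):
    """Return argmax_s log P(s) + sum_w log P(w | s), with add-one smoothing."""
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for sense in sorted(priors):
        counts = word_counts[sense]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(priors[sense] / total)
        score += sum(math.log((counts[w] + 1) / denom) for w in context)
        if score > best_score:
            best, best_score = sense, score
    return best
```

Trained on a few contexts of "bank" labeled `money` (deposit, loan, cash) and `river` (water, shore, fishing), the classifier assigns the context `["water", "fishing"]` to `river`. Context words strongly associated with one topic dominate the decision, which is why the approach suits senses that are tied to different topics.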