Corpus-based Learning of Analogies and Semantic Relations

  • Peter D. Turney, Michael L. Littman
  • Machine Learning
We present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the SAT college entrance exam. A verbal analogy has the form A:B::C:D, meaning “A is to B as C is to D”; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly… 
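
The selection step of a VSM-style analogy solver can be sketched as follows. This is a minimal illustration, not the paper's implementation: the relation vectors are invented toy counts of joining patterns (e.g. "X cut from Y"), and `best_choice` is a hypothetical helper name.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def best_choice(stem_vec, choice_vecs):
    # Pick the choice pair whose relation vector is most similar
    # to the stem pair's relation vector.
    return max(range(len(choice_vecs)),
               key=lambda i: cosine(stem_vec, choice_vecs[i]))

# Toy relation vectors (hypothetical pattern counts):
stem = [3, 0, 5]               # mason:stone
choices = [[0, 4, 1],          # choice (a)
           [2, 0, 6],          # choice (b): carpenter:wood
           [1, 1, 0]]          # choice (c)
print(best_choice(stem, choices))  # → 1 (carpenter:wood)
```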

Solving Relational Similarity Problems Using the Web as a Corpus

The main idea is to look for verbs, prepositions, and coordinating conjunctions that can help make explicit the hidden relations between the target nouns.

Expressing Implicit Semantic Relations without Supervision

An unsupervised learning algorithm that mines large text corpora for patterns that express implicit semantic relations, achieving state-of-the-art results and performing significantly better than several alternative, tf-idf-based pattern ranking algorithms.
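
A minimal sketch of the tf-idf scoring that the alternative pattern ranking algorithms are based on; the pattern strings, counts, and document frequencies below are invented for illustration.

```python
import math

def tf_idf(pattern_counts, doc_freq, n_docs):
    # Score each pattern by term frequency times inverse document frequency.
    return {p: c * math.log(n_docs / doc_freq[p])
            for p, c in pattern_counts.items()}

counts = {"X causes Y": 8, "X of Y": 30, "X prevents Y": 3}  # invented
df = {"X causes Y": 12, "X of Y": 95, "X prevents Y": 4}     # invented
scores = tf_idf(counts, df, n_docs=100)
print(max(scores, key=scores.get))  # prints "X causes Y"
```

Very frequent but uninformative patterns ("X of Y") are penalized by their high document frequency, which is the intuition the tf-idf baselines rely on.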

Analogy perception applied to seven tests of word comprehension

PairClass is presented, an algorithm for analogy perception that recognises lexical proportional analogies using representations that are automatically generated from a large corpus of raw textual data.

Distributional Memory: A General Framework for Corpus-Based Semantics

The Distributional Memory approach is shown to be tenable despite the constraints imposed by its multi-purpose nature, performing competitively both against task-specific algorithms recently reported in the literature for the same tasks and against several state-of-the-art methods.

UIUC: A Knowledge-rich Approach to Identifying Semantic Relations between Nominals

Semantic Relations Between Nominals

A range of relation inventories of varying granularity proposed by computational linguists is introduced, and machine learning techniques in which data redundancy and variability lead to fast and reliable relation extraction are presented.

Latent Relational Model for Relation Extraction

This paper extends a relational model that has been shown to be effective in solving word analogies, adapts it to the relation extraction problem, and shows that this approach outperforms state-of-the-art methods on a relation extraction dataset.

Compositional approaches for representing relations between words: A comparative study

Ranking Relations using Analogies in Biological and Information Networks

An approach to relational learning which, given a set S of pairs of objects, measures how well other pairs A:B fit in with S, and combines a similarity measure on function spaces with Bayesian analysis to produce a ranking.

Human-Level Performance on Word Analogy Questions by Latent Relational Analysis

This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense…

A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.

A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena.
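
The core of LSA is a truncated singular value decomposition of a term-document matrix, with similarity measured in the reduced latent space. A minimal sketch with an invented toy matrix (NumPy is used here for convenience; the original work does not prescribe it):

```python
import numpy as np

# Tiny term-document count matrix (rows: terms, cols: documents); invented.
X = np.array([[2., 0., 1., 0.],
              [1., 1., 0., 0.],
              [0., 2., 0., 1.],
              [0., 0., 1., 2.]])

# Truncated SVD: keep only the k largest singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
terms_k = U[:, :k] * s[:k]   # term vectors in the k-dimensional latent space

def cos(u, v):
    # Cosine similarity in the latent space.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Term similarity is computed in the reduced space, where raw
# co-occurrence structure has been smoothed by the truncation.
print(cos(terms_k[0], terms_k[1]))
```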

Experiments on Linguistically-Based Term Associations

  • G. Ruge
  • Computer Science
    Inf. Process. Manag.
  • 1992

The Descent of Hierarchy, and Selection in Relational Semantics

This paper explores the possibility of using an existing lexical hierarchy for the purpose of placing words from a noun compound into categories, and then using this category membership to determine the relation that holds between the nouns.

A Probabilistic Account of Logical Metonymy

This article acquires the meanings of metonymic verbs and adjectives from a large corpus and proposes a probabilistic model that provides a ranking on the set of possible interpretations and identifies the interpretations automatically by exploiting the consistent correspondences between surface syntactic cues and meaning.

Semi-Automatic Recognition of Noun Modifier Relationships

This work presents a semi-automatic system that identifies semantic relationships in noun phrases without using precoded noun or adjective semantics, and in experiments on English technical texts the system correctly identified 60-70% of relationships automatically.

Classifying the Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy

It is found that a very simple approach using a machine learning algorithm and a domain-specific lexical hierarchy successfully generalizes from training instances, performing better on previously unseen words than a baseline consisting of training on the words themselves.

Metaphor as an Emergent Property of Machine-Readable Dictionaries

It is argued that this approach to metaphor interpretation obviates the need for the traditional "metaphor-handling component" in natural language understanding systems, and will allow these systems to overcome the brittleness of hand-coded approaches.

Word Association Norms, Mutual Information and Lexicography

The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.
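
The association ratio is essentially pointwise mutual information estimated from corpus counts. A minimal sketch with an invented toy set of co-occurrence pairs (`association_ratio` is a hypothetical helper name, not the paper's code):

```python
import math
from collections import Counter

def association_ratio(pairs, w1, w2):
    # log2( P(w1, w2) / (P(w1) * P(w2)) ), estimated from co-occurrence pairs.
    pair_counts = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    n = len(pairs)
    p_xy = pair_counts[(w1, w2)] / n
    p_x = left[w1] / n
    p_y = right[w2] / n
    return math.log2(p_xy / (p_x * p_y))

# Invented co-occurrence observations:
corpus_pairs = [("strong", "tea"), ("strong", "tea"),
                ("strong", "coffee"), ("powerful", "computer")]
print(round(association_ratio(corpus_pairs, "strong", "tea"), 3))  # → 0.415
```

A positive score means the pair co-occurs more often than chance would predict; in practice the estimate is only reliable when the counts come from a large corpus.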

Combining Independent Modules to Solve Multiple-choice Synonym and Analogy Problems

Three merging rules for combining probability distributions are examined: the well-known mixture rule, the logarithmic rule, and a novel product rule; they were applied with state-of-the-art results to two problems commonly used to assess human mastery of lexical semantics: synonym questions and analogy questions.
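
The three merging rules can be sketched as follows for discrete distributions. This is a simplified reading (the paper's product rule also involves a prior term, omitted here), and the module outputs are invented toy values.

```python
import math

def normalize(p):
    # Rescale a nonnegative vector so it sums to 1.
    s = sum(p)
    return [x / s for x in p]

def mixture(dists, weights):
    # Mixture rule: weighted arithmetic mean of the module distributions.
    return normalize([sum(w * d[i] for d, w in zip(dists, weights))
                      for i in range(len(dists[0]))])

def logarithmic(dists, weights):
    # Logarithmic rule: weighted geometric mean, renormalized.
    return normalize([math.prod(d[i] ** w for d, w in zip(dists, weights))
                      for i in range(len(dists[0]))])

def product(dists):
    # Product rule, simplified: plain product of distributions, renormalized.
    return normalize([math.prod(d[i] for d in dists)
                      for i in range(len(dists[0]))])

d1 = [0.6, 0.3, 0.1]   # invented module outputs over three answer choices
d2 = [0.4, 0.4, 0.2]
print(mixture([d1, d2], [0.5, 0.5]))  # ≈ [0.5, 0.35, 0.15]
```

The mixture rule is robust when one module is wrong; the logarithmic and product rules reward agreement, since any choice a module assigns near-zero probability is suppressed in the combined distribution.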

Learning surface text patterns for a Question Answering System

This paper develops a method for learning an optimal set of surface text patterns automatically from a tagged corpus, calculating the precision of each pattern and the average precision for each question type.