• Corpus ID: 1647450

Rich Syntax from a Raw Corpus: Unsupervised Does It

Shimon Edelman, Zach Solan, David Horn, Eytan Ruppin
We compare our model of unsupervised learning of linguistic structures, ADIOS [1], to some recent work in computational linguistics and in grammar theory. Our approach resembles the Construction Grammar in its general philosophy (e.g., in its reliance on structural generalizations rather than on syntax projected by the lexicon, as in the current generative theories), and the Tree Adjoining Grammar in its computational characteristics (e.g., in its apparent affinity with Mildly Context Sensitive… 


A systematic review of unsupervised approaches to grammar induction

The theoretical and experimental studies considered suggest that a usage-based, incremental, sequential system of grammar is more appropriate than the formal, non-incremental, hierarchical view of grammar.

Unsupervised Formal Grammar Induction with Confidence

Though evaluating an unsupervised syntactic model is difficult, an evaluation using the Corpus of Linguistic Acceptability is presented and state-of-the-art performance is shown.

Automated Ontology Elicitation and Usage

An overview of ontology elicitation and usage, covering open research questions, probabilistic models that use ontologies, and related topics.

Unsupervised Efficient Learning and Representation of Language Structure

We describe a linguistic pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of corpus data. This is achieved by compactly coding recursively

Unsupervised Language Acquisition: Theory and Practice

This thesis presents various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models, and examines the interaction between the various components to show how these algorithms can form the basis for an empiricist model of language acquisition.

Grammatical constructions and linguistic generalizations: The What's X doing Y? construction

Our goal is to present, by means of the detailed analysis of a single grammatical problem, some of the principal commitments and mechanisms of a grammatical theory that assigns a central role to the

Automatic Acquisition and Efficient Representation of Syntactic Structures

The distributional principle according to which morphemes that occur in identical contexts belong, in some sense, to the same category is extended by applying it recursively, and by using mutual information for estimating category coherence.
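The core of the distributional principle can be illustrated with a minimal Python sketch. It assumes that a context is simply the pair of immediate left and right neighbours and groups words whose context sets are identical. The recursive extension and the mutual-information coherence estimate described above are not implemented, and the function names are hypothetical.

```python
from collections import defaultdict

def contexts(corpus):
    """Map each word to the set of (left, right) neighbour pairs it occurs in."""
    ctx = defaultdict(set)
    for sentence in corpus:
        padded = ["<s>"] + sentence.split() + ["</s>"]  # sentence-boundary markers
        for i in range(1, len(padded) - 1):
            ctx[padded[i]].add((padded[i - 1], padded[i + 1]))
    return ctx

def categories(corpus):
    """Group words whose context sets are identical; singletons are dropped."""
    groups = defaultdict(list)
    for word, cs in contexts(corpus).items():
        groups[frozenset(cs)].append(word)
    return sorted(sorted(ws) for ws in groups.values() if len(ws) > 1)

corpus = ["the cat sleeps", "the dog sleeps", "a cat runs", "a dog runs"]
print(categories(corpus))  # [['a', 'the'], ['cat', 'dog'], ['runs', 'sleeps']]
```

Exact identity of context sets is the strictest reading of "identical contexts"; on real corpora it is far too sparse, which is one motivation for the graded, statistical measures the cited work uses.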

Formal grammar and information theory: together again?

  • Fernando C. Pereira
  • Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 2000
In the last 40 years, research on models of spoken and written language has been split between two seemingly irreconcilable traditions: formal linguistics in the Chomsky tradition, and information

Learning Syntax and Meanings Through Optimization and Distributional Analysis

It is perhaps misleading to use the word theory to describe the view of language acquisition and cognitive development, which is the subject of this chapter. This word is used as a matter of

Distributional Structure

This paper discusses how each language can be described in terms of a distributional structure, i.e., in terms of the occurrence of parts relative to other parts, and how this description is complete without the intrusion of other features such as history or meaning.

Beyond Grammar: An Experience-Based Theory of Language

This work presents a Data-Oriented Parsing (DOP) model for tree representations, a formal stochastic language theory, and DOP models for non-context-free and compositional semantic representations.

Unsupervised induction of stochastic context-free grammars using distributional clustering

An algorithm is presented for learning a phrase-structure grammar from tagged text. It clusters sequences of tags together based on local distributional information, and selects clusters that satisfy
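The abstract breaks off before the selection criterion, so the following Python sketch covers only the first step it names: representing each tag sequence by the distribution of its local (left tag, right tag) contexts and comparing those distributions. The use of Jensen-Shannon divergence as the comparison, the `#` boundary symbol, and all function names are assumptions for illustration, not the paper's own method.

```python
import math
from collections import Counter, defaultdict

def context_distributions(tagged, max_len=2):
    """Count the (left tag, right tag) contexts of every tag sequence up to max_len."""
    dists = defaultdict(Counter)
    for tags in tagged:
        padded = ["#"] + tags + ["#"]  # '#' marks a sentence boundary (assumed)
        for n in range(1, max_len + 1):
            # i is the start of an n-tag sequence with a neighbour on each side
            for i in range(1, len(padded) - n):
                seq = tuple(padded[i:i + n])
                dists[seq][(padded[i - 1], padded[i + n])] += 1
    return dists

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two context count tables (0 = identical)."""
    tp, tq = sum(p.values()), sum(q.values())
    js = 0.0
    for k in set(p) | set(q):
        pk, qk = p[k] / tp, q[k] / tq  # Counter returns 0 for unseen contexts
        mk = (pk + qk) / 2
        if pk:
            js += 0.5 * pk * math.log2(pk / mk)
        if qk:
            js += 0.5 * qk * math.log2(qk / mk)
    return js

tagged = [
    ["DT", "NN", "VBD"],
    ["NNP", "VBD"],
    ["DT", "NN", "VBD", "DT", "NN"],
    ["NNP", "VBD", "NNP"],
]
dists = context_distributions(tagged)
print(round(js_divergence(dists[("DT", "NN")], dists[("NNP",)]), 6))  # 0.0
print(round(js_divergence(dists[("DT", "NN")], dists[("VBD",)]), 6))  # 1.0
```

Sequences with identical context distributions, such as `DT NN` and `NNP` here, score 0 and would be natural candidates for one cluster; sequences with disjoint contexts score the maximum of 1 bit.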

ABL: Alignment-Based Learning

A new type of grammar learning algorithm, inspired by string edit distance, that works on pairs of unstructured sentences: it takes a corpus of flat sentences as input and returns a corpus of labelled, bracketed sentences.
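A minimal Python sketch of the alignment step: two flat sentences are aligned word by word, and the spans that differ are taken as hypothesised constituents. As a stated substitution, it uses the standard library's `difflib.SequenceMatcher` in place of a true minimum string edit distance, and it omits the later phase in which ABL selects among overlapping hypotheses; the helper names are hypothetical.

```python
from difflib import SequenceMatcher

def align_pair(s1, s2):
    """Return the unequal (start, end) word spans of each sentence after alignment."""
    a, b = s1.split(), s2.split()
    hyp_a, hyp_b = [], []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes():
        if tag != "equal":
            # a non-empty unequal span is a hypothesis for a constituent
            if i1 < i2:
                hyp_a.append((i1, i2))
            if j1 < j2:
                hyp_b.append((j1, j2))
    return hyp_a, hyp_b

def bracket(sentence, spans):
    """Wrap each hypothesised span in square brackets."""
    words = sentence.split()
    for start, end in sorted(spans, reverse=True):  # right to left keeps indices valid
        words[start:end] = ["[" + " ".join(words[start:end]) + "]"]
    return " ".join(words)

s1, s2 = "What is a family fare", "What is the payload of an African Swallow"
h1, h2 = align_pair(s1, s2)
print(bracket(s1, h1))  # What is [a family fare]
print(bracket(s2, h2))  # What is [the payload of an African Swallow]
```

Because both sentences share the prefix "What is", the differing remainders are hypothesised to be constituents that can substitute for each other, and would receive the same label.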