• Corpus ID: 8636901

A Genetic Algorithm for the Induction of Natural Language Grammars

Tony C. Smith and Ian H. Witten
Strict pattern-based methods of grammar induction are often frustrated by the apparently inexhaustible variety of novel word combinations in large corpora. Statistical methods offer a possible solution by allowing frequent well-formed expressions to overwhelm the infrequent ungrammatical ones. They also have the desirable property of being able to construct robust grammars from positive instances alone. Unfortunately, the “zero-frequency” problem entails assigning a small probability to all… 
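The zero-frequency problem mentioned in the abstract can be illustrated with additive (Laplace) smoothing, a standard remedy that reserves a little probability mass for events never seen in training. The sketch below is not taken from the paper; the function name, the toy vocabulary, and the choice of `alpha` are assumptions made only for illustration.

```python
from collections import Counter

def additive_smoothing(tokens, vocabulary, alpha=1.0):
    """Estimate unigram probabilities with additive (Laplace) smoothing.

    Assumes every token in `tokens` also appears in `vocabulary`.
    Each word receives `alpha` pseudo-counts, so unseen words get a
    small non-zero probability instead of zero.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / total for word in vocabulary}

# Toy data (hypothetical): "bird" never occurs in the observed tokens.
probs = additive_smoothing(
    ["the", "cat", "the", "dog"],
    ["the", "cat", "dog", "bird"],
)
```

The unseen word still ends up with a positive probability, and the estimates sum to one, at the cost of slightly discounting the words that were actually observed.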

Evolving natural language grammars without supervision

Results indicate that the proposed algorithm is able to improve the results of a classical optimization algorithm, such as EM (Expectation Maximization), for short grammar constituents (right side of the grammar rules), and its precision is better in general.

Evolutionary Parsing for a Probabilistic Context Free Grammar

  • Lourdes Araujo
  • Computer Science
    Rough Sets and Current Trends in Computing
  • 2000
This paper describes a probabilistic natural language parser based on a genetic algorithm that produces successive generations of individuals, computing their "fitness" at each step and selecting the best of them when the termination condition is reached.
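The generational loop described in this summary (produce successive generations, score each individual, return the best when a termination condition is met) can be sketched generically. This is not the paper's parser: individuals here are plain bit strings rather than parse trees, the fitness function is a toy one, and the selection, crossover, and mutation choices are all assumptions made for illustration.

```python
import random

def evolve(fitness, genome_length, population_size=30, generations=50,
           mutation_rate=0.05, seed=0):
    """Minimal generational genetic algorithm over bit strings.

    Hypothetical sketch: truncation selection, one-point crossover,
    bit-flip mutation, and elitism. Requires genome_length >= 2.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    # Termination condition (assumed): a fixed budget of generations.
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:population_size // 2]  # keep the fitter half
        children = [scored[0]]                   # elitism: carry the best over
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)

# Toy fitness (assumed): count of 1-bits in the genome.
best = evolve(fitness=sum, genome_length=20)
```

In a parsing setting the genome would encode candidate parses and the fitness would come from the probabilistic grammar; only the outer loop is meant to carry over.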

[Category: Genetic Programming] Genetic Programming for Grammar Induction

A new approach is presented where the aim is to formalize a control module for the genetic search which can use the interdependency information existing in CFGs and hence can direct the search only among well-fit grammars in the search space.

Inducing Combinatory Categorial Grammars with Genetic Algorithms

This paper presents and evaluates a system utilizing a simple GA to successively search and improve on categorial-assignments of Combinatory Categorial Grammars by their potential affinity with the Genetic Algorithms.


The aim is not to propose a new statistical model for parsing but a new algorithm to perform the parsing once the model has been defined, and the results obtained are very encouraging.

Evolutionary algorithm for noun phrase detection in natural language processing

An evolutionary algorithm is presented for obtaining a probabilistic finite-state automaton able to recognize valid noun phrases defined as a sequence of lexical categories. It works with both positive and negative examples of the language, thus improving the system's coverage while maintaining its precision.

A Genetic Programming Experiment in Natural Language Grammar Engineering

The performance of the evolved grammar after 1,000 generations on an unseen test set is improved and the adaptation of the Genetic Programming paradigm to the problem of grammar engineering is illustrated.

Highly accurate error-driven method for noun phrase detection

How evolutionary algorithms are applied to statistical natural language processing

A survey of many works which apply EAs to different NLP problems, including syntactic and semantic analysis, grammar induction, summaries and text generation, document clustering and machine translation is presented.

Discovering grammar rules for Automatic Extraction of Definitions

This paper proposes an approach aimed at assisting the discovery of grammar rules which can be used to identify definitions, using Genetic Algorithms and Genetic Programming to improve the performance of the learning programs.



Context Free Grammar Induction Using Genetic Algorithms

A genetic algorithm was developed for inferring context-free grammars, and various forms of the grammar that generates the language of correctly balanced and nested brackets were successfully inferred.

Statistical language learning

Eugene Charniak points out that, as a method of attacking NLP problems, the statistical approach has several advantages: it is grounded in real text and therefore promises to produce usable results, and it offers an obvious way to approach learning.

Compression by induction of hierarchical grammars

The paper describes a technique that constructs models of symbol sequences in the form of small, human-readable, hierarchical grammars. The grammars are both semantically plausible and compact. The

Language Identification in the Limit

  • E. M. Gold
  • Linguistics, Computer Science
    Inf. Control.
  • 1967

Augmenting a Hidden Markov Model for Phrase-Dependent Word Tagging

The paper describes refinements that are currently being investigated in a model for part-of-speech assignment to words in unrestricted text. The model has the advantage that a pre-tagged training

Connectionist Finite State Natural Language Processing

Unless one is prepared to argue that existing, ‘classical’ formal language and automata theory, together with the natural language linguistics built on them, are fundamentally mistaken about the nature of language, then such an argument is not warranted.

Maintaining Diversity in Genetic Search

An improvement to the standard genetic adaptive algorithm is presented which guarantees diversity of the gene pool throughout the search, and is shown to improve off-line (or best) performance of these algorithms at the expense of poorer on-line performance and to retard or prevent premature convergence.

A Connectionist Model of Motion and Government in Chomsky's Government-binding Theory

In this paper we present a connectionist model of movement in government-binding (GB) theory. The model is a collection of regularly connected groups of connectionist units using only two

Language and reality: An introduction to the philosophy of language

Preface to the Second Edition. Preface to the First Edition. Part I: Introduction: 1. Introduction: 1.1 The Philosophy of Language. 1.2 What is the Problem? 1.3 What is a Theory of Language?

Fundamentals of speech recognition

This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, and therefore time-consuming and expensive, process of manually modeling speech.