Formal grammar and information theory: together again?

  • Fernando C Pereira
  • Published 15 April 2000
  • Linguistics
  • Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences
  • Pages 1239-1253
In the last 40 years, research on models of spoken and written language has been split between two seemingly irreconcilable traditions: formal linguistics in the Chomsky tradition, and information theory in the Shannon tradition. Zellig Harris had advocated a close alliance between grammatical and information-theoretic principles in the analysis of natural language, and early formal-language theory provided another strong link between information theory and linguistics. Nevertheless, in most…
Unsupervised Language Acquisition: Theory and Practice
This thesis presents various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models, and examines the interaction between the various components to show how these algorithms can form the basis for an empiricist model of language acquisition.
On Perceived Conceptual and Methodological Divergences in Linguistic Theory and Cognitive Science: Distributional Analyses, Universal Grammar, and Language Acquisition
A careful consideration of the history and subsequent development of generative grammar and the biolinguistic/I-language approach will show that distributional analyses were never abandoned in Chomsky’s program and that external linguistic data are integral to a theory of UG.
Generative linguistics and neural networks at 60: Foundation, friction, and fusion
The birthdate of both generative linguistics and neural networks can be taken as 1957, the year of the publication of foundational work by both Noam Chomsky and Frank Rosenblatt. This
The Tradition of Categoricity and Prospects for Stochasticity
“Everyone knows that language is variable.” This is the bald sentence with which Sapir (1921:147) begins his chapter on language as an historical product. He goes on to emphasize how two speakers’
Rich Syntax from a Raw Corpus: Unsupervised Does It
The goal here is to help bridge statistical and formal approaches to language by placing the unsupervised learning of structure in the context of current research in grammar acquisition in computational linguistics, and at the same time to link it to certain formal theories of grammar.
Computational Models of First Language Acquisition Special Issue of Research on Language and Computation
This work models some aspect of language acquisition as a computational system and evaluates it on naturally occurring corpora, to see whether the distributions of examples in those corpora provide sufficient information to support the studied claims across a divergent range of acquisition theories.
Source codes in human communication
The ways in which the distributional properties of languages meet the various challenges arising from the differences between information systems and natural languages are described, along with the very different perspective on human communication that these properties suggest.
Tracking the origins of transformational generative grammar
Tracking the main influences of 19th- and 20th-century mathematics, logic and philosophy on pre-1958 American linguistics and especially on early Transformational Generative Grammar (TGG) is an
It is argued that, in principle, machine learning (ML) results could inform basic debates about language, in one area at least, and that in practice, existing results may offer initial tentative support for this prospect.
Towards a Statistical Model of Grammaticality
A statistical model of grammaticality is presented which maps the probabilities of a statistical model for sentences in parts of the British National Corpus (BNC) into grammaticality scores, using various functions of the parameters of the model.


Statistical Methods and Linguistics
In the space of the last ten years, statistical methods have gone from being virtually unknown in computational linguistics to being a fundamental given, and the excitement about statistical methods is also shared by those in the cognitive reaches of computational linguistics.
Head-driven phrase structure grammar
This book presents the most complete exposition of the theory of head-driven phrase structure grammar, introduced in the authors' "Information-Based Syntax and Semantics," and demonstrates the applicability of the HPSG approach to a wide range of empirical problems.
A Formal Theory of Inductive Inference. Part II
Learning dependency transduction models from unannotated examples
  • H. Alshawi, Shona Douglas
  • Computer Science
    Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences
  • 2000
A hierarchical decomposition of bi-language strings emerges from the training process; this decomposition may or may not correspond to familiar linguistic phrase structure, suggesting an approach to language processing in which natural language itself is the semantic representation.
Derivational Minimalism
A simple grammar formalism with these properties is presented and briefly explored; it can define languages that are not in the class of languages definable by tree adjoining grammars.
The Mental representation of grammatical relations
The editor of this volume, who is also author or coauthor of five of the contributions, has provided an introduction that not only affords an overview of the separate articles but also interrelates
Algebraic learning of statistical associations for language acquisition
A theoretical foundation for the dual algebraic/statistical nature of associations is developed, proving two uniqueness theorems, and an algorithmic solution to the estimation problem is provided.
Stochastic Attribute-Value Grammars
Stochastic attribute-value grammars are defined, an algorithm for computing the maximum-likelihood estimate of their parameters is given, and it is shown that sampling can be done using the more general Metropolis-Hastings algorithm.
The Minimalist Program
  • J. Zwart
  • Linguistics
    Journal of Linguistics
  • 1998
Noam Chomsky, The Minimalist Program. (Current Studies in Linguistics 28.) Cambridge, MA: MIT Press, 1995. Pp. 420. The Minimalist Program, by Noam Chomsky, is a collection of four articles, ‘The
The Mathematics of Sentence Structure
An effective rule (or algorithm) for distinguishing sentences from nonsentences is obtained, which works not only for the formal languages of interest to the mathematical logician, but also for natural languages such as English, or at least for fragments of such languages.