Foundations of Statistical Natural Language Processing

Abstract

In 1993, Eugene Charniak published a slim volume entitled Statistical Language Learning. At the time, empirical techniques for natural language processing were on the rise (in that year, Computational Linguistics published a special issue on such methods), and Charniak's text was the first to treat the emerging field. Nowadays, the revolution has become the establishment; for instance, in 1998, nearly half the papers in Computational Linguistics concerned empirical methods (Hirschberg, 1998). Indeed, Christopher Manning and Hinrich Schütze's new, by-no-means slim textbook on statistical NLP (strangely, the first since Charniak's [1]) begins, "The need for a thorough textbook for Statistical Natural Language Processing hardly needs to be argued for." Indubitably so; the question is, is this it?

Foundations of Statistical Natural Language Processing (henceforth FSNLP) is certainly ambitious in scope. True to its name, it contains a great deal of preparatory material, including gentle introductions to probability and information theory; a chapter on linguistic concepts; and (a most welcome addition) discussion of the nitty-gritty of doing empirical work, ranging from lists of available corpora to in-depth discussion of the critical issue of smoothing. Scattered throughout are also topics fundamental to doing good experimental work in general, such as hypothesis testing, cross-validation, and baselines. Along with these preliminaries, FSNLP covers traditional tools of the trade: Markov models, probabilistic grammars, supervised and unsupervised classification, and the vector-space model. Finally, several chapters are devoted to specific problems, among them lexicon acquisition, word sense disambiguation, parsing, machine translation, and information retrieval. (The companion website contains further useful material, including links to programs and a list of errata.) In short, this is a Big Book, and this fact alone already confers some benefits.

For the researcher, FSNLP offers the convenience of one-stop shopping: at present, there is no other NLP reference in which standard empirical techniques, statistical tables, definitions of linguistics terms, and elements of information retrieval appear together; furthermore, the text also summarizes and critiques many individual research papers. Similarly, someone teaching a course on statistical NLP will appreciate the large number of topics FSNLP covers, allowing the tailoring of a syllabus to individual interests. And for those entering the field, the book records "folklore" knowledge that is typically acquired only by word of mouth …

[1] In the interim, the second edition of Allen's book (1995) did include some material on probabilistic methods, and …

DOI: 10.1023/A:1011424425034

References

"Every time I fire a linguist, my performance goes up," and other myths of the statistical natural language processing revolution

  • Julia Hirschberg
  • 1998

Statistical Methods for Speech Recognition

  • Frederick Jelinek
  • 1997

Speech and Language Processing

  • Daniel Jurafsky, James Martin
  • In press
