Foundations of Statistical Natural Language Processing

Abstract

In 1993, Eugene Charniak published a slim volume entitled Statistical Language Learning. At the time, empirical techniques to natural language processing were on the rise — in that year, Computational Linguistics published a special issue on such methods — and Charniak’s text was the first to treat the emerging field. Nowadays, the revolution has become the establishment; for instance, in 1998, nearly half the papers in Computational Linguistics concerned empirical methods (Hirschberg, 1998). Indeed, Christopher Manning and Hinrich Schütze’s new, by-no-means slim textbook on statistical NLP — strangely, the first since Charniak’s — begins, “The need for a thorough textbook for Statistical Natural Language Processing hardly needs to be argued for”. Indubitably so; the question is, is this it? Foundations of Statistical Natural Language Processing (henceforth FSNLP) is certainly ambitious in scope. True to its name, it contains a great deal of preparatory material, including: gentle introductions to probability and information theory; a chapter on linguistic concepts; and (a most welcome addition) discussion of the nitty-gritty of doing empirical work, ranging from lists of available corpora to indepth discussion of the critical issue of smoothing. Scattered throughout are also topics fundamental to doing good experimental work in general, such as hypothesis testing, cross-validation, and baselines. Along with these preliminaries, FSNLP covers traditional tools of the trade: Markov models, probabilistic grammars, supervised and unsupervised classification, and the vector-space model. Finally, several chapters are devoted to specific problems, among them lexicon acquisition, word sense disambiguation, parsing, machine translation, and information retrieval. (The companion website contains further useful material, including links to programs and a list of errata.) In short, this is a Big Book, and this fact alone already confers some benefits. For the researcher, FSNLP offers the convenience of one-stop shopping: at present, there is no other NLP reference in which standard empirical techniques, statistical tables, definitions of linguistics terms, and elements of information retrieval appear together; furthermore, the text also summarizes and critiques many individual research papers. Similarly, someone teaching a course on statistical NLP will appreciate the large number of topics FSNLP covers, allowing the tailoring of a syllabus to individual interests. And for those entering the field, the book records “folklore” knowledge that is typically acquired only by word of mouth

DOI: 10.1023/A:1011424425034

Extracted Key Phrases

0200400600'00'02'04'06'08'10'12'14'16
Citations per Year

8,421 Citations

Semantic Scholar estimates that this publication has 8,421 citations based on the available data.

See our FAQ for additional information.

Cite this paper

@article{Manning2001FoundationsOS, title={Foundations of Statistical Natural Language Processing}, author={Christopher D. Manning and Hinrich Sch{\"{u}tze}, journal={Information Retrieval}, year={2001}, volume={4}, pages={80-81} }