Head-Driven Statistical Models for Natural Language Parsing


Michael Collins
Supervisor: Professor Mitch Marcus

Statistical models for parsing natural language have recently shown considerable success in broad-coverage domains. Ambiguity often leads to an input sentence having many possible parse trees; statistical approaches assign a probability to each tree, thereby ranking competing trees in order of plausibility. The probability for each candidate tree is calculated as a product of terms, each term corresponding to some sub-structure within the tree. The choice of parameterization is the choice of how to break down the tree. There are two critical questions regarding the parameterization of the problem. What linguistic objects (e.g., context-free rules, parse moves) should the model's parameters be associated with? I.e., how should trees be decomposed into smaller fragments? How can this choice be instantiated in a sound probabilistic model? This thesis argues that the locality of a lexical head's influence in a tree should motivate modeling choices in the parsing problem. In the final parsing models, a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then follow naturally, leading to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads.
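
The idea that a tree's probability is a product of terms, one per sub-structure, can be illustrated with a minimal sketch. It assumes the simplest possible decomposition, one term per context-free rule (a plain PCFG), which is not the head-centered parameterization the thesis argues for. The grammar, the probabilities in `RULE_PROBS`, and the function name `tree_log_prob` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical rule probabilities, invented for this illustration only.
RULE_PROBS = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("N",)): 0.8,
    ("PP", ("P", "NP")): 1.0,
}


def tree_log_prob(tree):
    """Sum of log rule probabilities, i.e. the log of the product of terms.

    A tree is a tuple (label, child, child, ...); a bare string is a
    preterminal leaf, which contributes no term in this simplified sketch.
    """
    if isinstance(tree, str):
        return 0.0
    label, *children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    log_p = math.log(RULE_PROBS[(label, child_labels)])
    return log_p + sum(tree_log_prob(c) for c in children)


# Two competing analyses of a sentence with a prepositional-phrase ambiguity.
vp_attach = ("S", ("NP", "N"),
             ("VP", ("VP", "V", ("NP", "N")), ("PP", "P", ("NP", "N"))))
np_attach = ("S", ("NP", "N"),
             ("VP", "V", ("NP", ("NP", "N"), ("PP", "P", ("NP", "N")))))

# Rank candidate trees in order of plausibility (highest probability first).
ranked = sorted([vp_attach, np_attach], key=tree_log_prob, reverse=True)
```

Changing the parameterization means changing which sub-structures the terms are attached to; the ranking step stays the same. Working in log space avoids numerical underflow when trees contain many terms.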

DOI: 10.1162/089120103322753356

Cite this paper

@article{Collins2003HeadDrivenSM, title={Head-Driven Statistical Models for Natural Language Parsing}, author={Michael Collins}, journal={Computational Linguistics}, year={2003}, volume={29}, pages={589-637} }