We present a new parser for parsing down to Penn tree-bank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less when trained and tested on the previously established [5,9,10,15,17] "standard" sections of the Wall Street Journal tree-bank. This represents a 13% …
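The precision/recall figures quoted here are the standard PARSEVAL bracketing measures. A minimal sketch of how labeled bracket precision and recall can be computed, assuming gold and predicted parses are given as sets of (label, start, end) constituent spans; the function name and the set-based simplification are ours, not the paper's:

    def bracket_prf(gold, pred):
        """Labeled bracketing precision/recall/F1.
        gold, pred: sets of (label, start, end) constituent spans."""
        matched = len(gold & pred)
        precision = matched / len(pred) if pred else 0.0
        recall = matched / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # Toy example: the two parses disagree on one VP boundary.
    gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)}
    pred = {("S", 0, 5), ("NP", 0, 2), ("VP", 3, 5)}
    print(bracket_prf(gold, pred))  # ~ (0.667, 0.667, 0.667)

The official evaluation uses multisets of brackets and a few label-equivalence conventions, which this sketch omits.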
Discriminative reranking is one method for constructing high-performance statistical parsers (Collins, 2000). A discriminative reranker requires a source of candidate parses for each sentence. This paper describes a simple yet novel method for constructing sets of 50-best parses based on a coarse-to-fine generative parser (Charniak, 2000). This method …
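A discriminative reranker of this kind scores each candidate in the n-best list with a feature-based model and returns the top-scoring parse. A minimal sketch, assuming a linear model over precomputed feature vectors; the feature extractor and weights are placeholders, not the features of Collins (2000):

    def rerank(candidates, weights, features):
        """Pick the highest-scoring parse from an n-best list.
        candidates: list of parse trees (e.g. the 50-best list);
        features(tree): dict mapping feature name -> count;
        weights: dict mapping feature name -> learned weight."""
        def score(tree):
            return sum(weights.get(f, 0.0) * v
                       for f, v in features(tree).items())
        return max(candidates, key=score)

In practice one feature is typically the log probability the generative parser assigned to the candidate, so the reranker can only improve on the base parser's choice.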
We describe a parsing system based upon a language model for English that is, in turn, based upon assigning probabilities to possible parses for a sentence. This model is used in a parsing system by finding the parse for the sentence with the highest probability. This system outperforms previous schemes. As this is the third in a series of parsers by different …
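The decision rule is simple to state: the parser returns the most probable parse for the sentence. A sketch in our notation, not the paper's:

    \hat{\pi}(s) = \arg\max_{\pi \in \Pi(s)} P(\pi, s),
    \qquad
    P(s) = \sum_{\pi \in \Pi(s)} P(\pi, s)

where Π(s) is the set of parses the grammar allows for sentence s. The second equation is what makes the same model double as a language model: summing the joint probability over all parses yields a probability for the string itself.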
By a " tree-bank grammar " we mean a context-free grammar created by reading the production rules directly from hand-parsed sentences in a tree bank. Common wisdom has it that such grammars do not perform we& though we know of no published data on the issue. The primary purpose of this paper is to show that the common wisdom is wrong. In particular , we(More)
Statistical parsers trained and tested on the Penn Wall Street Journal (WSJ) treebank have shown vast improvements over the last 10 years. Much of this improvement, however, is based upon an ever-increasing number of features to be trained on (typically) the WSJ treebank data. This has led to concern that such parsers may be too finely tuned to this corpus …
We present two language models based upon an "immediate-head" parser — our name for a parser that conditions all events below a constituent c upon the head of c. While all of the most accurate statistical parsers are of the immediate-head variety, no previous grammatical language model uses this technology. The perplexity for both of these models …
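The defining property can be stated compactly. A sketch of the conditioning, in our notation rather than the paper's:

    P(\pi, s) = \prod_{c \in \pi} P\big(\text{events below } c \mid \ell(c),\, h(c)\big)

where ℓ(c) is the label of constituent c and h(c) its lexical head word. As with the grammar-based language model above, the string probability P(s) is obtained by summing P(π, s) over all parses π of s.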
It is generally recognized that the common non-terminal labels for syntactic constituents (NP, VP, etc.) do not exhaust the syntactic and semantic information one would like about parts of a syntactic tree. For example, the Penn Tree-bank gives each constituent zero or more 'function tags' indicating semantic roles and other related information not easily …
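As a concrete illustration (our example, using standard Penn Treebank tags): -SBJ marks the subject NP and -TMP a temporal adjunct, information the bare labels NP and VP do not carry:

    (S (NP-SBJ (DT The) (NN board))
       (VP (VBD met)
           (NP-TMP (NN yesterday))))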
We present a method for extracting parts of objects from wholes (e.g. "speedometer" from "car"). Given a very large corpus, our method finds part words with 55% accuracy for the top 50 words as ranked by the system. The part list could be scanned by an end-user and added to an existing ontology (such as WordNet), or used as part of a rough semantic lexicon.
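A minimal sketch of this style of part-whole extraction, assuming simple lexico-syntactic patterns over a tokenized corpus and ranking candidate parts by raw frequency; the actual patterns and the statistical ranking used in the paper are not given in this snippet, so the ones below are illustrative:

    import re
    from collections import Counter

    # Illustrative part-whole patterns: "the X of the Y", "Y's X".
    OF_PATTERN = re.compile(r"\bthe (\w+) of (?:a|an|the) (\w+)\b")
    POSS_PATTERN = re.compile(r"\b(\w+)'s (\w+)\b")

    def extract_parts(corpus, whole):
        """Count candidate part words for a given whole (e.g. 'car')."""
        counts = Counter()
        for sentence in corpus:
            s = sentence.lower()
            for m in OF_PATTERN.finditer(s):      # "the part of the whole"
                part, w = m.group(1), m.group(2)
                if w == whole:
                    counts[part] += 1
            for m in POSS_PATTERN.finditer(s):    # "whole's part"
                w, part = m.group(1), m.group(2)
                if w == whole:
                    counts[part] += 1
        return counts.most_common()

    corpus = ["The speedometer of the car was broken.",
              "We checked the car's engine."]
    print(extract_parts(corpus, "car"))
    # [('speedometer', 1), ('engine', 1)]

On a very large corpus such patterns are noisy, which is why a ranking statistic, rather than raw counts, is needed before a list like this is worth an end-user's scan.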