In this paper, we review our experience with constructing one such large annotated corpus: the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989-1992), this corpus was annotated for part-of-speech (POS) information. In addition, over half of it has been …
The Penn Treebank, in its eight years of operation (1989-1996), produced approximately 7 million words of part-of-speech tagged text, 3 million words of skeletally parsed text, over 2 million words of text parsed for predicate-argument structure, and 1.6 million words of transcribed spoken text annotated for speech disfluencies. This paper describes the …
The problem of quantitatively comparing the performance of different broad-coverage grammars of English has so far resisted solution. Prima facie, known English grammars appear to disagree strongly with each other about the elements of even the simplest sentences. For instance, the grammars of Steve Abney […], Don Hindle (AT&T), Bob Ingria (BBN), and …
Consider the model of grammar advocated in Chomsky (1955, 1957): a non-recursive set of phrase structure rules generates kernel sentences. Singulary transformations such as affix hopping and passive apply to these kernel sentence structures, and the resulting derived kernel sentences are combined using generalized transformations. Such generalized …
To account for the absence of the verb-second (V2) phenomenon in the presence of a complementizer, early work on Dutch and German proposed that the finite verb in V2 clauses moves to the position of the complementizer. More recent work, however, has exposed considerable crosslinguistic variation in the availability of V2 in the presence of …
This paper discusses extending a system for the automatic discovery of treebank annotation inconsistencies across an entire corpus to the particular case of evaluating inter-annotator agreement (IAA). This system yields a more informative IAA evaluation than other systems because it pinpoints the inconsistencies and groups them by their …
This paper presents the first results on parsing the Penn Parsed Corpus of Modern British English (PPCMBE), a million-word historical treebank with an annotation style similar to that of the Penn Treebank (PTB). We describe key features of the PPCMBE annotation style that differ from the PTB, and present some experiments with tree transformations to better …
In codeswitching contexts, the language of a syntactic head determines the distribution of its complements. Mahootian (1993) derives this generalization by representing heads as the anchors of elementary trees in a lexicalized TAG. However, not all codeswitching sequences are amenable to a head-complement analysis. For instance, adnominal adjectives can …