CGELBank: CGEL as a Framework for English Syntax Annotation

@article{Reynolds2022CGELBankCA,
  title={CGELBank: CGEL as a Framework for English Syntax Annotation},
  author={Brett Reynolds and Aryaman Arora and Nathan Schneider},
  journal={ArXiv},
  year={2022},
  volume={abs/2210.00394}
}
We introduce the syntactic formalism of the Cambridge Grammar of the English Language (CGEL) to the world of treebanking through the CGELBank project. We discuss some issues in linguistic analysis that arose in adapting the formalism to corpus annotation, followed by quantitative and qualitative comparisons with parallel UD and PTB treebanks. We argue that CGEL provides a good tradeoff between comprehensiveness of analysis and usability for annotation, which motivates expanding the…

References

SHOWING 1-10 OF 32 REFERENCES

The Penn Treebank: Annotating Predicate Argument Structure

The implementation of crucial aspects of this new syntactic annotation scheme incorporates a more consistent treatment of a wide range of grammatical phenomena and provides a set of coindexed null elements in what can be thought of as "underlying" position for phenomena such as wh-movement, passive, and the subjects of infinitival constructions.

Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection

Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation

RRGbank: a Role and Reference Grammar Corpus of Syntactic Structures Extracted from the Penn Treebank

RRGbank, a corpus of syntactic trees from the Penn Treebank automatically converted to syntactic structures following Role and Reference Grammar (RRG), is presented.

CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank

This article presents an algorithm for translating the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations augmented with local and long-range word-word dependencies, and discusses the implications of the findings for the extraction of other linguistically expressive grammars from the Treebank, and for the design of future treebanks.

Universal Dependencies v1: A Multilingual Treebank Collection

This paper describes v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages, as well as highlighting the needs for sound comparative evaluation and cross-lingual learning experiments.

Corpus-Oriented Grammar Development for Acquiring a Head-Driven Phrase Structure Grammar from the Penn Treebank

This paper describes a method of semi-automatically acquiring an English HPSG grammar from the Penn Treebank. First, heuristic rules are employed to annotate the treebank with partially-specified

Mischievous nominal constructions in Universal Dependencies

The kinds of mischievous nominal expressions attested in English UD corpora are surveyed, and solutions are proposed primarily with English in mind, though they may offer paths to solutions for a variety of UD languages.

Towards Robust Linguistic Analysis using OntoNotes

An analysis of the performance of publicly available, state-of-the-art tools on all layers and languages in the OntoNotes v5.0 corpus should set the benchmark for future development of various NLP components in syntax and semantics, and possibly encourage research towards an integrated system that makes use of the various layers jointly to improve overall performance.

A Gold Standard Dependency Corpus for English

It is shown that training a dependency parser on a mix of newswire and web data leads to better performance on that type of data without hurting performance on newswire text, and therefore gold standard annotations for non-canonical text can be a valuable resource for parsing.

Text Generation and Systemic-Functional Linguistics: Experiences from English and Japanese

Up to and beyond the limits of the basic framework: metafunctional refinements stratal extensions - the environment as seen for lexicogrammar.