CGELBank: CGEL as a Framework for English Syntax Annotation
@article{Reynolds2022CGELBankCA, title={CGELBank: CGEL as a Framework for English Syntax Annotation}, author={Brett Reynolds and Aryaman Arora and Nathan Schneider}, journal={ArXiv}, year={2022}, volume={abs/2210.00394} }
We introduce the syntactic formalism of the Cambridge Grammar of the English Language (CGEL) to the world of treebanking through the CGELBank project. We discuss some issues in linguistic analysis that arose in adapting the formalism to corpus annotation, followed by quantitative and qualitative comparisons with parallel UD and PTB treebanks. We argue that CGEL provides a good tradeoff between comprehensiveness of analysis and usability for annotation, which motivates expanding the…
References
The Penn Treebank: Annotating Predicate Argument Structure
- Computer Science · HLT
- 1994
The implementation of crucial aspects of this new syntactic annotation scheme incorporates a more consistent treatment of a wide range of grammatical phenomena and provides a set of coindexed null elements in what can be thought of as "underlying" position for phenomena such as wh-movement, passive, and the subjects of infinitival constructions.
Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
- Linguistics, Computer Science · LREC
- 2020
Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation…
RRGbank: a Role and Reference Grammar Corpus of Syntactic Structures Extracted from the Penn Treebank
- Computer Science, Linguistics
- 2018
RRGbank, a corpus of syntactic trees from the Penn Treebank automatically converted to syntactic structures following Role and Reference Grammar (RRG), is presented.
CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank
- Computer Science · CL
- 2007
This article presents an algorithm for translating the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations augmented with local and long-range word-word dependencies, and discusses the implications of the findings for the extraction of other linguistically expressive grammars from the Treebank, and for the design of future treebanks.
Universal Dependencies v1: A Multilingual Treebank Collection
- Linguistics · LREC
- 2016
This paper describes v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages, as well as highlighting the needs for sound comparative evaluation and cross-lingual learning experiments.
Corpus-Oriented Grammar Development for Acquiring a Head-Driven Phrase Structure Grammar from the Penn Treebank
- Computer Science · IJCNLP
- 2004
This paper describes a method of semi-automatically acquiring an English HPSG grammar from the Penn Treebank. First, heuristic rules are employed to annotate the treebank with partially-specified…
Mischievous nominal constructions in Universal Dependencies
- Linguistics · UDW
- 2021
The kinds of mischievous nominal expressions attested in English UD corpora are surveyed, and solutions are proposed primarily with English in mind, though they may offer paths to solutions for a variety of UD languages.
Towards Robust Linguistic Analysis using OntoNotes
- Computer Science, Linguistics · CoNLL
- 2013
An analysis of the performance of publicly available, state-of-the-art tools on all layers and languages in the OntoNotes v5.0 corpus should set the benchmark for future development of various NLP components in syntax and semantics, and possibly encourage research towards an integrated system that makes use of the various layers jointly to improve overall performance.
A Gold Standard Dependency Corpus for English
- Computer Science · LREC
- 2014
It is shown that training a dependency parser on a mix of newswire and web data leads to better performance on that type of data without hurting performance on newswire text, and therefore gold standard annotations for non-canonical text can be a valuable resource for parsing.
Text Generation and Systemic-Functional Linguistics: Experiences from English and Japanese
- Linguistics
- 1992
Up to and beyond the limits of the basic framework: metafunctional refinements stratal extensions - the environment as seen for lexicogrammar.