• Corpus ID: 3166184

Annotating Coordination in the Penn Treebank

@inproceedings{Maier2012AnnotatingCI,
  title={Annotating Coordination in the Penn Treebank},
  author={Wolfgang Maier and Sandra K{\"u}bler and Erhard W. Hinrichs and Julia Kriwanek},
  booktitle={LAW@ACL},
  year={2012}
}
Finding coordinations provides useful information for many NLP endeavors. However, the task has not received much attention in the literature. A major reason is that the major treebanks do not annotate coordination reliably. This makes it virtually impossible to detect coordinations in which two conjuncts are separated by punctuation rather than by a coordinating conjunction. In this paper, we present an annotation scheme for the Penn Treebank which introduces a…

Coordination Annotation Extension in the Penn Tree Bank
TLDR
This work initiated a manual annotation process for solving coordination issues in the Penn Treebank, and the outcome is an extension of the PTB that includes consistent and detailed structures for coordinations.
Are All Commas Equal? Detecting Coordination in the Penn Treebank
TLDR
This work presents the first approach to classifying punctuation signs into whether they function as separators between conjuncts in coordination or not, and shows that by using information from a parser in combination with context information, it reaches an F-score of 89.22 on positive cases.
Improving the parsing of French coordination through annotation standards and targeted features
TLDR
This study explores various methods for improving the transition-based parsing of coordinated structures in French and compares four different annotations for coordinated structures, demonstrating the importance of globally unambiguous annotation for punctuation, and discusses the decision process of a transition-based parser for coordination.
A Tagging Approach to Identify Complex Constituents for Text Simplification
TLDR
A supervised tagging approach is proposed to classify signs of syntactic complexity in accordance with their linking and bounding functions, using an annotated corpus covering three different genres.
Annotating Signs of Syntactic Complexity to Support Sentence Simplification
TLDR
A new annotation scheme for syntactic complexity in text, which has the advantage over existing syntactic annotation schemes that it is easy to apply, is reliable, and is able to encode a wide range of phenomena.
A Discourse-Annotated Corpus of Conjoined VPs
TLDR
This paper describes how tokens were identified; how the process of span and sense annotation was modified and extended in order to keep the annotation of intra-sentential multi-clausal structures consistent with the rest of the corpus; and what the resulting corpus looks like, in terms of token frequency and common sense patterns.
Creating a Corpus of Conjoined VPs 2.1 Identifying Conjoined VPs
English grammars indicate a variety of relations holding between conjoined VPs. VPs conjoined by and evince such senses as Result, Temporal Sequence and Concession. Although all these senses are ones
Generating Elliptic Coordination
TLDR
It is argued that elided material should be represented using phonetically empty nodes; a set of rewrite rules which permits adding these empty categories to the SR data is introduced, and an existing surface realiser is evaluated on the resulting dataset.
Identifying signs of syntactic complexity for rule-based sentence simplification
TLDR
A detailed error analysis revealed that the major sources of error include inaccurate sign tagging, the relatively limited coverage of the rules used to rewrite sentences, and an inability to discriminate between various subtypes of clause coordination.
Identifying non-elliptical entity mentions in a coordinated NP with ellipses

References

SHOWING 1-10 OF 13 REFERENCES
The TIGER Treebank
TLDR
The TIGER Treebank, a corpus of currently 35,000 syntactically annotated German newspaper sentences, is reported on; the kind of information encoded in the treebank is described, and the different representation formats are introduced.
Coordinate Noun Phrase Disambiguation in a Generative Parsing Model
TLDR
Methods for improving the disambiguation of noun phrase (NP) coordination within the framework of a lexicalised history-based parsing model are presented and changes to the baseline model result in an increase in NP coordination dependency f-score.
The Tüba-D/Z Treebank: Annotating German with a Context-Free Backbone
TLDR
The comparison between the annotation schemes of the two treebanks focuses on the different treatments of free word order and discontinuous constituents in German as well as on differences in phrase-internal annotation.
A Discriminative Learning Model for Coordinate Conjunctions
TLDR
This work reports promising empirical results in detecting and disambiguating coordinated noun phrases in the GENIA corpus, even though a relatively small number of training examples and minimal features are employed.
Parsing Coordinations
TLDR
The four experiments presented show that n-best parsing combined with reranking improves results by a large margin, and that providing the parser with different scope possibilities and reranking the resulting parses results in an increase in F-score.
The VERBMOBIL Treebanks
The Verbmobil treebanks of spoken German, English, and Japanese are part of the Verbmobil project, which has the overriding goal to develop a speaker-independent system for the translation of
Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution
TLDR
This paper shows how the use of surface features and paraphrases in queries against search engines can be used to infer labels for structural ambiguity resolution tasks using unsupervised algorithms.
Building a Large Annotated Corpus of English: The Penn Treebank
TLDR
As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Right node raising and gapping
This book investigates two elliptical coordinations in German, Right Node Raising and Gapping. Ellipsis in both constructions is claimed to be the result of a phonological process which is
A Linguistically Interpreted Corpus of German Newspaper Text
TLDR
This paper reports on the development of an annotation scheme and annotation tools for unrestricted German text based on argument structure, but also permits the extraction of other kinds of representations.