• Corpus ID: 7459142

Discosuite - A parser test suite for German discontinuous structures

Wolfgang Maier, Miriam Kaeshammer, Peter Baumann, Sandra Kübler
Parser evaluation traditionally relies on evaluation metrics which deliver a single aggregate score over all sentences in the parser output, such as PARSEVAL. However, for the evaluation of parser performance concerning a particular phenomenon, a test suite of sentences is needed in which this phenomenon has been identified. In recent years, the parsing of discontinuous structures has received increasing interest. Therefore, in this paper, we present a test suite for testing the performance of… 


Data-Oriented Parsing with Discontinuous Constituents and Function Tags
The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis.
BERT-Proof Syntactic Structures: Investigating Errors in Discontinuous Constituency Parsing
This paper proposes two methods for automatically analysing the errors of discontinuous parsers and extends the Berkeley Parser Analyser, a tool that classifies parsing errors according to predefined structural patterns, to discontinuous trees.
Experiments with Easy-first nonprojective constituent parsing
This paper shows that parsing of discontinuous constituents can be achieved using easy-first parsing with online reordering, an approach that previously has only been used for dependencies, and that the approach yields very fast parsing with reasonably accurate results that are close to the state of the art, surpassing existing results that use treebank grammars.
Rich statistical parsing and literary language
It is found that literary ratings are predictable from textual features to a large extent, and this result clearly rules out the notion that these value-judgments of literary merit were arbitrary, or predominantly determined by factors beyond the text.
The Benefit of Syntactic vs. Linear N-grams for Linguistic Description
An attempt to employ dependency annotations for describing style using syntactic n-grams, which allows for the detection of linguistically meaningful patterns that do not emerge in a linear n-gram analysis.
Do Free Word Order Languages Need More Treebank Data? Investigating Dative Alternation in German, English, and Russian
The results show that, for all languages, canonical data is easier to parse, and that there is no direct correspondence between the size of training sets containing free(er) word order variation and performance.
Dependency Analysis of Scrambled References for Better Evaluation of Japanese Translation
A rule-free method that uses a dependency parser to check scrambled sentences and generated alternatives for 80% of sentences is presented and the experimental results show that the method improves sentence-level correlation with human judgments.
Discontinuous parsing with continuous trees
We introduce a new method for incremental shift-reduce parsing of discontinuous constituency trees, based on the fact that discontinuous trees can be transformed into continuous trees by changing the


Unbounded Dependency Recovery for Parser Evaluation
A new parser evaluation corpus containing around 700 sentences annotated with unbounded dependencies, from seven different grammatical constructions is introduced, to evaluate how well state-of-the-art parsing technology is able to recover such dependencies.
Making Ellipses Explicit in Dependency Conversion for a German Treebank
A carefully designed dependency conversion of the German phrase-structure treebank TiGer that explicitly represents verb ellipses by introducing empty nodes into the tree by using heuristics and derives a canonical dependency format without empty nodes is presented.
TIGER: Linguistic Interpretation of a German Corpus
The TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences, is described, along with the query language designed to facilitate a simple formulation of complex queries and a graphical user interface for query input.
Tree Distance and Some Other Variants of Evalb
It is argued that the tree-distance measure ameliorates a problem that has been noted concerning over-penalisation of attachment errors, and is suggested to be a suitable measure for parser evaluation.
Cross-Framework Evaluation for Statistical Parsing
A principled protocol for evaluating parsing results across frameworks based on function trees, tree generalization and edit distance metrics is presented, which extends a previously proposed framework for cross-theory evaluation and allows us to compare a wider class of parsers.
Characterizing Discontinuity in Constituent Treebanks
An empirical evaluation on German data as well as an investigation of the relation between the measures and grammars extracted from treebanks shows their relevance.
PLCFRS Parsing Revisited: Restricting the Fan-Out to Two
This paper presents a parser for binary PLCFRS of fan-out two, together with a novel monotonous estimate for A* parsing, and conducts experiments on modified versions of the German NeGra treebank and the Discontinuous Penn Treebank in which all trees have block degree two.
A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars
The problem of quantitatively comparing the performance of different broad-coverage grammars of English has to date resisted solution. Prima facie, known English grammars appear to disagree strongly
Dependency structures and lexicalized grammars
In this dissertation, I show that both the generative capacity and the parsing complexity of lexicalized grammar formalisms are systematically related to structural properties of the dependency
Data-Driven Parsing with Probabilistic Linear Context-Free Rewriting Systems
This paper presents the first efficient implementation of a weighted deductive CYK parser for Probabilistic Linear Context-Free Rewriting Systems (PLCFRSs), and shows that data-driven LCFRS parsing is feasible and yields output of competitive quality.