Is it Really that Difficult to Parse German?

  title={Is it Really that Difficult to Parse German?},
  author={Sandra K{\"u}bler and Erhard W. Hinrichs and Wolfgang Maier},
  booktitle={Conference on Empirical Methods in Natural Language Processing},
This paper presents a comparative study of probabilistic treebank parsing of German, using the Negra and TüBa-D/Z treebanks. Experiments with the Stanford parser, which uses a factored PCFG and dependency model, show that, contrary to previous claims for other parsers, lexicalization of PCFG models boosts parsing performance for both treebanks. The experiments also show a large difference in parsing performance depending on whether the parser is trained on the Negra or on the TüBa-D/Z treebank. Parser…


Cross parser evaluation : a French Treebanks study

It is shown that the adapted lexicalized parsers do not share the same sensitivity to the amount of lexical material used for training, which questions the relevance of using only one lexicalization model to study the usefulness of lexicalization for parsing French.

Cross parser evaluation and tagset variation: a French treebank study

It is shown that the adapted lexicalized parsers do not share the same sensitivity to the amount of lexical material used for training, which questions the relevance of using only one lexicalization model to study the usefulness of lexicalization for parsing French.

Revisiting the Impact of Different Annotation Schemes on PCFG Parsing: A Grammatical Dependency Evaluation

Using grammatical dependency triples as the essential dimension of comparison, the paper shows that the two very different corpora yield comparable parsing performance.

Why is German Dependency Parsing More Reliable than Constituent Parsing

The general trend in comparisons between constituent and dependency parsers is that the dependency parser performs slightly worse than the constituent parser. The only exception is German, where F-scores for constituent plus grammatical function parses range between 51.4 and 5.3.

Treebank Annotation Schemes and Parser Evaluation for German

The results of the experiments show that, contrary to Kübler et al. (2006), the question whether or not German is harder to parse than English remains undecided.

Why is it so difficult to compare treebanks? TIGER and TüBa-D/Z revisited

The paper provides a thorough comparison of two German treebanks, TIGER and TüBa-D/Z. It shows that even the existence of a parallel subcorpus does not allow a straightforward comparison of the two annotation schemes.

Training Parsers on Partial Trees: A Cross-language Comparison

This study compares data-driven dependency parsers obtained by means of annotation projection between language pairs of varying structural similarity. It finds that the projected parsers substantially outperform the authors' heuristic baselines by 9–25% UAS, which corresponds to a 21–43% reduction in error rate.
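The relation between an absolute UAS gain and a relative error reduction is simple arithmetic. The error rate is 100 minus UAS, and the reduction is the gain divided by the baseline's error rate. The sketch below assumes that both scores are percentages. The function name is illustrative, and the example scores are hypothetical rather than taken from the study.

```python
def relative_error_reduction(baseline_uas: float, new_uas: float) -> float:
    """Percentage of the baseline's attachment errors removed by the new parser.

    Both scores are UAS percentages (0-100); the error rate is 100 - UAS.
    """
    baseline_error = 100.0 - baseline_uas
    new_error = 100.0 - new_uas
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical scores: a 20-point gain over a 50% baseline removes 40% of its errors.
print(round(relative_error_reduction(50.0, 70.0), 1))  # 40.0
```

The gain is divided by the baseline's error rate. As a result, the same absolute improvement yields a larger relative reduction over a stronger baseline.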

On Statistical Parsing of French with Supervised and Semi-Supervised Strategies

This paper investigates how best to train a parser on the French Treebank, viewing the task as a trade-off between generalizability and interpretability. It compares a supervised lexicalized parsing algorithm with a semi-supervised unlexicalized algorithm along the lines of Crabbé and Candito (2008).

German Treebanks: TIGER and TüBa-D/Z

This chapter presents two major treebanks of German, TIGER and TüBa-D/Z, and compares their annotation schemes along with their advantages and disadvantages.

Parsing Three German Treebanks: Lexicalized and Unlexicalized Baselines

This paper examines the performance of three techniques (Markovization, lexicalization, and state splitting) on three treebanks (Negra, Tiger, and TüBa-D/Z). It additionally explores parsing with the inclusion of grammatical function information.
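Horizontal Markovization, one of the techniques named above, binarizes flat rules. Each intermediate symbol remembers only the parent and the last h sibling labels. The following is a minimal left-to-right sketch, not the implementation used in the paper. The `@NP|...` naming of intermediate symbols is a hypothetical convention.

```python
def markovize(parent, children, h=1):
    """Binarize the rule parent -> children left to right with horizontal order h.

    Each intermediate symbol (hypothetical '@Parent|...' naming) records the
    parent label and at most the last h children generated so far.
    """
    if len(children) <= 2:
        return [(parent, tuple(children))]
    rules = []
    lhs = parent
    for i in range(len(children) - 2):
        context = children[max(0, i + 1 - h): i + 1]  # last h siblings seen
        intermediate = "@" + parent + "|" + "_".join(context)
        rules.append((lhs, (children[i], intermediate)))
        lhs = intermediate
    rules.append((lhs, (children[-2], children[-1])))
    return rules


for rule in markovize("NP", ["ART", "ADJA", "NN", "PP"], h=1):
    print(rule)
# ('NP', ('ART', '@NP|ART'))
# ('@NP|ART', ('ADJA', '@NP|ADJA'))
# ('@NP|ADJA', ('NN', 'PP'))
```

With h=1, a flat rule such as NP → ART ADJA NN PP becomes a chain of binary rules whose intermediate symbols record only the most recent sibling. A larger h keeps more context at the cost of sparser statistics.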

Is it Harder to Parse Chinese, or the Chinese Treebank?

A factored-model statistical parser for the Penn Chinese Treebank is developed, showing the implications of gross statistical differences between WSJ and Chinese Treebanks for the most general methods of parser adaptation, and a detailed analysis of the major sources of statistical parse errors.

Annotation Schemes and their Influence on Parsing Results

This paper uses two similar German treebanks, TüBa-D/Z and NeGra, to investigate the role that different annotation decisions play in parsing. It approximates the two treebanks to each other by gradually removing or inserting the corresponding annotation components, and tests the performance of a standard PCFG parser on all treebank versions.

How Do Treebank Annotation Schemes Influence Parsing Results? Or How Not to Compare Apples And Oranges

The investigation compares two similar German treebanks, NEGRA and TüBa-D/Z, and shows that deleted unary nodes and a flat phrase structure have a negative influence on parsing quality, while a flat clause structure has a positive influence.
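Deleting unary nodes means removing every phrase node that dominates only one constituent. The minimal sketch below performs that transformation on a hypothetical tuple representation, `(label, children)`, with words stored as strings under preterminals. It collapses such nodes while keeping part-of-speech tags. This is an illustration, not the procedure used in the paper.

```python
def is_preterminal(tree):
    """A preterminal is a POS tag over a single word: (tag, [word])."""
    _, children = tree
    return len(children) == 1 and isinstance(children[0], str)


def remove_unary(tree):
    """Delete phrase nodes that dominate exactly one constituent.

    Trees are (label, children) tuples (a hypothetical format); words are
    strings under preterminals. POS tags are never removed.
    """
    if is_preterminal(tree):
        return tree
    label, children = tree
    new_children = [remove_unary(child) for child in children]
    if len(new_children) == 1:
        return new_children[0]  # drop the unary node, keep its only child
    return (label, new_children)


# "Er schläft" ("He sleeps"), with a unary NP over the pronoun.
tree = ("S", [("NP", [("PPER", ["Er"])]), ("VVFIN", ["schläft"])])
print(remove_unary(tree))
# ('S', [('PPER', ['Er']), ('VVFIN', ['schläft'])])
```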

Annotation Strategies for Probabilistic Parsing in German

An unlexicalized probabilistic parsing model for German trained on the Negra treebank is presented and it is shown that performance compares well with published results for German.

Probabilistic Parsing for German Using Sister-Head Dependencies

The sister-head model outperforms the baseline, achieving labeled precision and recall of up to 74%.
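Labeled precision and recall compare the labeled constituent spans of a parser's output with those of the gold tree. The labeled bracket F-score is their harmonic mean. The sketch below is a simplified PARSEVAL-style scorer. It works on tuple trees `(label, children)`, with words as strings under preterminals (a hypothetical representation). It does not reproduce evalb details such as punctuation handling.

```python
from collections import Counter


def brackets(tree, start=0):
    """Collect labeled spans (label, start, end) of all phrase nodes.

    Trees are (label, children) tuples; a preterminal is (tag, [word]) and
    covers one token but is not itself counted as a bracket.
    Returns (Counter of spans, end position).
    """
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return Counter(), start + 1
    spans = Counter()
    end = start
    for child in children:
        child_spans, end = brackets(child, end)
        spans += child_spans
    spans[(label, start, end)] += 1
    return spans, end


def labeled_prf(gold, test):
    """Labeled bracket precision, recall and F-score (simplified, not evalb)."""
    gold_spans, _ = brackets(gold)
    test_spans, _ = brackets(test)
    matched = sum((gold_spans & test_spans).values())  # multiset intersection
    precision = matched / sum(test_spans.values()) if test_spans else 0.0
    recall = matched / sum(gold_spans.values()) if gold_spans else 0.0
    f_score = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f_score


gold = ("S", [("NP", [("ART", ["Der"]), ("NN", ["Hund"])]), ("VVFIN", ["bellt"])])
test = ("S", [("ART", ["Der"]), ("NP", [("NN", ["Hund"]), ("VVFIN", ["bellt"])])])
print(labeled_prf(gold, test))  # (0.5, 0.5, 0.5)
```

In this example, the parser gets the S bracket right but misplaces the NP. It therefore scores one correct bracket out of two on both sides.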

Directed Treebank Refinement for PCFG Parsing

This paper applies nonterminal split and merge operations that it calls Directed Treebank Refinement to transform the structure of a treebank, aiming at encoding the same information in a way more suitable for the parsing task at hand.
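The summary does not spell out the specific operations of Directed Treebank Refinement. To illustrate what a nonterminal split is, the sketch below implements parent annotation, a simple and widely used split that refines each phrase label by its parent's label. Parent annotation is a standard example of such a split, used for instance in Klein and Manning's unlexicalized parsing work. It is not the paper's method, and the tuple tree format is an assumption.

```python
def parent_annotate(tree, parent=None):
    """Split each phrase label by its parent's label, e.g. NP under PP -> NP^PP.

    Trees are (label, children) tuples (a hypothetical format); preterminals
    (tag, [word]) are left unsplit. The root keeps its plain label.
    """
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree
    new_label = f"{label}^{parent}" if parent else label
    # Children are split by this node's original (unsplit) label.
    return (new_label, [parent_annotate(child, label) for child in children])


tree = ("S", [("NP", [("ART", ["Der"]), ("NN", ["Hund"])]),
              ("VVFIN", ["sitzt"]),
              ("PP", [("APPR", ["in"]),
                      ("NP", [("ART", ["dem"]), ("NN", ["Garten"])])])])
annotated = parent_annotate(tree)
print(annotated[1][0][0], annotated[1][2][1][1][0])  # NP^S NP^PP
```

After the split, subject-position NPs (NP^S) and NPs inside prepositional phrases (NP^PP) receive separate rule statistics.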

Head-Driven Statistical Models for Natural Language Parsing

Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.
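In head-driven models, lexicalization annotates every node with the head word of its phrase. The head word is found by percolating heads up from the words. The sketch below uses a hypothetical, heavily simplified head table for a few German (STTS) tags. It is not Collins's head rules, and real head-finding rules are far more detailed.

```python
# Hypothetical, heavily simplified head table: for each phrase label, the
# child labels that may be its head, in priority order.
HEAD_TABLE = {
    "S": ["VVFIN", "VAFIN", "VMFIN"],
    "NP": ["NN", "NE", "PPER"],
    "PP": ["APPR"],
}


def lexicalize(tree):
    """Annotate every node with its head word, e.g. NP -> NP[Hund].

    Trees are (label, children) tuples with words under preterminals.
    Returns (lexicalized tree, head word). If no child matches the head
    table, the leftmost child is used as the head (an arbitrary fallback).
    """
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        word = children[0]
        return (f"{label}[{word}]", children), word
    results = [lexicalize(child) for child in children]
    child_labels = [child[0] for child in children]
    head_index = 0
    for candidate in HEAD_TABLE.get(label, []):
        if candidate in child_labels:
            head_index = child_labels.index(candidate)
            break
    head_word = results[head_index][1]
    return (f"{label}[{head_word}]", [new_child for new_child, _ in results]), head_word


tree = ("S", [("NP", [("ART", ["Der"]), ("NN", ["Hund"])]), ("VVFIN", ["bellt"])])
lexicalized, head = lexicalize(tree)
print(lexicalized[0], lexicalized[1][0][0])  # S[bellt] NP[Hund]
```

Every symbol now carries a word, so lexicalized rules are far sparser than plain PCFG rules. Lexicalized models therefore rely heavily on smoothing.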

What to Do When Lexicalization Fails: Parsing German with Suffix Analysis and Smoothing

An unlexicalized parser for German is presented which employs smoothing and suffix analysis to achieve a labelled bracket F-score of 76.2, higher than previously reported results on the NEGRA corpus.
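Suffix analysis guesses the part-of-speech tag of an unknown word from its ending, which is informative in a morphologically rich language like German. The sketch below is a simplified longest-known-suffix lookup with a maximum suffix length of 4 (an arbitrary choice). It omits the interpolation across suffix lengths used in taggers such as TnT, and the training pairs are toy data. It is not the paper's exact method.

```python
from collections import Counter, defaultdict


def train_suffix_model(tagged_words, max_len=4):
    """Count how often each tag occurs with each word-final suffix (1..max_len)."""
    model = defaultdict(Counter)
    for word, tag in tagged_words:
        w = word.lower()
        for k in range(1, min(max_len, len(w)) + 1):
            model[w[-k:]][tag] += 1
    return model


def guess_tags(model, word, max_len=4):
    """Tag distribution for an unknown word, from its longest suffix seen in training."""
    w = word.lower()
    for k in range(min(max_len, len(w)), 0, -1):
        counts = model.get(w[-k:])  # .get does not insert keys into the defaultdict
        if counts:
            total = sum(counts.values())
            return {tag: n / total for tag, n in counts.items()}
    return {}  # no known suffix: a real system would back off to a tag prior


# Toy training data (STTS tags), not taken from NEGRA.
training = [("Freiheit", "NN"), ("Einheit", "NN"), ("arbeiten", "VVINF"),
            ("spielen", "VVINF"), ("schönen", "ADJA")]
model = train_suffix_model(training)
print(guess_tags(model, "Gleichheit"))  # {'NN': 1.0}
```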

Experiments on the Automatic Induction of German Semantic Verb Classes

This article presents clustering experiments on German verbs: A statistical grammar model for German serves as the source for a distributional verb description at the lexical syntax-semantics

Accurate Unlexicalized Parsing

We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence