• Corpus ID: 3198964

Direct Parsing of Discontinuous Constituents in German

@inproceedings{Maier2010DirectPO,
  title={Direct Parsing of Discontinuous Constituents in German},
  author={Wolfgang Maier},
  booktitle={SPMRL@NAACL-HLT},
  year={2010}
}
  • Wolfgang Maier
  • Published in SPMRL@NAACL-HLT 5 June 2010
  • Computer Science
Discontinuities occur especially frequently in languages with a relatively free word order, such as German. Generally, due to the long-distance dependencies they induce, they lie beyond the expressivity of Probabilistic Context-Free Grammar (PCFG), i.e., they cannot be directly reconstructed by a PCFG parser. In this paper, we use a parser for Probabilistic Linear Context-Free Rewriting Systems (PLCFRS), a formalism with high expressivity, to directly parse the German NeGra and TIGER treebanks. In both treebanks… 
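The abstract's central notion, a constituent whose yield is not a contiguous substring of the sentence, can be made concrete with a small illustrative sketch (not code from the paper): computing the blocks, i.e., maximal contiguous runs, of a constituent's terminal positions. A PCFG node always spans exactly one block; a discontinuous constituent spans two or more, which is what LCFRS fan-out captures.

```python
def blocks(positions):
    """Split a set of terminal indices into maximal contiguous runs."""
    runs, run = [], []
    for i in sorted(positions):
        if run and i != run[-1] + 1:
            runs.append(run)
            run = []
        run.append(i)
    if run:
        runs.append(run)
    return runs

# Toy German example: "Darüber muss nachgedacht werden"
# token indices:         0       1    2           3
# The VP covers positions {0, 2, 3}: "Darüber ... nachgedacht werden",
# interrupted by the finite verb "muss" at position 1.
vp = {0, 2, 3}
print(blocks(vp))       # [[0], [2, 3]]
print(len(blocks(vp)))  # 2: block degree 2, beyond a single PCFG node
```

In LCFRS terms, `len(blocks(...))` is the fan-out of the nonterminal covering those positions; a PCFG is exactly the fan-out-one special case.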

Citations

PLCFRS Parsing of English Discontinuous Constituents
TLDR
This paper uses probabilistic Linear Context-Free Rewriting Systems for data-driven parsing of English, following recent work on parsing German, and demonstrates that by discarding information on non-local dependencies, the PCFG model loses important information on syntactic dependencies in general.
Experiments with Easy-first nonprojective constituent parsing
TLDR
This paper shows that parsing of discontinuous constituents can be achieved using easy-first parsing with online reordering, an approach previously used only for dependency parsing, and that it yields very fast parsing with reasonably accurate results close to the state of the art, surpassing existing results that use treebank grammars.
Discontinuous Data-Oriented Parsing through Mild Context-Sensitivity
TLDR
Data-Oriented Parsing (dop) is applied to a mildly context-sensitive grammar formalism which allows for discontinuous trees, and results emulate and surpass the state of the art in discontinuous parsing.
PLCFRS Parsing Revisited: Restricting the Fan-Out to Two
TLDR
This paper presents a parser for binary PLCFRS of fan-out two, together with a novel monotone estimate for A* parsing, and conducts experiments on modified versions of the German NeGra treebank and the Discontinuous Penn Treebank in which all trees have block degree two.
Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar
TLDR
This work applies Data-Oriented Parsing (DOP) to a grammar formalism that allows for discontinuous trees (LCFRS), and finds that the model is reasonably efficient, and surpasses the state of the art in discontinuous parsing.
Discontinuity and Non-Projectivity: Using Mildly Context-Sensitive Formalisms for Data-Driven Parsing
TLDR
This work presents a parser for probabilistic Linear Context-Free Rewriting Systems and uses it for constituency and dependency treebank parsing and shows that its result quality for constituency parsing is comparable to the output quality of other state-of-the-art results.
Discontinuous Parsing with an Efficient and Accurate DOP Model
We present a discontinuous variant of tree-substitution grammar (TSG) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing model applied to…
German and English Treebanks and Lexica for Tree-Adjoining Grammars
TLDR
A treebank and lexicon for German and English, developed for PLTAG parsing, which include the NP annotation by Vadas and Curran, as well as the prediction lexicon necessary for PLTAG.
Efficient parsing with Linear Context-Free Rewriting Systems
TLDR
This work shows that parsing long sentences with such an optimally binarized grammar remains infeasible, and introduces a technique which removes this length restriction, while maintaining a respectable accuracy.
Incremental Discontinuous Phrase Structure Parsing with the GAP Transition
TLDR
A novel transition system for discontinuous lexicalized constituent parsing called SR-GAP is introduced, an extension of the shift-reduce algorithm with an additional gap transition that outperforms the previous best transition-based discontinuous parser by a large margin.

References

Showing 1–10 of 36 references
Treebanks and Mild Context-Sensitivity
TLDR
A measure for the degree of a treebank's mild context-sensitivity is presented and compared to similar measures used in non-projective dependency parsing and to discontinuous phrase structure grammar (DPSG).
Computing the Most Probable Parse for a Discontinuous Phrase Structure Grammar
TLDR
An implementation of an agenda-based chart parsing algorithm that is capable of computing the Most Probable Parse for a given input sentence for probabilistic versions of both DPSG and Context-Free Grammar is outlined.
Characterizing Discontinuity in Constituent Treebanks
TLDR
An empirical evaluation on German data, together with an investigation of the relation between the measures and grammars extracted from treebanks, shows the relevance of the proposed measures.
Parsing Three German Treebanks: Lexicalized and Unlexicalized Baselines
TLDR
This paper examines the performance of three techniques on three treebanks (NeGra, TIGER, and TüBa-D/Z): Markovization, lexicalization, and state splitting, and additionally explores parsing with the inclusion of grammatical function information.
Treebank Grammar Techniques for Non-Projective Dependency Parsing
TLDR
This paper shows how to reduce non-projective dependency parsing to parsing with Linear Context-Free Rewriting Systems (LCFRS), by presenting a technique for extracting LCFRS from dependency treebanks and an algorithm that computes this transformation for a large, empirically relevant class of grammars.
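The reduction above starts from non-projective dependency trees, i.e., trees with crossing arcs, which are the dependency-side counterpart of discontinuous constituents. As a hedged illustration (this is not the paper's extraction algorithm), non-projectivity can be detected with a simple crossing-arcs check:

```python
def is_projective(heads):
    """heads[d] is the index of token d's head; -1 marks the root.
    A dependency tree is projective iff no two arcs cross, i.e.,
    there are no arcs (i, j) and (k, l) with i < k < j < l."""
    arcs = [tuple(sorted((h, d))) for d, h in enumerate(heads) if h != -1]
    for (i, j) in arcs:
        for (k, l) in arcs:
            if i < k < j < l:  # arcs (i, j) and (k, l) cross
                return False
    return True

# "Darüber muss nachgedacht werden":
# darüber(0) -> nachgedacht(2), muss(1) = root,
# nachgedacht(2) -> werden(3), werden(3) -> muss(1)
print(is_projective([2, -1, 3, 1]))  # False: arcs (0,2) and (1,3) cross
print(is_projective([1, -1, 1]))     # True: a projective toy tree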
The Penn Treebank: Annotating Predicate Argument Structure
TLDR
The implementation of crucial aspects of this new syntactic annotation scheme incorporates a more consistent treatment of a wide range of grammatical phenomena, provides a set of coindexed null elements in what can be thought of as "underlying" position for phenomena such as wh-movement, passive, and the subjects of infinitival constructions.
Head-Driven Statistical Models for Natural Language Parsing
  • M. Collins
  • Computer Science
    Computational Linguistics
  • 2003
TLDR
Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.
Discontinuity Revisited: An Improved Conversion to Context-Free Representations
TLDR
A labeled dependency evaluation shows that the new conversion method leads to better results by preserving local relationships and introducing fewer inconsistencies into the training data.
Statistical Parsing with an Automatically-Extracted Tree Adjoining Grammar
TLDR
This work describes the induction of a probabilistic LTAG model from the Penn Treebank and finds that this induction method is an improvement over the EM-based method of Hwa (1998), and that the induced model yields results comparable to a lexicalized PCFG.
Improved Inference for Unlexicalized Parsing
TLDR
A novel coarse-to-fine method in which a grammar’s own hierarchical projections are used for incremental pruning, including a method for efficiently computing projections of a grammar without a treebank is presented.