Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing
William P. Headden, Mark Johnson, and David McClosky
Unsupervised grammar induction models tend to employ relatively simple models of syntax when compared to their supervised counterparts. Traditionally, the unsupervised models have been kept simple due to tractability and data sparsity concerns. In this paper, we introduce basic valence frames and lexical information into an unsupervised dependency grammar inducer and show how this additional information can be leveraged via smoothing. Our model produces state-of-the-art results on the task of… 
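The smoothing the abstract alludes to can be illustrated with a minimal sketch (the function names, interpolation weight, and tag inventory below are hypothetical, not the paper's actual estimator): a sparse lexicalized attachment distribution is linearly interpolated with a denser POS-only backoff, so rare head words fall back on statistics pooled over their POS tag.

```python
from collections import Counter, defaultdict

# Hypothetical counts: (head word, head POS) -> Counter over argument POS tags,
# plus a coarser backoff table: head POS -> Counter over argument POS tags.
lex_counts = defaultdict(Counter)
pos_counts = defaultdict(Counter)

def observe(head_word, head_pos, arg_pos):
    """Record one head-argument attachment in both tables."""
    lex_counts[(head_word, head_pos)][arg_pos] += 1
    pos_counts[head_pos][arg_pos] += 1

def attach_prob(head_word, head_pos, arg_pos, lam=0.7, n_pos=10):
    """Linearly interpolate a lexicalized estimate with a POS-only backoff;
    the backoff itself is smoothed toward a uniform prior over n_pos tags."""
    pos = pos_counts[head_pos]
    p_pos = (pos[arg_pos] + 1.0 / n_pos) / (sum(pos.values()) + 1.0)
    lex = lex_counts[(head_word, head_pos)]
    lex_total = sum(lex.values())
    p_lex = lex[arg_pos] / lex_total if lex_total else p_pos
    return lam * p_lex + (1 - lam) * p_pos

# Toy usage: the head "ate" (tagged VBD) is seen taking DT twice and NN once.
for arg in ["DT", "NN", "DT"]:
    observe("ate", "VBD", arg)
```

With these toy counts, an argument tag never seen with "ate" (e.g. JJ) still receives nonzero probability via the POS-level backoff, which is the point of smoothing in sparse-data grammar induction.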


Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing
This paper defines a hierarchical non-parametric Pitman-Yor Process prior which biases towards a small grammar with simple productions and significantly improves the state-of-the-art, when measured by head attachment accuracy.
Concavity and Initialization for Unsupervised Dependency Grammar Induction
Despite their simplicity, it is found that initializing the Dependency Model with Valence using the authors' concave models can approach state-of-the-art grammar induction results for English and Chinese.
Unsupervised Bayesian Lexicalized Dependency Grammar Induction
This dissertation investigates learning dependency grammars for statistical natural language parsing from corpora without parse tree annotations with a focus on smoothing, and finds that smoothing is helpful for even unlexicalized models such as the Dependency Model with Valence.
Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
It is shown that addressing this task directly, using probabilistic finite-state methods, produces better results than relying on the local predictions of a current best unsupervised parser, Seginer's (2007) CCL.
Sparsity in Grammar Induction
We explore the role of sparsity in unsupervised dependency parser grammar induction by exploiting a common trend in many languages: the number of unique combinations of pairs of part-of-speech (POS)…
Gibbs Sampling with Treeness Constraint in Unsupervised Dependency Parsing
This paper evaluates a sequence of experiments for Czech with various modifications of corpus initiation, of dependency edge probability model and of sampling procedure, stressing especially the treeness constraint.
Unsupervised Neural Dependency Parsing
A novel approach to unsupervised dependency parsing that uses a neural model to predict grammar rule probabilities based on distributed representations of POS tags; it outperforms previous approaches utilizing POS correlations and is competitive with recent state-of-the-art approaches on nine different languages.
Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
Prior knowledge of stop-probabilities (whether a given word has any children in a given direction), obtained from a large raw corpus using the reducibility principle, is exploited by incorporating this knowledge into the Dependency Model with Valence.
Sparsity in Dependency Grammar Induction
This work investigates sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graca et al. (2007) and shows that its approach improves on several other state-of-the-art techniques.
Weakly supervised parsing with rules
This work proposes a new research direction to address the lack of structure in traditional n-gram models. It is based on a weakly supervised dependency parser that can model speech syntax without…


Modeling Valence Effects in Unsupervised Grammar Induction
This work extends the dependency grammar induction model of Klein and Manning (2004) to incorporate further valence information and uses an expanded grammar which tracks higher orders of valence and allows each valence slot to be filled by a separate distribution rather than using one distribution for all slots.
Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency
This work presents a generative model for the unsupervised learning of dependency structures and shows that the multiplicative combination of this dependency model with a model of linear constituency is robust cross-linguistically.
Guiding Unsupervised Grammar Induction Using Contrastive Estimation
It is shown that, using the same features, log-linear dependency grammar models trained using CE can drastically outperform EM-trained generative models on the task of matching human linguistic annotations (the MATCHLINGUIST task).
Head-Driven Statistical Models for Natural Language Parsing
  M. Collins, Computational Linguistics, 2003
Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.
Annealing Structural Bias in Multilingual Weighted Grammar Induction
This work shows how a structural locality bias can improve the accuracy of state-of-the-art dependency grammar induction models trained by EM from unannotated examples, and annealing the free parameter that controls this bias achieves further improvements.
Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction
A family of priors over probabilistic grammar weights is presented, called the shared logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, providing a new way to encode prior knowledge about an unknown grammar.
Efficient Parsing for Bilexical Context-Free Grammars and Head Automaton Grammars
This work presents O(n^4) parsing algorithms for two bilexical formalisms, improving the prior upper bound of O(n^5) by a factor of n and also improving the grammar constant.
Novel estimation methods for unsupervised discovery of latent structure in natural language text
The novel estimation methods presented are better suited to adaptation for real engineering tasks than the maximum likelihood baseline, and are shown to achieve significant improvements over maximum likelihood estimation and maximum a posteriori estimation, for a state-of-the-art probabilistic model used in dependency grammar induction.
Statistical Dependency Analysis with Support Vector Machines
Though the result is slightly worse than the most up-to-date phrase-structure-based parsers, it is satisfactorily accurate considering that the parser uses no information from phrase structures.
An Application of the Variational Bayesian Approach to Probabilistic Context-Free Grammars
An efficient learning algorithm for probabilistic context-free grammars based on the variational Bayesian approach is presented and it is shown that the computational complexity of the algorithm is equal to that of the Inside-Outside algorithm.