Unsupervised Multilingual Grammar Induction

@inproceedings{Snyder2009UnsupervisedMG,
  title={Unsupervised Multilingual Grammar Induction},
  author={Benjamin Snyder and Tahira Naseem and R. Barzilay},
  booktitle={ACL},
  year={2009}
}
We investigate the task of unsupervised constituency parsing from bilingual parallel corpora. Our goal is to use bilingual cues to learn improved parsing models for each language and to evaluate these models on held-out monolingual test data. We formulate a generative Bayesian model which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters. To this end, we adapt a formalism known as unordered tree alignment to our probabilistic setting…
A Regularization-based Framework for Bilingual Grammar Induction
TLDR
A framework in which the learning process of the grammar model of one language is influenced by knowledge from the model of another language, and three regularization methods that encourage similarity between model parameters, dependency edge scores, and parse trees are proposed.
Selective Sharing for Multilingual Dependency Parsing
We present a novel algorithm for multilingual dependency parsing that uses annotations from a diverse set of source languages to parse a new unannotated language. Our motivation is to broaden the…
A Bayesian Model of Multilingual Unsupervised Semantic Role Induction
TLDR
It is found that the biggest impact of adding a parallel corpus to training is actually the increase in monolingual data, with the alignments to another language resulting in small improvements, even with labeled data for the other language.
Parser Adaptation and Projection with Quasi-Synchronous Grammar Features
We connect two scenarios in structured learning: adapting a parser trained on one corpus to another annotation style, and projecting syntactic annotations from one language to another. We propose…
Crosslingual Induction of Semantic Roles
TLDR
This work considers unsupervised induction of semantic roles from sentences annotated with automatically-predicted syntactic dependency representations and uses a state-of-the-art generative Bayesian non-parametric model to do so.
Bilingually-Guided Monolingual Dependency Grammar Induction
TLDR
This paper induces dependency grammars for five different languages under the guidance of dependency information projected from the parsed English translation, and shows that the bilingually-guided method achieves a significant improvement over the unsupervised baseline and the best projection baseline on average.
Inducing Sentence Structure from Parallel Corpora for Reordering
TLDR
This paper presents a method for inducing parse trees automatically from a parallel corpus, instead of using a supervised parser trained on a treebank. It shows that the syntactic structure relevant to MT pre-ordering can be learned automatically from parallel text, thus establishing a new application for unsupervised grammar induction.
Linguistically motivated models for lightly-supervised dependency parsing
TLDR
This thesis aims to develop parsing models that perform effectively in a lightly-supervised training regime by formulating linguistically aware models of dependency parsing that exploit readily available sources of linguistic knowledge such as language universals and typological features.

References

Unsupervised Multilingual Learning for POS Tagging
TLDR
A hierarchical Bayesian model is formulated for jointly predicting bilingual streams of part-of-speech tags that learns language-specific features while capturing cross-lingual patterns in tag distribution for aligned words.
Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
  • Dekai Wu
  • Computer Science
  • Comput. Linguistics
  • 1997
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel…
A Generative Constituent-Context Model for Improved Grammar Induction
TLDR
A generative distributional model for the unsupervised induction of natural language syntax which explicitly models constituent yields and contexts is presented, giving the best published unsupervised parsing results on the ATIS corpus.
Two Languages are Better than One (for Syntactic Parsing)
TLDR
It is shown that jointly parsing a bitext can substantially improve parse quality on both sides, and the resulting bitext parser outperforms state-of-the-art monolingual parser baselines in a maximum entropy bitext parsing model.
Bootstrapping parsers via syntactic projection across parallel texts
TLDR
Parallel text is used to help solve the problem of creating syntactic annotation in more languages: annotate the English side of a parallel corpus, project the analysis to the second language, and train a stochastic analyzer on the resulting noisy annotations.
Bilingual Parsing with Factored Estimation: Using English to Parse Korean
We describe how simple, commonly understood statistical models, such as statistical dependency parsers, probabilistic context-free grammars, and word-to-word translation models, can be effectively…
Machine Translation with a Stochastic Grammatical Channel
TLDR
A stochastic grammatical channel model for machine translation that synthesizes several desirable characteristics of both statistical and grammatical machine translation and achieves significant speed gains over the earlier model.
Bayesian Synchronous Grammar Induction
TLDR
A non-parametric Bayesian model is developed and applied to a machine translation task, using priors to replace the various heuristics commonly used in this field.
Unsupervised Multilingual Learning for Morphological Segmentation
TLDR
A nonparametric Bayesian model is presented that jointly induces morpheme segmentations of each language under consideration and at the same time identifies cross-lingual morpheme patterns, or abstract morphemes, of multiple languages.
Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction
TLDR
A family of priors over probabilistic grammar weights is presented, called the shared logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, providing a new way to encode prior knowledge about an unknown grammar.