Unsupervised Tree Induction for Tree-based Translation

Feifei Zhai, Jiajun Zhang, Yu Zhou, and Chengqing Zong. Transactions of the Association for Computational Linguistics.
In current research, most tree-based translation models are built directly from parse trees. In this study, we go in another direction and build a translation model with an unsupervised tree structure derived from a novel non-parametric Bayesian model. In the model, we utilize synchronous tree substitution grammars (STSG) to capture the bilingual mapping between language pairs. To train the model efficiently, we develop a Gibbs sampler with three novel Gibbs operators. The sampler is capable of…
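The abstract does not state the exact prior or define the three Gibbs operators, so the following is only a hedged illustration of the general technique. Non-parametric Bayesian grammar induction often places a Dirichlet process prior on the rules that share a root nonterminal; a Gibbs sampler then scores a candidate rule r by its predictive probability given all other rules, P(r | rest) = (n_r + alpha * P0(r)) / (n_root + alpha), where n_r counts occurrences of r and n_root counts all rules with the same root. The sketch assumes that setup; `alpha`, the base distribution `p0`, and every function and rule name below are hypothetical and not taken from the paper.

```python
import random
from collections import Counter

def dp_predictive(rule, root, rule_counts, root_counts, alpha, p0):
    """Dirichlet-process predictive probability of `rule` (hypothetical setup).

    rule_counts / root_counts exclude the part of the corpus being resampled.
    p0 is an assumed base distribution: a function from a rule to a probability.
    """
    return (rule_counts[rule] + alpha * p0(rule)) / (root_counts[root] + alpha)

def analysis_prob(rules, rule_counts, root_counts, alpha, p0):
    """Probability of one candidate analysis, given as (root, rule) pairs."""
    rc, tc = rule_counts.copy(), root_counts.copy()
    prob = 1.0
    for root, rule in rules:
        prob *= dp_predictive(rule, root, rc, tc, alpha, p0)
        # Later rules in the same analysis see the earlier ones (exchangeability).
        rc[rule] += 1
        tc[root] += 1
    return prob

def gibbs_choose(candidates, rule_counts, root_counts, alpha, p0, rng=random):
    """Sample the index of one candidate analysis in proportion to its probability."""
    weights = [analysis_prob(c, rule_counts, root_counts, alpha, p0) for c in candidates]
    return rng.choices(range(len(candidates)), weights=weights, k=1)[0]
```

Incrementing the counts while the product is formed keeps each candidate's score exact under the Dirichlet process when an analysis uses the same rule more than once; a sampler would call `gibbs_choose` once per local decision, after removing the current analysis from the counts.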
Bilingually Induced Clause Parser for Tree-based Translation
Tree-based machine translation models support long-distance reordering by incorporating the syntactic annotations of parse trees from one or both sides of the bitext. However, …
Learning Tree Languages
This chapter surveys known results on adapting Grammatical Inference algorithms developed for the string case to tree languages, and suggests a number of directions for future research.
RNN-based Derivation Structure Prediction for SMT
Final experimental results show that the proposed derivation structure prediction (DSP) model for SMT can significantly improve translation quality.
Modeling Monolingual Character Alignment for Automatic Evaluation of Chinese Translation
The results show that allowing different characters to match is important when evaluating Chinese translations, and that the IHMM is a reasonable approach to aligning Chinese characters.


Improving Tree-to-Tree Translation with Packed Forests
This work proposes a forest-based tree-to-tree model that uses packed forests and is based on a probabilistic synchronous tree substitution grammar (STSG), which can be learned automatically from aligned forest pairs.
Forest-based Tree Sequence to String Translation Model
A forest-based tree-sequence-to-string translation model for syntax-based statistical machine translation. The model automatically learns tree-sequence-to-string translation rules from word-aligned, source-side-parsed bilingual text, and it statistically significantly outperforms the four baseline systems.
Tree-based Translation without using Parse Trees
This paper aims to bypass parse trees and induce effective unsupervised trees for tree-based translation models. Results show that the string-to-tree translation system using the unsupervised trees significantly outperforms the string-to-tree system using parse trees.
A tree-to-tree alignment-based model for statistical machine translation
This paper presents a novel statistical machine translation (SMT) model that uses tree-to-tree alignment between a source parse tree and a target parse tree. The model is formally a probabilistic …
A Bayesian Model of Syntax-Directed Tree to String Grammar Induction
A generative Bayesian model of tree-to-string translation is proposed. It induces grammars that are both smaller and produce better translations than the previous heuristic two-stage approach, which employs a separate word-alignment step.
A Tree Sequence Alignment-based Tree-to-Tree Translation Model
A translation model based on tree sequence alignment, where a tree sequence is a single sequence of subtrees that covers a phrase. The model supports multi-level structure reordering of tree typology with larger spans and statistically significantly outperforms the baseline systems.
A Discriminative Model for Tree-to-Tree Translation
A statistical tree-to-tree translation model that uses a discriminative, feature-based model to predict target-language syntactic structures, which the authors call aligned extended projections (AEPs).
Tree-to-String Alignment Template for Statistical Machine Translation
A novel translation model based on tree-to-string alignment templates (TATs), which describe the alignment between a source parse tree and a target string. The model significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models.
Dependency Treelet Translation: Syntactically Informed Phrasal SMT
The paper describes an efficient decoder and shows that combining these tree-based models with conventional SMT models is a promising approach that unites the power of phrasal SMT with the linguistic generality available from a parser.
Inducing Sentence Structure from Parallel Corpora for Reordering
This paper presents a method for inducing parse trees automatically from a parallel corpus, instead of using a supervised parser trained on a treebank. It shows that the syntactic structure relevant to MT pre-ordering can be learned automatically from parallel text, establishing a new application for unsupervised grammar induction.
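The tree sequence alignment entry above defines a tree sequence as a single sequence of subtrees that covers a phrase. A minimal sketch of that definition follows, assuming a parse tree whose nodes record the half-open word span they cover; the `Node` class and `tree_sequences` function are hypothetical names, not from the cited paper, and only subtree roots and their spans are modeled (rule extraction would further fragment the subtrees). It enumerates every ordered sequence of subtrees that exactly tiles a given phrase span.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    label: str
    start: int  # index of the first word covered (inclusive)
    end: int    # index one past the last word covered (exclusive)
    children: List["Node"] = field(default_factory=list)

def tree_sequences(root: Node, phrase: Tuple[int, int]) -> List[List[Node]]:
    """All ordered sequences of subtrees whose spans tile `phrase` exactly."""
    # Index every node in the tree by the word position where its span starts.
    by_start = {}
    stack = [root]
    while stack:
        node = stack.pop()
        by_start.setdefault(node.start, []).append(node)
        stack.extend(node.children)

    results: List[List[Node]] = []

    def extend(pos: int, seq: List[Node]) -> None:
        if pos == phrase[1]:
            results.append(list(seq))  # the sequence covers the whole phrase
            return
        for node in by_start.get(pos, []):
            # Skip empty spans and subtrees that run past the phrase boundary.
            if node.start < node.end <= phrase[1]:
                seq.append(node)
                extend(node.end, seq)
                seq.pop()

    extend(phrase[0], [])
    return results
```

For the tree S(NP(DT NN) VP(VBD)) over three words, the phrase spanning words 0 to 2 yields two tree sequences, [NP] and [DT, NN]; unary chains such as VP over VBD produce one alternative per node because both roots share the same span.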