Current statistical machine translation systems usually extract rules from bilingual corpora annotated with 1-best alignments, which makes them prone to learning noisy rules from alignment mistakes. We propose a new structure, called a weighted alignment matrix, that compactly encodes all possible alignments for a parallel text. The key idea is to assign a probability to …
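The abstract is cut off before it says exactly what receives a probability. The following is a minimal sketch under one assumption: each cell (i, j) of the matrix holds the probability of the link between source word i and target word j, estimated from a weighted n-best list of alignments. The function name `weighted_alignment_matrix` and the n-best input format are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Set, Tuple

# A link is a (source index, target index) pair.
Link = Tuple[int, int]


def weighted_alignment_matrix(
    nbest: List[Tuple[Set[Link], float]], src_len: int, tgt_len: int
) -> List[List[float]]:
    """Hypothetical helper (assumption): cell (i, j) holds the normalized
    weight mass of the n-best alignments that contain link (i, j)."""
    total = sum(weight for _, weight in nbest)
    if total <= 0:
        raise ValueError("alignment weights must sum to a positive value")
    matrix = [[0.0] * tgt_len for _ in range(src_len)]
    for links, weight in nbest:
        # Every link in this alignment receives the alignment's share of the mass.
        for i, j in links:
            matrix[i][j] += weight / total
    return matrix


# Two competing alignments for a 2x2 sentence pair.
nbest = [({(0, 0), (1, 1)}, 0.6), ({(0, 0), (1, 0)}, 0.4)]
m = weighted_alignment_matrix(nbest, 2, 2)
```

Here link (0, 0) appears in both alignments, so its cell gets the full mass, while the disputed links (1, 1) and (1, 0) split it. Rule extraction can then weight rules by these soft link scores instead of committing to a single alignment.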
Previous work using topic models for statistical machine translation (SMT) explores topic information at the word level. However, SMT has advanced from the word-based paradigm to the phrase/rule-based paradigm. We therefore propose a topic similarity model that exploits topic information at the synchronous rule level for hierarchical phrase-based translation. We …
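A rule-level topic similarity feature compares the topic distribution of the document being translated with a topic distribution attached to each synchronous rule. The abstract does not name the measure, so this sketch assumes a Hellinger-distance-based similarity; `topic_similarity` and its inputs are hypothetical illustrations.

```python
import math
from typing import Sequence


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(0.5 * squared)


def topic_similarity(doc_topics: Sequence[float], rule_topics: Sequence[float]) -> float:
    """Assumed similarity: 1 minus the Hellinger distance, so identical
    topic distributions score 1.0 and disjoint ones score 0.0."""
    return 1.0 - hellinger_distance(doc_topics, rule_topics)


# A rule whose topics match the document scores higher than a mismatched one.
doc = [0.7, 0.3]
close_rule = topic_similarity(doc, [0.6, 0.4])
far_rule = topic_similarity(doc, [0.1, 0.9])
```

In a log-linear decoder, a score like this would be one feature among many, letting the system prefer rules whose topical usage fits the current document.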
Because tokenization is often ambiguous in natural languages such as Chinese and Korean, tokenization errors can introduce translation mistakes into systems that rely on 1-best tokenizations. While lattices that offer more alternatives to translation systems have elegantly alleviated this problem, we take a further step to …
Although discriminative training promises to improve statistical machine translation by incorporating a large number of overlapping features, it is hard to scale up to large data because of decoding complexity. We propose a new algorithm that generates translation forests for training data in linear time with the help of word alignments. Our algorithm also …
This paper describes the ICT statistical machine translation systems used in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2009. For this year's evaluation, we participated in the Challenge Task (Chinese-English and English-Chinese) and the BTEC Task (Chinese-English), and we mainly focus on one new method to …
Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which is only loosely related to translation quality. Instead, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality as measured by BLEU. In the …
We present a global log-linear model for synchronous grammar induction that can incorporate arbitrary features. The model parameters are trained in an unsupervised fashion from parallel sentences without word alignments. To make parameter training tractable, we also propose a novel and efficient synchronous parsing algorithm based on cube pruning …
Hierarchical phrase-based (HPB) translation exploits the power of grammar to perform long-distance reorderings, but it neither specifies nonterminal orientations relative to adjacent blocks nor considers the lexical information covered by nonterminals. In this paper, we borrow the idea of an orientation model from phrase-based systems to enhance the reordering …
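For context, the orientation model in phrase-based systems classifies each block relative to the block translated just before it as monotone, swap, or discontinuous, based on their source spans. The sketch below shows only that standard phrase-based classification. How the paper adapts it to HPB nonterminals is not stated in the truncated abstract, and the names `Orientation` and `orientation` are illustrative.

```python
from enum import Enum
from typing import Tuple

# Inclusive source-side span (start, end) of a block.
Span = Tuple[int, int]


class Orientation(Enum):
    MONOTONE = "M"
    SWAP = "S"
    DISCONTINUOUS = "D"


def orientation(prev: Span, cur: Span) -> Orientation:
    """Orientation of `cur` relative to the block translated just before it,
    using the usual phrase-based monotone/swap/discontinuous definition."""
    if prev[1] + 1 == cur[0]:
        # Current block directly follows the previous one on the source side.
        return Orientation.MONOTONE
    if cur[1] + 1 == prev[0]:
        # Current block directly precedes the previous one: adjacent swap.
        return Orientation.SWAP
    return Orientation.DISCONTINUOUS


# Source words 0-1 translated first, then 2-3: a monotone step.
step = orientation((0, 1), (2, 3))
```

A reordering model then estimates the probability of each orientation conditioned on lexical content of the blocks, which is the kind of lexicalized information the abstract says HPB rules lack.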