Improved Sentence Alignment on Parallel Web Pages Using a Stochastic Tree Alignment Model


Parallel web pages are an important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages. Parallel web pages tend to have parallel structures, and this structural correspondence is a useful signal for identifying parallel sentences. In our approach, the web…


