SMT systems for less-resourced languages based on domain-specific data

Abstract

In this paper, we show that good SMT systems for less-resourced languages can be obtained from even small amounts of high-quality domain-specific data. We suggest a method for filtering newly collected data for parallel corpora, using the internal alignment scores from the alignment process. The filtering process is easy to use and is based on open-source…

10 Figures and Tables

Cite this paper

@inproceedings{Offersgaard2012SMTSF,
  title={SMT systems for less-resourced languages based on domain-specific data},
  author={Lene Offersgaard and Dorte Haltrup Hansen},
  year={2012}
}