Europarl: A Parallel Corpus for Statistical Machine Translation

Abstract

We collected a corpus of parallel text in 11 languages from the proceedings of the European Parliament, which are published on the web. This corpus has found widespread use in the NLP community. Here, we focus on its acquisition and its application as training data for statistical machine translation (SMT). We trained SMT systems for 110 language pairs, and the results offer interesting clues about the challenges ahead.
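The figure of 110 is consistent with counting translation directions rather than unordered pairs: with 11 languages, each can serve as source for the 10 others, giving 11 × 10 = 110 systems. The sketch below checks that arithmetic. The language labels are hypothetical placeholders, since the abstract states only the count and not the language names.

```python
from itertools import permutations

# Hypothetical placeholder labels: the abstract gives the number of
# languages (11) but does not name them.
languages = [f"lang{i:02d}" for i in range(1, 12)]

# A translation system has a direction, so (source, target) and
# (target, source) are counted as two distinct language pairs.
directed_pairs = list(permutations(languages, 2))

print(len(languages))       # number of languages
print(len(directed_pairs))  # number of directed language pairs
```

Counting unordered pairs instead (`itertools.combinations`) would give 55, which is why the directed reading is the one that matches the abstract.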


Semantic Scholar estimates that this publication has 2,361 citations based on the available data.

Cite this paper

@inproceedings{Koehn2005EuroparlAP,
  title={Europarl: A Parallel Corpus for Statistical Machine Translation},
  author={Philipp Koehn},
  year={2005}
}