MultiUN: A Multilingual Corpus from United Nation Documents.

Andreas Eisele and Yu Chen
LREC 2010
This paper describes the acquisition, preparation and properties of a corpus extracted from the official documents of the United Nations (UN). The corpus is available in all six official languages of the UN, with around 300 million words per language. We describe the methods we used for crawling, document formatting, and sentence alignment. The corpus also includes a common test set for machine translation. We present the results of a French-Chinese machine translation experiment…