MultiUN: A Multilingual Corpus from United Nation Documents


This paper describes the acquisition, preparation and properties of a corpus extracted from the official documents of the United Nations (UN). This corpus is available in all 6 official languages of the UN, consisting of around 300 million words per language. We describe the methods we used for crawling, document formatting, and sentence alignment. This…




Cite this paper

@inproceedings{Eisele2010MultiUNAM,
  title={MultiUN: A Multilingual Corpus from United Nation Documents},
  author={Andreas Eisele and Yu Chen},
  booktitle={LREC},
  year={2010}
}