MultiUN: A Multilingual Corpus from United Nation Documents

Abstract

This paper describes the acquisition, preparation and properties of a corpus extracted from the official documents of the United Nations (UN). This corpus is available in all 6 official languages of the UN, consisting of around 300 million words per language. We describe the methods we used for crawling, document formatting, and sentence alignment. This…
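The abstract does not say which sentence aligner the authors used. As an illustration only, here is a minimal sketch of length-based sentence alignment in the style of Gale and Church (1993). This is a swapped-in technique and not necessarily the MultiUN method. The constants C = 1 and S2 = 6.8 follow the values reported for that model. The prior table PRIORS and the function names length_cost and align are hypothetical and chosen for this example.

```python
import math

# Assumed model parameters (Gale and Church, 1993): expected ratio of
# target to source characters (C) and variance of that ratio (S2).
C = 1.0
S2 = 6.8

# Hypothetical prior probabilities per bead type (source count, target count);
# illustrative values only, loosely modeled on Gale and Church.
PRIORS = {(1, 1): 0.89, (1, 0): 0.005, (0, 1): 0.005,
          (2, 1): 0.045, (1, 2): 0.045}


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def length_cost(l1, l2):
    """Negative log probability that spans of l1 and l2 characters match."""
    if l1 == 0 and l2 == 0:
        return 0.0
    mean = (l1 + l2 / C) / 2.0
    delta = (l2 - l1 * C) / math.sqrt(mean * S2)
    p = 2.0 * (1.0 - norm_cdf(abs(delta)))
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)


def align(src, tgt):
    """Return the cheapest list of (source indices, target indices) beads."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            # Relax every allowed bead that starts at (i, j).
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                l1 = sum(len(s) for s in src[i:ni])
                l2 = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + length_cost(l1, l2) - math.log(prior)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk back from (n, m) to recover the cheapest path of beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For example, aligning ["Hello world.", "This is a test."] with ["Bonjour le monde.", "Ceci est un test."] yields two 1-1 beads, [([0], [0]), ([1], [1])]. The dynamic program is quadratic in the number of sentences, so in practice length-based aligners are usually run within paragraphs or other anchor points.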


6 Figures and Tables

Statistics

[Chart: Citations per Year, 2010 to 2018]

134 Citations

Semantic Scholar estimates that this publication has 134 citations based on the available data.


Cite this paper

@inproceedings{Eisele2010MultiUNAM,
  title={MultiUN: A Multilingual Corpus from United Nation Documents},
  author={Andreas Eisele and Yu Chen},
  booktitle={LREC},
  year={2010}
}