The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages
@article{Steinberger2006TheJA,
  title={The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages},
  author={R. Steinberger and B. Pouliquen and Anna Widiger and C. Ignat and T. Erjavec and D. Tufis and D{\'a}niel Varga},
  journal={ArXiv},
  year={2006},
  volume={abs/cs/0609058}
}
We present a new, unique and freely available parallel corpus containing European Union (EU) documents of mostly legal nature. It is available in all 20 official EU languages, with additional documents being available in the languages of the EU candidate countries. The corpus consists of almost 8,000 documents per language, with an average size of nearly 9 million words per language. Pair-wise paragraph alignment information produced by two different aligners (Vanilla and HunAlign) is available…
601 Citations
The Noisier the Better: Identifying Multilingual Word Translations Using a Single Monolingual Corpus
- Computer Science
- 2010
- 4
Parallel-Wiki: A Collection of Parallel Sentences Extracted from Wikipedia
- Computer Science
- Res. Comput. Sci.
- 2013
- 15
Belgisch Staatsblad Corpus: Retrieving French-Dutch Sentences from Official Documents
- Computer Science
- LREC
- 2010
- 8
Language-Independent Methods for Identifying Cross-Lingual Similarity in Wikipedia
- Computer Science
- 2019
Building and Annotating the Linguistically Diverse NTU-MC (NTU - Multilingual Corpus)
- Computer Science
- Int. J. Asian Lang. Process.
- 2011
- 36
References
Exploiting multilingual nomenclatures and language-independent text features as an interlingua for cross-lingual text analysis applications
- Computer Science
- ArXiv
- 2006
- 41
MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
- Computer Science
- LREC
- 2004
- 221
Multilingual Lexical Database Generation from Parallel Texts in 20 European Languages with Endogenous Resources
- Computer Science
- ACL
- 2006
- 17
The Bible as a Parallel Corpus: Annotating the ‘Book of 2000 Tongues’
- Computer Science
- Comput. Humanit.
- 1999
- 142
Automatic annotation of multilingual text collections with a conceptual thesaurus
- Computer Science
- ArXiv
- 2006
- 101