Corpus ID: 26124282

The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages

@article{Steinberger2006TheJA,
  title={The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages},
  author={R. Steinberger and B. Pouliquen and Anna Widiger and C. Ignat and T. Erjavec and D. Tufis and D{\'a}niel Varga},
  journal={ArXiv},
  year={2006},
  volume={abs/cs/0609058}
}
We present a new, unique and freely available parallel corpus containing European Union (EU) documents of mostly legal nature. It is available in all 20 official EU languages, with additional documents being available in the languages of the EU candidate countries. The corpus consists of almost 8,000 documents per language, with an average size of nearly 9 million words per language. Pair-wise paragraph alignment information produced by two different aligners (Vanilla and HunAlign) is available…
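The abstract mentions pair-wise paragraph alignment produced by the Vanilla and HunAlign aligners. Vanilla is commonly described as an implementation of the length-based method of Gale and Church ("A Program for Aligning Sentences in Bilingual Corpora", listed among the references). The following is a minimal Python sketch of that length-based dynamic-programming idea, not the code of either aligner. Its assumptions are character lengths, an expected length ratio `C = 1.0`, a variance `S2 = 6.8`, and bead-type priors taken from the values reported by Gale and Church, split evenly between symmetric types. HunAlign's dictionary-based component is not modelled. The names `bead_cost`, `align`, `PRIORS` and `Bead` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Assumed model parameters (Gale & Church style); not taken from Vanilla or HunAlign.
C = 1.0    # expected target/source length ratio in characters
S2 = 6.8   # variance of that ratio per source character

# Prior probability of each bead type (source segments, target segments).
# Symmetric types share the published probability evenly (an assumption).
PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89,
    (1, 0): 0.0099 / 2,
    (0, 1): 0.0099 / 2,
    (2, 1): 0.089 / 2,
    (1, 2): 0.089 / 2,
    (2, 2): 0.011,
}

# A bead pairs a run of source segment indices with a run of target indices.
Bead = Tuple[List[int], List[int]]


def bead_cost(len_src: int, len_tgt: int, prior: float) -> float:
    """Negative log of prior * P(length difference), via a two-tailed normal test."""
    mean = (len_src + len_tgt / C) / 2
    if mean == 0:
        return -math.log(prior)
    z = (C * len_src - len_tgt) / math.sqrt(S2 * mean)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)); floor it so log() never sees 0.
    p_delta = max(math.erfc(abs(z) / math.sqrt(2)), 1e-300)
    return -math.log(prior) - math.log(p_delta)


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Minimum-cost monotone alignment of two segment lists by character length."""
    n, m = len(src), len(tgt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is already filled.
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in PRIORS.items():
                if di > i or dj > j or cost[i - di][j - dj] == math.inf:
                    continue
                len_src = sum(len(s) for s in src[i - di:i])
                len_tgt = sum(len(t) for t in tgt[j - dj:j])
                c = cost[i - di][j - dj] + bead_cost(len_src, len_tgt, prior)
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = (di, dj)
    # Follow back-pointers from the final cell to recover the bead sequence.
    beads: List[Bead] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = back[i][j]
        assert step is not None  # 1-0 and 0-1 beads make every cell reachable
        di, dj = step
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


print(align(["aaaa", "bbbbbbbbbbbb"], ["cccc", "dddddddddddd"]))
# [([0], [0]), ([1], [1])]
```

Segments of matching length pair up one-to-one. When two short source segments together match one longer target segment, a 2-1 bead is usually cheaper than a 1-1 bead followed by a deletion. Length alone cannot tell which of two equally long candidates is the true translation, which is one motivation for HunAlign's dictionary support.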
601 Citations
MultiUN v2: UN Documents with Multilingual Alignments
  • 16 citations
The Noisier the Better: Identifying Multilingual Word Translations Using a Single Monolingual Corpus
  • 4 citations
Parallel-Wiki: A Collection of Parallel Sentences Extracted from Wikipedia
  • 15 citations
Belgisch Staatsblad Corpus: Retrieving French-Dutch Sentences from Official Documents
  • 8 citations
Language-Independent Methods for Identifying Cross-Lingual Similarity in Wikipedia
The Parallel-TUT: a multilingual and multiformat treebank
  • 10 citations
Building and Annotating the Linguistically Diverse NTU-MC (NTU - Multilingual Corpus)
  • 36 citations
Comparable Corpora in Cross-Language Information Retrieval
  • Tuomas Talvensaari
  • 1 citation
  • Highly Influenced

References

(24 references in total; partial list)
A Program for Aligning Sentences in Bilingual Corpora
  • 1,273 citations
Bilingual Machine-Aided Indexing
  • 5 citations
MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
  • 221 citations
Europarl: A Parallel Corpus for Statistical Machine Translation
  • 3,119 citations
Generation from parallel texts with endogenous resources
  • Emmanuel Giguet Pierre-Sylvain, E. Giguet
  • 2005
  • 2 citations
Fast and Accurate Sentence Alignment of Bilingual Corpora
  • 313 citations