Fully Automatic Compilation of Portuguese-English and Portuguese-Spanish Parallel Corpora

Abstract

This paper reports the fully automatic compilation of parallel corpora for Brazilian Portuguese. Scientific news texts available in Brazilian Portuguese, English and Spanish are automatically crawled from a multilingual Brazilian magazine. The texts are then automatically aligned at document and sentence level. The resulting corpora contain about 2,700 parallel documents totaling over 150,000 aligned sentences each. The quality of the corpora and their usefulness are tested in an experiment with machine translation.
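The abstract does not say which sentence aligner is used, so the following is only an illustrative sketch of what sentence-level alignment of two already paired documents can look like. It assumes a simple length-based dynamic program in which a sentence is either matched one-to-one or skipped. The names `length_cost`, `align_sentences` and the value of `SKIP_COST` are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Tuple

# Assumed penalty for leaving a sentence unaligned (hypothetical value).
SKIP_COST = 0.6


def length_cost(src: str, tgt: str) -> float:
    """Relative character-length difference between two sentences, in [0, 1]."""
    longest = max(len(src), len(tgt), 1)
    return abs(len(src) - len(tgt)) / longest


def align_sentences(src: List[str], tgt: List[str]) -> List[Tuple[int, int]]:
    """Return monotone 1-1 sentence links (src_index, tgt_index).

    Sentences may be skipped on either side at a fixed cost; the dynamic
    program picks the cheapest overall path through both documents.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            here = cost[i][j]
            if here == inf:
                continue
            # Link src[i] with tgt[j].
            if i < n and j < m:
                c = here + length_cost(src[i], tgt[j])
                if c < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1] = c
                    back[i + 1][j + 1] = (i, j)
            # Leave src[i] unaligned.
            if i < n and here + SKIP_COST < cost[i + 1][j]:
                cost[i + 1][j] = here + SKIP_COST
                back[i + 1][j] = (i, j)
            # Leave tgt[j] unaligned.
            if j < m and here + SKIP_COST < cost[i][j + 1]:
                cost[i][j + 1] = here + SKIP_COST
                back[i][j + 1] = (i, j)

    # Walk back from the end, keeping only the diagonal (1-1) steps.
    links: List[Tuple[int, int]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if pi == i - 1 and pj == j - 1:
            links.append((pi, pj))
        i, j = pi, pj
    links.reverse()
    return links


if __name__ == "__main__":
    pt = ["O gato dorme.", "Ele gosta de peixe fresco todos os dias."]
    en = ["The cat sleeps.", "Yes.", "He likes fresh fish every single day."]
    for s, t in align_sentences(pt, en):
        print(pt[s], "<->", en[t])
```

A length-only cost is a deliberately crude signal; practical aligners usually add lexical evidence and allow one-to-many links, which this sketch leaves out.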

3 Figures and Tables

Cite this paper

@inproceedings{Aziz2011FullyAC,
  title={Fully Automatic Compilation of Portuguese-English and Portuguese-Spanish Parallel Corpora},
  author={Wilker Ferreira Aziz and Lucia Specia},
  year={2011}
}