Parallel Global Voices: a Collection of Multilingual Corpora with Citizen Media Stories


We present a new collection of multilingual corpora automatically created from the content available on the Global Voices websites, where volunteers have been posting and translating citizen media stories since 2004. We describe how we crawled and processed this content to generate parallel resources comprising 302.6K document pairs and 8.36M segment…

