Corpus ID: 44255330

Automatic Bilingual Corpus Collection from Wikipedia

@inproceedings{Unitt2016AutomaticBC,
  title={Automatic Bilingual Corpus Collection from Wikipedia},
  author={Mark Unitt and Simon Tite and Pejman Saeghe},
  year={2016}
}
  • Mark Unitt, Simon Tite, Pejman Saeghe
  • Published 2016
  • This study combines a number of existing technologies with newly developed tools to create an automatic tool that assists with corpus collection for machine translation. It aims to integrate technologies for domain classification, domain source identification, and comparable file alignment into a unified tool. The unified tool will be used to make the corpus collection process more focused and efficient, and to enable a wider variety of sources to be used.
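
The abstract names three stages (domain classification, domain source identification, comparable file alignment) but does not say how any of them is implemented. The sketch below is a hypothetical illustration of how such stages could be chained into one tool. It is not the authors' method. Keyword-count classification, in-domain filtering as a crude stand-in for source identification, and dictionary-translated cosine similarity for alignment are all assumptions, as are the names `Doc`, `classify_domain`, `select_sources`, `align`, the `threshold` parameter, and the toy English/French data.

```python
# Hypothetical illustration only: not the authors' tool or method.
import math
from collections import Counter
from dataclasses import dataclass

PUNCT = ".,;:!?()\"'"


@dataclass
class Doc:
    doc_id: str
    lang: str
    text: str


def tokens(text):
    """Lowercase whitespace tokenisation with surrounding punctuation stripped."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def classify_domain(doc, domain_keywords):
    """Stage 1 (assumed): pick the domain whose keywords occur most often.

    Returns None when no keyword of any domain appears in the text.
    """
    counts = Counter(tokens(doc.text))
    scores = {d: sum(counts[k] for k in kws) for d, kws in domain_keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def select_sources(docs, domain, domain_keywords):
    """Stage 2 (crude stand-in for source identification): keep in-domain docs."""
    return [d for d in docs if classify_domain(d, domain_keywords) == domain]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align(src_docs, tgt_docs, lexicon, threshold=0.3):
    """Stage 3 (assumed): pair each source doc with its most similar target doc.

    Source tokens are translated word by word with `lexicon` (unknown words are
    kept unchanged) before comparison; pairs below `threshold` are discarded.
    """
    pairs = []
    for s in src_docs:
        translated = Counter(lexicon.get(w, w) for w in tokens(s.text))
        best, best_score = None, 0.0
        for t in tgt_docs:
            score = cosine(translated, Counter(tokens(t.text)))
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            pairs.append((s.doc_id, best.doc_id, round(best_score, 3)))
    return pairs


# Toy English/French data (invented for illustration).
keywords = {
    "medicine": {"disease", "patient", "maladie"},
    "football": {"goal", "match", "but"},
}
en = [Doc("en1", "en", "The patient has a rare disease."),
      Doc("en2", "en", "The match ended with a late goal.")]
fr = [Doc("fr1", "fr", "Le patient a une maladie rare."),
      Doc("fr2", "fr", "Le match a fini par un but tardif.")]
lexicon = {"the": "le", "has": "a", "a": "une", "rare": "rare", "disease": "maladie"}

pairs = align(select_sources(en, "medicine", keywords),
              select_sources(fr, "medicine", keywords), lexicon)
print(pairs)  # [('en1', 'fr1', 1.0)]
```

Running the same pipeline on the "football" domain yields no pair: the toy lexicon has no football vocabulary, so the best similarity is about 0.267, below the assumed 0.3 threshold. This illustrates why a real alignment stage would need a much richer bilingual resource than a word list.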

    References

    • Building Bilingual Parallel Corpora Based on Wikipedia
    • BootCaT: Bootstrapping Corpora and Terms from the Web
    • Topic Modeling