Automatic Bilingual Corpus Collection from Wikipedia

  • Mark Unitt, Simon Tite, Pejman Saeghe
  • Published 2016


This study combines a number of existing technologies with newly developed tools to create an automatic tool that assists with corpus collection for machine translation. Specifically, it integrates domain classification, domain source identification, and comparable file alignment into a single unified tool. The unified tool will be used to make…
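The abstract names three stages that the unified tool chains together. The following is a minimal sketch, under stated assumptions, of how such stages might be composed into one pipeline over Wikipedia articles. It is not the authors' implementation. The keyword-overlap classifier, the reading of "domain source identification" as filtering in-domain articles, the alignment through Wikipedia interlanguage links, and every name (`Article`, `linked_title`, `classify_domain`, `identify_sources`, `align_comparable`, `collect_corpus`) are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Article:
    title: str
    lang: str
    text: str
    # Hypothetical field: title of the linked article in the other language,
    # as a Wikipedia interlanguage link would provide it.
    linked_title: Optional[str] = None


def classify_domain(article: Article, domain_keywords: Dict[str, Set[str]]) -> Optional[str]:
    """Stage 1 (assumed technique): pick the domain whose keyword set
    overlaps most with the article's words; None if nothing matches."""
    words = set(article.text.lower().split())
    best, best_score = None, 0
    for domain, keywords in domain_keywords.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = domain, score
    return best


def identify_sources(articles: List[Article], domain: str,
                     domain_keywords: Dict[str, Set[str]]) -> List[Article]:
    """Stage 2 (assumed interpretation): keep only articles classified
    into the target domain."""
    return [a for a in articles if classify_domain(a, domain_keywords) == domain]


def align_comparable(src: List[Article], tgt: List[Article]) -> List[Tuple[str, str]]:
    """Stage 3 (assumed technique): pair each source article with the
    target-language article named by its interlanguage link."""
    tgt_titles = {a.title for a in tgt}
    return [(a.title, a.linked_title) for a in src if a.linked_title in tgt_titles]


def collect_corpus(src: List[Article], tgt: List[Article], domain: str,
                   domain_keywords: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Hypothetical unified tool: classify and filter the source side,
    then align the surviving articles with the target side."""
    return align_comparable(identify_sources(src, domain, domain_keywords), tgt)


# Toy usage with made-up keywords and articles.
KEYWORDS = {
    "medicine": {"disease", "patient", "treatment"},
    "sport": {"match", "team", "goal"},
}
EN = [
    Article("Influenza", "en", "a viral disease whose treatment includes rest", "Grippe"),
    Article("Football", "en", "a team scores a goal in a match", "Fußball"),
]
DE = [Article("Grippe", "de", "..."), Article("Fußball", "de", "...")]

print(collect_corpus(EN, DE, "medicine", KEYWORDS))  # [('Influenza', 'Grippe')]
```

Only the target-language titles are needed for alignment in this sketch, so the target side is never classified; a real system would likely need per-language classification and a content-based comparability check rather than links alone.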

5 Figures and Tables