Automatic Bilingual Corpus Collection from Wikipedia

  • Mark Unitt, Simon Tite, Pejman Saeghe
  • Published 2016

Abstract

This study combines a number of existing technologies with newly developed tools to create an automatic tool that assists with corpus collection for machine translation. It aims to bring technologies for domain classification, domain source identification, and comparable file alignment together into a unified tool. The unified tool will be used to make…

5 Figures and Tables
