A Fast and Efficient Framework for Creating Parallel Corpus

Authors: B. Premjith, S. Kumar, R. Shyam, M. A. Kumar, K. Soman
Journal: Indian Journal of Science and Technology
Objectives: To present a framework that uses the ScanSnap SV600 scanner and Google optical character recognition (OCR) to create a parallel corpus, an essential component of Statistical Machine Translation (SMT). Methods and Analysis: Training a language model for an SMT system depends heavily on the availability of a parallel corpus. An effective approach to collecting parallel sentences is a key step in building an MT system. However, the creation of a parallel corpus requires extensive…
OdiEnCorp 2.0: Odia-English Parallel Corpus for Machine Translation
This work provides an extended English-Odia parallel corpus, OdiEnCorp 2.0, aimed particularly at Neural Machine Translation (NMT) systems that translate English↔Odia.
Neural Machine Translation System for English to Indian Language Translation Using MTIL Parallel Corpus
A neural machine translation system for four language pairs, designed with long short-term memory (LSTM) networks and bi-directional recurrent neural networks (Bi-RNN) and able to perceive long-term contexts in the sentences.
An Overview of the Shared Task on Machine Translation in Indian Languages (MTIL) – 2017
This paper presents an overview of the Machine Translation in Indian Languages shared task, conducted on September 7–8, 2017. The task aims to examine state-of-the-art machine translation systems for translating from English to Indian languages and to create an open-source parallel corpus for Indian languages, which is currently lacking.
Verb Phrases Alignment Technique for English-Malayalam Parallel Corpus in Statistical Machine Translation Special issue on MTIL 2017
This paper focuses on a technique that enables the automatic setup of a verb-aligned parallel corpus by exploring the internal structure of the English and Malayalam languages, which in turn facilitates machine translation from English to Malayalam.
MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation
This work constructs a traditional MT model using the Moses toolkit, enriches the language model with external data sets, and ranks the phrase tables using an RNN encoder-decoder module originally created as part of the GroundHog project of LISA lab.
Machine Translation in Indian Languages: Challenges and Resolution
The paper demonstrates that preordering and suffix separation improve the quality of English to Indian language machine translation.