Preparation of Sentiment tagged Parallel Corpus and Testing its effect on Machine Translation

Sainik Kumar Mahata, Amrita Chandra, Dipankar Das, Sivaji Bandyopadhyay
In the current work, we explore how machine translation output improves when the training parallel corpus is augmented with sentiment analysis. The paper describes the preparation of a sentiment-tagged English-Bengali parallel corpus for this purpose. The preparation of the raw parallel corpus, the sentiment analysis of its sentences, and the training of a character-based neural machine translation model on the resulting corpus are discussed in detail. The output of the…
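The abstract does not say how the sentiment labels are attached to the corpus. One common way to inject such information into an NMT model is to prepend a tag token to each source sentence. The sketch below assumes exactly that, and uses a tiny hypothetical word lexicon (`POSITIVE`, `NEGATIVE`) in place of whatever sentiment analyser the authors actually used; the function names are illustrative only.

```python
from typing import List, Tuple

# Hypothetical toy lexicon; NOT the sentiment analyser used in the paper.
POSITIVE = {"good", "great", "happy", "excellent", "love"}
NEGATIVE = {"bad", "sad", "terrible", "poor", "hate"}


def sentiment_tag(sentence: str) -> str:
    """Return a coarse tag from a positive-minus-negative word count (assumed heuristic)."""
    words = [w.strip(".,!?;:") for w in sentence.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "<pos>"
    if score < 0:
        return "<neg>"
    return "<neu>"


def tag_corpus(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Prefix each English source sentence with its tag; target sentences are unchanged."""
    return [(f"{sentiment_tag(src)} {src}", tgt) for src, tgt in pairs]
```

With a character-based model, a tag such as `<pos>` would need to be kept as a single reserved symbol rather than split into five characters; how the paper handles this is not stated here.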



How Sentiment Analysis Can Help Machine Translation

This work shows how sentiment analysis can improve translation quality by incorporating the roles of sentiment components, and how a simple baseline phrase-based statistical MT (PB-SMT) system built on these sentiment components achieves a 33.88% relative improvement in BLEU for the under-resourced English-Bengali language pair.
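A relative improvement is measured against the baseline score, not in absolute BLEU points. The helper below makes that arithmetic explicit; the scores in the usage check are hypothetical and are not the paper's results.

```python
def relative_improvement(baseline: float, system: float) -> float:
    """Percentage gain of `system` over `baseline`: (system - baseline) / baseline * 100."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (system - baseline) / baseline * 100.0
```

For example, a hypothetical baseline of 10.0 BLEU rising to 13.388 BLEU is a gain of only 3.388 points, yet a 33.88% relative improvement.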

MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation

This work builds a traditional MT model with the Moses toolkit, enriches the language model with external datasets, and ranks the phrase tables using an RNN encoder-decoder module originally created as part of the GroundHog project of the LISA lab.
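One way to use a neural scorer with a phrase table is to attach its score to every phrase pair as an extra feature and then rank pairs by it. The sketch assumes the usual Moses text layout (`source ||| target ||| scores ||| alignment ...`); `scorer` is a hypothetical stand-in for the RNN encoder-decoder, and this is not the MTIL2017 system's actual code.

```python
from typing import Callable, List


def add_rnn_feature(line: str, scorer: Callable[[str, str], float]) -> str:
    """Append scorer(src, tgt) as one extra feature on a Moses-style phrase-table line.

    `scorer` is a hypothetical stand-in for an RNN encoder-decoder returning p(tgt | src).
    """
    fields = [f.strip() for f in line.split("|||")]
    src, tgt = fields[0], fields[1]
    fields[2] = f"{fields[2]} {scorer(src, tgt):.6g}"
    return " ||| ".join(fields)


def rank_by_feature(lines: List[str], index: int = -1) -> List[str]:
    """Sort phrase-table lines by one feature (default: the last one), best first."""
    return sorted(lines, key=lambda ln: float(ln.split("|||")[2].split()[index]), reverse=True)
```

Appending the score as a new feature, rather than overwriting an existing one, leaves the decoder's tuning step free to decide how much weight the neural score deserves.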

Using SentiWordNet for multilingual sentiment analysis

K. Denecke, 2008 IEEE 24th International Conference on Data Engineering Workshop.
The results show that combining standard technology with existing sentiment analysis methods is a viable way to perform sentiment analysis within a multilingual framework.
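As we understand it, this multilingual approach translates non-English text into English and then scores it with SentiWordNet, which gives each synset positive and negative scores. The sketch below follows that outline with heavy simplifications: `translate` is supplied by the caller, and `LEXICON` holds hypothetical word-level scores rather than real SentiWordNet entries.

```python
from typing import Callable, Dict, Tuple

# Hypothetical (positive, negative) scores; real values would come from SentiWordNet synsets.
LEXICON: Dict[str, Tuple[float, float]] = {
    "good": (0.75, 0.0),
    "bad": (0.0, 0.625),
    "film": (0.0, 0.0),
}


def classify(text: str, translate: Callable[[str], str]) -> str:
    """Translate to English with a caller-supplied function, then compare summed scores."""
    words = translate(text).lower().split()
    pos = sum(LEXICON.get(w, (0.0, 0.0))[0] for w in words)
    neg = sum(LEXICON.get(w, (0.0, 0.0))[1] for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```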

The Web as a Parallel Corpus

This work presents the use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and an adaptation of the system that exploits the Internet Archive to mine parallel text from the Web on a large scale.
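To our understanding, the content-based measure links tokens across the two documents using a bilingual lexicon, with each token used at most once, and reports the share of two-word links among all links, where every unlinked token counts as a one-word link. The sketch below uses a greedy matching as a simple approximation of the best matching; `lexicon` is a hypothetical set of translation pairs.

```python
from typing import Iterable, Set, Tuple


def tsim(src: Iterable[str], tgt: Iterable[str], lexicon: Set[Tuple[str, str]]) -> float:
    """Greedy approximation of a translational-similarity score.

    two-word links / (two-word links + unlinked tokens on both sides).
    """
    src, tgt = list(src), list(tgt)
    used = set()  # indices of target tokens already linked
    links = 0
    for s in src:
        for j, t in enumerate(tgt):
            if j not in used and (s, t) in lexicon:
                used.add(j)
                links += 1
                break
    unlinked = (len(src) - links) + (len(tgt) - links)
    total = links + unlinked
    return links / total if total else 0.0
```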

SMT vs NMT: A Comparison over Hindi & Bengali Simple Sentences

It is observed that NMT outperforms SMT on simple sentences, whereas SMT outperforms NMT when all types of sentences are considered.

Paraphrasing with Bilingual Parallel Corpora

This work defines a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and shows how it can be refined to take contextual information into account.
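The paraphrase probability pivots through foreign phrases: $p(e_2 \mid e_1) = \sum_f p(f \mid e_1)\, p(e_2 \mid f)$, with $e_2 \neq e_1$. The sketch computes it from two nested dictionaries of translation probabilities; the dictionary layout is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Dict


def paraphrase_probs(e1: str,
                     p_f_given_e: Dict[str, Dict[str, float]],
                     p_e_given_f: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """p(e2 | e1) = sum over pivot phrases f of p(f | e1) * p(e2 | f), excluding e2 == e1."""
    scores: Dict[str, float] = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:
                scores[e2] += p_f * p_e2
    return dict(scores)
```

Ranking the returned dictionary by value gives the paraphrase candidates in order of probability.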

A Character-level Decoder without Explicit Segmentation for Neural Machine Translation

Existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation. In this paper, we ask a fundamental…
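A character-level decoder produces the target one symbol at a time, so the target side needs no word segmentation: spaces are just another symbol. The sketch shows only that target-side preprocessing; the special tokens and function names are illustrative assumptions.

```python
from typing import Dict, List

BOS, EOS, UNK = "<s>", "</s>", "<unk>"


def build_char_vocab(sentences: List[str]) -> Dict[str, int]:
    """Map every code point seen in the training targets to an integer id."""
    vocab = {BOS: 0, EOS: 1, UNK: 2}
    for sent in sentences:
        for ch in sent:
            vocab.setdefault(ch, len(vocab))
    return vocab


def encode_chars(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Encode a target sentence as character ids framed by BOS/EOS; unseen chars map to UNK."""
    return [vocab[BOS]] + [vocab.get(ch, vocab[UNK]) for ch in sentence] + [vocab[EOS]]
```

Iterating a Python string yields code points, so for Bengali a vowel sign becomes its own symbol, separate from the consonant it attaches to. Whether to model code points or grapheme clusters is a design choice this sketch does not settle.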

Parallel strands: a preliminary investigation into mining the Web for bilingual text

A conceptually simple, fully language-independent, and scalable method for automatically finding parallel translated documents on the Web is presented, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention.
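STRAND's structural check, as we understand it, linearizes each candidate page into markup tokens such as `[START:p]` and `[END:p]`, aligns the two sequences, and measures how much material stays unaligned. The sketch below is a simplification: it keeps only tag tokens (no text-chunk tokens) and uses Python's `difflib` in place of STRAND's own alignment step.

```python
import re
from difflib import SequenceMatcher
from typing import List

TAG = re.compile(r"<\s*(/?)\s*([a-zA-Z0-9]+)[^>]*>")


def linearize(html: str) -> List[str]:
    """Reduce a page to its start/end tag tokens, e.g. ['[START:p]', '[END:p]']."""
    return [f"[{'END' if slash else 'START'}:{name.lower()}]" for slash, name in TAG.findall(html)]


def difference_percentage(html_a: str, html_b: str) -> float:
    """Percentage of tag tokens left unaligned by difflib (lower means more parallel)."""
    a, b = linearize(html_a), linearize(html_b)
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * (total - 2 * matched) / total
```

Two translations of the same page usually share markup even though their text differs, which is why the comparison ignores the words entirely.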

Mining the Web for Bilingual Text

The preliminary STRAND results are extended by adding automatic language identification, scaling up by orders of magnitude, and formally evaluating performance.
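Automatic language identification can be as elaborate as a statistical n-gram model. For a pair with distinct scripts such as English-Bengali, a much cruder check may suffice. The sketch below is that cruder stand-in, not STRAND's identifier: it counts letters in the Bengali Unicode block (U+0980 to U+09FF) and assumes that anything else is English.

```python
def guess_language(text: str, threshold: float = 0.5) -> str:
    """Return 'bn' or 'en' by the share of letters in the Bengali Unicode block.

    Script heuristic for this one language pair only; 'unknown' if there are no letters.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "unknown"
    bengali = sum("\u0980" <= ch <= "\u09ff" for ch in letters)
    return "bn" if bengali / len(letters) >= threshold else "en"
```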

Bleu: a Method for Automatic Evaluation of Machine Translation

This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
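BLEU combines clipped (modified) n-gram precisions $p_n$ with a brevity penalty: $\mathrm{BLEU} = \mathrm{BP} \cdot \exp\big(\sum_{n=1}^{N} w_n \log p_n\big)$, with uniform weights $w_n = 1/N$, where $\mathrm{BP} = 1$ if $c > r$ and $e^{1 - r/c}$ otherwise. The sketch scores a single sentence without smoothing. The method itself is defined over a whole test corpus, so treat this as an illustration of the formula rather than a drop-in evaluator.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Single-sentence BLEU: clipped n-gram precisions, uniform weights, brevity penalty."""
    if not candidate:
        return 0.0
    log_p = 0.0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        max_ref = Counter()
        for ref in references:
            max_ref |= ngrams(ref, n)  # element-wise max count over references
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if clipped == 0 or total == 0:
            return 0.0  # unsmoothed: any zero precision gives BLEU = 0
        log_p += math.log(clipped / total) / max_n
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]  # closest reference length
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_p)
```

Clipping is what stops a degenerate candidate such as "the the the the the the the" from scoring a perfect unigram precision: against the reference "the cat is on the mat", it earns only 2/7.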