Learning variable length units for SMT between related languages via Byte Pair Encoding

@inproceedings{Kunchukuttan2017LearningVL,
  title={Learning variable length units for SMT between related languages via Byte Pair Encoding},
  author={Anoop Kunchukuttan and Pushpak Bhattacharyya},
  booktitle={SWCN@EMNLP},
  year={2017}
}
We explore the use of segments learnt using Byte Pair Encoding (referred to as BPE units) as basic units for statistical machine translation between related languages and compare them with orthographic syllables, which are currently the best-performing basic units for this translation task. BPE identifies the most frequent character sequences as basic units, while orthographic syllables are linguistically motivated pseudo-syllables. We show that BPE units outperform orthographic syllables as…
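To make "BPE identifies the most frequent character sequences as basic units" concrete, here is a minimal sketch of the generic BPE merge-learning loop. It is not the authors' implementation. The function names (`merge_pair`, `learn_bpe`, `segment`), the `</w>` end-of-word marker, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter

END = "</w>"  # assumed end-of-word marker, chosen for this illustration


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge operations from a list of word tokens."""
    # Each distinct word becomes a tuple of characters plus the end marker,
    # weighted by how often it occurs.
    vocab = Counter(tuple(w) + (END,) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Most frequent pair; ties are broken by the pair itself so runs are repeatable.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        # Apply the new merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into BPE units by replaying the learnt merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

In this sketch the number of merges is the only hyperparameter: few merges leave units close to single characters, while many merges move them towards whole words.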


Key Quantitative Results

  • We show that BPE units outperform orthographic syllables as units of translation, with up to an 11% increase in BLEU score.