Robust Neural Machine Translation with Joint Textual and Phonetic Embedding

@inproceedings{Liu2019RobustNM,
  title={Robust Neural Machine Translation with Joint Textual and Phonetic Embedding},
  author={Hairong Liu and M. Ma and Liang Huang and Hao Xiong and Zhongjun He},
  booktitle={ACL},
  year={2019}
}
Neural machine translation (NMT) is notoriously sensitive to noise, yet noise is almost inevitable in practice. [...] Experiments show that our method not only significantly improves the robustness of NMT to homophone noise, as expected, but also surprisingly improves translation quality on clean test sets.
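The joint textual-and-phonetic embedding idea can be sketched in a few lines. This is a minimal illustration only, assuming the method interpolates a character embedding with a pinyin (phonetic) embedding per token; the vocabularies, dimension, and weight `beta` are hypothetical, not taken from the paper:

```python
import random

random.seed(0)
DIM = 8

# Hypothetical toy vocabularies: Chinese characters and their pinyin
# (phonetic) forms. Names and sizes are illustrative only.
text_vocab = {"机": 0, "器": 1, "翻": 2, "译": 3}
pinyin_vocab = {"ji": 0, "qi": 1, "fan": 2, "yi": 3}

text_emb = [[random.gauss(0, 1) for _ in range(DIM)] for _ in text_vocab]
phon_emb = [[random.gauss(0, 1) for _ in range(DIM)] for _ in pinyin_vocab]

def joint_embed(char, pinyin, beta=0.7):
    """Interpolate the textual and phonetic embeddings of one token.

    With beta < 1, homophones (different characters sharing a pinyin
    syllable, hence the same phonetic vector) get partially shared
    representations, which is what gives robustness to homophone noise.
    """
    t = text_emb[text_vocab[char]]
    p = phon_emb[pinyin_vocab[pinyin]]
    return [beta * ti + (1 - beta) * pi for ti, pi in zip(t, p)]

vec = joint_embed("机", "ji")
assert len(vec) == DIM
```

Setting `beta=1.0` recovers the purely textual embedding, so the textual-only model is a special case of this sketch.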
Modeling Homophone Noise for Robust Neural Machine Translation
TLDR
A robust neural machine translation (NMT) framework to deal with homophone errors is proposed; extensive experiments on Chinese→English translation demonstrate that the proposed method not only significantly outperforms baselines on noisy test sets with homophone noise, but also achieves substantial improvements over them on clean text.
Semantic Diversity by Phonetics for Accurate and Robust Machine Translation
  • 2019
Neural Machine Translation (NMT) learns from examples, and thus often lacks robustness against noise. Previous work has shown that integrating noise into the training process is effective at [...]
Robust Neural Machine Translation with ASR Errors
TLDR
This paper focuses on ASR errors involving homophones and words with similar pronunciation, and makes use of their pronunciation information to help the translation model recover from input errors.
Robust Unsupervised Neural Machine Translation with Adversarial Training
TLDR
This paper defines two types of noise, empirically shows their effect on UNMT performance, and proposes adversarial training methods to improve the robustness of UNMT in noisy scenarios.
Breaking the Data Barrier: Towards Robust Speech Translation via Adversarial Stability Training
TLDR
A training architecture is proposed that makes a neural machine translation model more robust to speech recognition errors by addressing the encoder and the decoder simultaneously, using adversarial learning and data augmentation respectively.
Inverted Projection for Robust Speech Translation
TLDR
An inverted projection approach is introduced that projects automatically detected system segments onto human transcripts and then re-segments the gold translations to align with the projected human transcripts, overcoming the train-test mismatch present in other training approaches.
Fine-Tuning MT systems for Robustness to Second-Language Speaker Variations
TLDR
This work shows that fine-tuning on naturally occurring noise together with pseudo-references is a promising route to systems robust to such input variations; it focuses on four translation pairs, from English to Spanish, Italian, French, and Portuguese.
Word Shape Matters: Robust Machine Translation with Visual Embedding
TLDR
A new encoding heuristic for the input symbols of character-level NLP models is introduced: it encodes the shape of each character through images of the printed letters, which is expected to improve the robustness of NLP models.
Exploring the Robustness of NMT Systems to Nonsensical Inputs
TLDR
A soft-attention based technique is proposed to make the aforementioned word replacements when multiple words in the source sentence have been replaced; it achieves a high success rate and outperforms existing methods such as HotFlip by a significant margin.

References

Showing 1–10 of 24 references
Improving the Robustness of Speech Translation
TLDR
This work simulates the noise found in realistic ASR output and injects it into clean parallel data, so that NMT operates under similar word distributions during training and testing; it also incorporates the Chinese Pinyin feature, which is easy to obtain in speech translation.
Synthetic and Natural Noise Both Break Neural Machine Translation
TLDR
It is found that a model based on a character convolutional neural network is able to simultaneously learn representations robust to multiple kinds of noise, including structure-invariant word representations and robust training on noisy texts.
Dealing with Input Noise in Statistical Machine Translation
TLDR
This paper presents experiments with real-life noisy input and a standard phrase-based SMT system from English into Spanish, plus a preprocessing step consisting of a character-based translator from noisy into clean text.
Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition Errors
TLDR
This work compares the translation of utterances containing ASR errors by state-of-the-art NMT encoder-decoder systems against a strong phrase-based machine translation baseline, in order to better understand which phenomena present in ASR outputs are better represented under the NMT framework than under approaches that represent translation as a linear model.
Towards Robust Neural Machine Translation
TLDR
Experimental results on Chinese-English, English-German and English-French translation tasks show that the proposed approaches can not only achieve significant improvements over strong NMT systems but also improve the robustness of NMT models.
Improved Neural Machine Translation with Chinese Phonologic Features
TLDR
A novel phonology-aware neural machine translation (PA-NMT) model is proposed in which Chinese phonologic features are leveraged for translation tasks with Chinese as the target; it significantly outperforms state-of-the-art baselines on these two tasks.
Phonetically-oriented word error alignment for speech recognition error analysis in speech translation
TLDR
The Phonetically-Oriented Word Error Rate (POWER) yields similar scores to WER with the added advantages of better word alignments and the ability to capture one-to-many alignments corresponding to homophonic errors in speech recognition hypotheses.
Neural Machine Translation of Rare Words with Subword Units
TLDR
This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.3 BLEU.
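The byte-pair-encoding merge loop behind subword units can be sketched as follows. This is a simplified, illustrative version using the classic toy vocabulary from the BPE literature, not the paper's actual implementation:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merge operations from a word-frequency dict.

    words maps each word to its corpus frequency; each word is treated
    as a sequence of characters plus an end-of-word marker. At every
    step the most frequent adjacent symbol pair is merged into one new
    symbol, gradually building a subword vocabulary.
    """
    vocab = {tuple(w) + ("</w>",): c for w, c in words.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for sym, c in vocab.items():
            for a, b in zip(sym, sym[1:]):
                pairs[(a, b)] += c
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        new_vocab = {}
        for sym, c in vocab.items():
            out, i = [], 0
            while i < len(sym):
                if i < len(sym) - 1 and (sym[i], sym[i + 1]) == best:
                    out.append(sym[i] + sym[i + 1])  # apply the merge
                    i += 2
                else:
                    out.append(sym[i])
                    i += 1
            new_vocab[tuple(out)] = c
        vocab = new_vocab
    return merges

merges = bpe_merges({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 4)
# First learned merge is ('e', 's'), from "newest" and "widest".
```

Rare words are then segmented by replaying the learned merges, so an unseen word like "lowest" decomposes into known subword units instead of becoming UNK.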
Pinyin as Subword Unit for Chinese-Sourced Neural Machine Translation
TLDR
This paper proposes to utilize Pinyin, a romanization system for Chinese characters, to convert Chinese characters to subword units to alleviate the UNK problem, and demonstrates that the proposed methods can remarkably improve translation quality.
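The Pinyin-as-subword preprocessing can be illustrated with a toy conversion table. This sketch assumes a hand-written character→pinyin mapping; a real system would use a full pronunciation dictionary (e.g. the pypinyin library):

```python
# Toy character-to-pinyin table. The entries are correct pinyin for
# these characters, but the table itself is illustrative only.
CHAR_TO_PINYIN = {"中": "zhong", "国": "guo", "你": "ni", "好": "hao"}

def to_pinyin_subwords(sentence):
    """Replace each known Chinese character with its pinyin syllable,
    keeping unknown characters as-is. The source vocabulary then
    shrinks to a small set of syllables, so UNK tokens become rarer."""
    return [CHAR_TO_PINYIN.get(ch, ch) for ch in sentence]

print(to_pinyin_subwords("你好中国"))  # → ['ni', 'hao', 'zhong', 'guo']
```

Note that this mapping is many-to-one: distinct characters can share a syllable, which is exactly why pinyin also helps with the homophone noise discussed above.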
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
TLDR
Qualitatively, the proposed RNN Encoder–Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.