Corpus ID: 235790709

Using Machine Translation to Localize Task Oriented NLG Output

Scott Roy, Clifford Brunk, Kyu-Young Kim, Justin Zhao, Markus Freitag, Mihir Kale, Gagan Bansal, Sidharth Mudgal, Chris Varano
One of the challenges in a task oriented natural language application like the Google Assistant, Siri, or Alexa is to localize the output to many languages. This paper explores doing this by applying machine translation to the English output. Using machine translation is very scalable, as it can work with any English output and can handle dynamic text, but otherwise the problem is a poor fit. The required quality bar is close to perfection, the range of sentences is extremely narrow, and the… 

Fast Domain Adaptation for Neural Machine Translation
This paper proposes an approach for adapting an NMT system to a new domain, the main idea being to exploit large out-of-domain training data together with a small amount of in-domain training data.
Machine Translation Pre-training for Data-to-Text Generation - A Case Study in Czech
This paper studies the effectiveness of machine translation based pre-training for data-to-text generation in non-English languages and shows that this approach enjoys several desirable properties, including improved performance in low data scenarios and applicability to low resource languages.
A Survey of Domain Adaptation for Neural Machine Translation
A comprehensive survey of state-of-the-art domain adaptation techniques for NMT, which leverage both out-of-domain parallel corpora and monolingual corpora for in-domain translation.
Improving Neural Machine Translation Models with Monolingual Data
This work pairs monolingual training data with automatic back-translations, treating them as additional parallel training data, and obtains substantial improvements on the WMT 15 English→German task and the low-resource IWSLT 14 Turkish→English task.
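The back-translation idea summarized above can be sketched in a few lines: target-language monolingual sentences are run through a reverse-direction MT model to produce synthetic source sentences, and the resulting pairs are added to the parallel training data. The `translate_target_to_source` callable here is a hypothetical stand-in for a real reverse MT model.

```python
def back_translate(monolingual_target, translate_target_to_source):
    """Pair each target-language sentence with a synthetic source sentence
    produced by a reverse-direction MT model (hypothetical callable),
    yielding extra (source, target) training pairs."""
    return [(translate_target_to_source(t), t) for t in monolingual_target]

# Toy usage: an uppercasing function stands in for the reverse MT model.
pairs = back_translate(["guten Tag", "danke"], lambda s: s.upper())
# Each synthetic source is paired with the original target sentence.
```

The key point is that the target side of each synthetic pair is genuine human text, so the model still learns to produce fluent target-language output even though the source side is machine-generated.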
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
This paper identifies several key modeling and training techniques and applies them to the RNN architecture, yielding a new RNMT+ model that outperforms all three fundamental architectures on the benchmark WMT'14 English→French and English→German tasks.
Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems
A statistical language generator based on a semantically controlled Long Short-Term Memory (LSTM) structure that can learn from unaligned data by jointly optimising sentence planning and surface realisation with a simple cross-entropy training criterion; language variation can be easily achieved by sampling from output candidates.
Sequence to Sequence Learning with Neural Networks
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the word order of all source sentences markedly improved the LSTM's performance, because doing so introduced many short-term dependencies between the source and target sentences that made the optimization problem easier.
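The source-reversal trick described above is a pure preprocessing step: each source sentence's tokens are reversed while the target side is left untouched. A minimal sketch, assuming whitespace tokenization:

```python
def reverse_source(pairs):
    """Reverse the token order of each source sentence while keeping the
    target sentence intact, as a preprocessing step before training."""
    return [(" ".join(src.split()[::-1]), tgt) for src, tgt in pairs]

# Toy usage: the source "je suis fatigué" becomes "fatigué suis je".
prepared = reverse_source([("je suis fatigué", "I am tired")])
```

This brings the beginning of the source sentence close to the beginning of the target sentence, shortening the dependency the model must bridge for the earliest output tokens.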
Small and Practical BERT Models for Sequence Labeling
This work proposes a practical scheme to train a single multilingual sequence labeling model that yields state-of-the-art results while remaining small and fast enough to run on a single CPU, and shows that the model performs especially well on low-resource languages.
A Multilingual Parallel Corpora Collection Effort for Indian Languages
Reports on methods for constructing sentence-aligned parallel corpora using tools enabled by recent advances in machine translation and deep neural network based cross-lingual retrieval.
The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task: Overview and Evaluation Results (WebNLG+ 2020)
The results of the generation and semantic parsing tasks for both English and Russian are presented, along with brief descriptions of the participating systems.