ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
- Computer Science · International Conference on Learning…
- 23 March 2020
The contextual representations learned by the proposed replaced token detection pre-training task substantially outperform the ones learned by methods such as BERT and XLNet given the same model size, data, and compute.
Unsupervised Data Augmentation for Consistency Training
- Qizhe Xie, Zihang Dai, E. Hovy, Minh-Thang Luong, Quoc V. Le
- Computer Science · Neural Information Processing Systems
- 29 April 2019
A new perspective on how to effectively noise unlabeled examples is presented and it is argued that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
Self-Training With Noisy Student Improves ImageNet Classification
- Qizhe Xie, E. Hovy, Minh-Thang Luong, Quoc V. Le
- Computer Science · Computer Vision and Pattern Recognition
- 11 November 2019
We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images.
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
- A. Yu, David Dohan, Quoc V. Le
- Computer Science · International Conference on Learning…
- 15 February 2018
A new Q&A architecture called QANet is proposed, which does not require recurrent networks; its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions.
Towards a Human-like Open-Domain Chatbot
- Daniel De Freitas, Minh-Thang Luong, Quoc V. Le
- Computer Science · ArXiv
- 27 January 2020
Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations, is presented, and a human evaluation metric called Sensibleness and Specificity Average (SSA) is proposed, which captures key elements of a human-like multi-turn conversation.
Multi-task Sequence to Sequence Learning
- Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser
- Computer Science · International Conference on Learning…
- 19 November 2015
The results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks, and reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context.
A Hierarchical Neural Autoencoder for Paragraphs and Documents
- Jiwei Li, Minh-Thang Luong, Dan Jurafsky
- Computer Science · Annual Meeting of the Association for…
- 2 June 2015
This paper introduces an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph; evaluating the reconstructed paragraphs with standard metrics shows that neural models are able to encode texts in a way that preserves syntactic, semantic, and discourse coherence.
Unsupervised Data Augmentation
- Qizhe Xie, Zihang Dai, E. Hovy, Minh-Thang Luong, Quoc V. Le
- Computer Science · ArXiv
- 29 April 2019
UDA has a small twist in that it makes use of harder and more realistic noise generated by state-of-the-art data augmentation methods, which leads to substantial improvements on six language tasks and three vision tasks even when the labeled set is extremely small.
Semi-Supervised Sequence Modeling with Cross-View Training
- Kevin Clark, Minh-Thang Luong, Christopher D. Manning, Quoc V. Le
- Computer Science · Conference on Empirical Methods in Natural…
- 22 September 2018
Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data, is proposed and evaluated, achieving state-of-the-art results.
Massive Exploration of Neural Machine Translation Architectures
- D. Britz, Anna Goldie, Minh-Thang Luong, Quoc V. Le
- Computer Science · Conference on Empirical Methods in Natural…
- 11 March 2017
This work presents a large-scale analysis of the sensitivity of NMT architectures to common hyperparameters, and reports empirical results and variance numbers for several hundred experimental runs corresponding to over 250,000 GPU hours on a WMT English to German translation task.