mixup: Beyond Empirical Risk Minimization
- Hongyi Zhang, Moustapha Cissé, Y. Dauphin, David Lopez-Paz
- Computer Science, International Conference on Learning…
- 25 October 2017
This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
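The convex-combination idea is simple enough to sketch in a few lines (a minimal illustration assuming NumPy arrays and one-hot labels; the function name and `alpha` default are illustrative, not the paper's code):

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Return a convex combination of two examples and their one-hot labels."""
    rng = rng or np.random.default_rng()
    # Mixing weight drawn from Beta(alpha, alpha), as described in the paper.
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

Training on such virtual examples encourages the network to behave linearly between training points, which is the regularization effect the paper attributes to mixup.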
Convolutional Sequence to Sequence Learning
- Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Y. Dauphin
- Computer Science, International Conference on Machine Learning
- 8 May 2017
This work introduces an architecture based entirely on convolutional neural networks that outperforms the accuracy of the deep LSTM setup of Wu et al. (2016) on both the WMT'14 English-German and WMT'14 English-French translation tasks, at an order of magnitude faster speed on both GPU and CPU.
Language Modeling with Gated Convolutional Networks
- Y. Dauphin, Angela Fan, Michael Auli, David Grangier
- Computer Science, International Conference on Machine Learning
- 23 December 2016
This work develops a finite-context approach through stacked convolutions, which can be more efficient because they allow parallelization over sequential tokens; it is the first time a non-recurrent approach has been competitive with strong recurrent models on these large-scale language tasks.
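The gating mechanism at the heart of these stacked convolutions is the gated linear unit; a minimal sketch (names and shapes are illustrative, not the paper's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu(x, w_lin, b_lin, w_gate, b_gate):
    # Gated linear unit: (x @ W + b) * sigmoid(x @ V + c).
    # The sigmoid gate selects which features flow to the next layer;
    # with no recurrence, all positions can be computed in parallel.
    return (x @ w_lin + b_lin) * sigmoid(x @ w_gate + b_gate)
```

Because the gate lies in (0, 1), each output feature is a damped copy of the corresponding linear feature, which the paper argues eases gradient flow compared with fully saturating gates.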
Hierarchical Neural Story Generation
- Angela Fan, M. Lewis, Y. Dauphin
- Computer Science, Annual Meeting of the Association for…
- 1 May 2018
This work collects a large dataset of 300K human-written stories paired with writing prompts from an online forum that enables hierarchical story generation, where the model first generates a premise, and then transforms it into a passage of text.
Theano: A Python framework for fast computation of mathematical expressions
- Rami Al-Rfou, Guillaume Alain, Ying Zhang
- Computer Science, ArXiv
- 9 May 2016
The performance of Theano is compared against Torch7 and TensorFlow on several machine learning models and recently-introduced functionalities and improvements are discussed.
Pay Less Attention with Lightweight and Dynamic Convolutions
- Felix Wu, Angela Fan, Alexei Baevski, Y. Dauphin, Michael Auli
- Computer Science, International Conference on Learning…
- 29 January 2019
It is shown that a very lightweight convolution can perform competitively with the best reported self-attention results, and dynamic convolutions are introduced that are simpler and more efficient than self-attention.
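The core of a lightweight convolution is a depthwise convolution with a softmax-normalized kernel shared across channels; a simplified single-head sketch (the paper additionally ties weights across heads and applies dropout to the kernel):

```python
import numpy as np

def lightweight_conv(x, kernel_logits):
    # x: (seq_len, channels). One softmax-normalized kernel shared by all
    # channels -- a single-head simplification of the paper's operator.
    w = np.exp(kernel_logits - kernel_logits.max())
    w /= w.sum()                      # softmax: positive weights summing to 1
    k = len(w)
    xp = np.pad(x, ((k // 2, k - 1 - k // 2), (0, 0)))
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        out[t] = w @ xp[t:t + k]      # weighted average over a local window
    return out
```

Unlike self-attention, the kernel does not depend on the input (dynamic convolutions relax exactly this, predicting a kernel per position), so the cost is linear rather than quadratic in sequence length.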
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
- Y. Dauphin, Razvan Pascanu, Çaglar Gülçehre, Kyunghyun Cho, S. Ganguli, Yoshua Bengio
- Computer Science, NIPS
- 10 June 2014
This paper proposes a new approach to second-order optimization, the saddle-free Newton method, which can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods. The algorithm is applied to deep and recurrent neural network training, with numerical evidence for its superior optimization performance.
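The key idea can be sketched: take a Newton-like step but rescale by the absolute eigenvalues of the Hessian, so negative-curvature directions are descended rather than ascended (an illustrative dense-eigendecomposition sketch; the paper works in a low-dimensional Krylov subspace to make this tractable at scale):

```python
import numpy as np

def saddle_free_newton_step(grad, hess, damping=1e-3):
    # Rescale the gradient by |H|^{-1}: the Hessian with its eigenvalues
    # replaced by their absolute values. Plain Newton divides by the
    # signed eigenvalues and is therefore attracted to saddle points.
    eigvals, eigvecs = np.linalg.eigh(hess)
    scaled = (eigvecs.T @ grad) / (np.abs(eigvals) + damping)
    return -eigvecs @ scaled
```

On the saddle f(x, y) = x² − y² at the point (1, 1), plain Newton steps straight to the saddle at the origin, while the saddle-free step moves downhill along the negative-curvature y direction.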
Parseval Networks: Improving Robustness to Adversarial Examples
- Moustapha Cissé, Piotr Bojanowski, Edouard Grave, Y. Dauphin, Nicolas Usunier
- Computer Science, International Conference on Machine Learning
- 28 April 2017
It is shown that Parseval networks match the state-of-the-art in terms of accuracy on CIFAR-10/100 and Street View House Numbers while being more robust than their vanilla counterpart against adversarial examples.
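Parseval networks keep weight matrices close to Parseval tight frames (W Wᵀ ≈ I), which the paper maintains with a cheap retraction applied after each gradient update; a sketch of that step (β and shapes are illustrative):

```python
import numpy as np

def parseval_retraction(w, beta=0.001):
    # One retraction step pushing the rows of w toward orthonormality:
    #   W <- (1 + beta) * W - beta * W @ W.T @ W
    # Repeated after each SGD update, this keeps W W^T close to the
    # identity, bounding the layer's Lipschitz constant.
    return (1.0 + beta) * w - beta * (w @ w.T @ w)
```

Bounding each layer's Lipschitz constant at 1 is what limits how much an adversarial input perturbation can be amplified through the network.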
Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding
- Grégoire Mesnil, Y. Dauphin, G. Zweig
- Computer Science, IEEE/ACM Transactions on Audio Speech and…
- 1 March 2015
This paper implements and compares several important RNN architectures, including the Elman, Jordan, and hybrid variants, built with the publicly available Theano neural network toolkit, and reports experiments on the well-known Air Travel Information System (ATIS) benchmark.
Deal or No Deal? End-to-End Learning of Negotiation Dialogues
- M. Lewis, Denis Yarats, Y. Dauphin, Devi Parikh, Dhruv Batra
- Computer Science, Conference on Empirical Methods in Natural…
- 1 June 2017
For the first time, it is shown that end-to-end models can be trained for negotiation, learning both linguistic and reasoning skills with no annotated dialogue states, and that this technique dramatically improves performance.
...