mixup: Beyond Empirical Risk Minimization
This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
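The mixup principle described above can be sketched in a few lines: draw a mixing coefficient from a Beta distribution and form convex combinations of a batch with a shuffled copy of itself. This is a minimal NumPy sketch, not the paper's reference implementation; the function name and the default `alpha` are illustrative.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mix a batch of inputs x with one-hot labels y via convex combinations.

    Each example is blended with a randomly chosen partner from the same
    batch: lam * x_i + (1 - lam) * x_j, and likewise for the labels.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))        # random pairing within the batch
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y + (1 - lam) * y[perm]
    return x_mixed, y_mixed
```

The mixed labels remain valid probability distributions (each row still sums to 1), which is what lets the usual cross-entropy loss be applied unchanged.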
Convolutional Sequence to Sequence Learning
- Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann Dauphin
- Computer Science · ICML
- 8 May 2017
This work introduces an architecture based entirely on convolutional neural networks that outperforms the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and English-French translation while running an order of magnitude faster on both GPU and CPU.
Language Modeling with Gated Convolutional Networks
This work develops a finite-context approach based on stacked convolutions, which can be more efficient because they allow parallelization over sequential tokens; it marks the first time a non-recurrent approach has been competitive with strong recurrent models on large-scale language modeling tasks.
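The building block of this architecture is the gated linear unit (GLU), the element-wise product of a linear projection and a sigmoid gate. A minimal NumPy sketch of the gating computation, with illustrative parameter names (the actual model applies this over convolutional feature maps):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu(x, W, b, V, c):
    # Gated linear unit: h = (xW + b) * sigmoid(xV + c).
    # The sigmoid gate controls how much of each linear feature passes through,
    # giving the network a linear path for gradients while retaining nonlinearity.
    return (x @ W + b) * sigmoid(x @ V + c)
```

Because the gate lies strictly in (0, 1), each output is a damped copy of the corresponding linear feature, never an amplified one.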
Theano: A Python framework for fast computation of mathematical expressions
The performance of Theano is compared against that of Torch7 and TensorFlow on several machine learning models, and recently introduced functionalities and improvements are discussed.
Hierarchical Neural Story Generation
This work collects a large dataset of 300K human-written stories paired with writing prompts from an online forum, enabling hierarchical story generation in which the model first generates a premise and then transforms it into a passage of text.
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
- Yann Dauphin, Razvan Pascanu, Çaglar Gülçehre, Kyunghyun Cho, S. Ganguli, Yoshua Bengio
- Computer Science · NIPS
- 10 June 2014
This paper proposes a new approach to second-order optimization, the saddle-free Newton method, which can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods; the algorithm is applied to deep and recurrent neural network training, with numerical evidence for its superior optimization performance.
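The core idea of the saddle-free Newton method is to rescale the gradient by the absolute value of the Hessian, |H|, rather than H itself, so that negative-curvature directions are descended instead of ascended. A minimal dense NumPy sketch under that assumption (the paper uses a low-rank Krylov approximation to make this tractable at scale; the function name and `eps` damping here are illustrative):

```python
import numpy as np

def saddle_free_step(grad, hessian, eps=1e-6):
    # Replace each Hessian eigenvalue with its absolute value, then take
    # a Newton-like step -|H|^{-1} g.  A plain Newton step -H^{-1} g would
    # move *toward* a saddle along negative-curvature directions; taking
    # |H| flips those directions into descent directions.
    eigvals, eigvecs = np.linalg.eigh(hessian)
    abs_h = eigvecs @ np.diag(np.abs(eigvals) + eps) @ eigvecs.T
    return -np.linalg.solve(abs_h, grad)
```

On the classic saddle f(x, y) = x² - y², the Newton step from (1, 1) points at the saddle itself, while the saddle-free step descends in y while still reducing x.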
Parseval Networks: Improving Robustness to Adversarial Examples
- Moustapha Cissé, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier
- Computer Science · ICML
- 28 April 2017
It is shown that Parseval networks match the state of the art in accuracy on CIFAR-10/100 and Street View House Numbers while being more robust than their vanilla counterparts against adversarial examples.
Pay Less Attention with Lightweight and Dynamic Convolutions
It is shown that a very lightweight convolution can perform competitively with the best reported self-attention results, and dynamic convolutions are introduced that are simpler and more efficient than self-attention.
Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding
- G. Mesnil, Yann Dauphin, G. Zweig
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and…
- 1 March 2015
This paper implements and compares several important RNN architectures, including Elman, Jordan, and hybrid variants, using the publicly available Theano neural network toolkit, and reports experiments on the well-known Airline Travel Information System (ATIS) benchmark.
Deal or No Deal? End-to-End Learning of Negotiation Dialogues
For the first time, it is shown that it is possible to train end-to-end models for negotiation, which must learn both linguistic and reasoning skills without annotated dialogue states, and that this technique dramatically improves performance.