Publications (sorted by influence)
mixup: Beyond Empirical Risk Minimization
TLDR: We propose mixup, a simple learning principle that improves the generalization of state-of-the-art neural network architectures. (A rough sketch of the idea follows below.)
  • 1,134 citations · 259 highly influential · PDF available
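The summary does not spell out the mechanism, but the well-known mixup recipe trains on convex combinations of pairs of examples and their labels, with the mixing weight drawn from a Beta distribution. A minimal NumPy sketch of that idea (the function name and the alpha default are illustrative, not the paper's reference code):

    import numpy as np

    def mixup_batch(x, y, alpha=0.2, seed=None):
        """Mix a batch: convex combinations of random example pairs.

        x: (batch, ...) array of inputs; y: (batch, classes) one-hot labels.
        """
        rng = np.random.default_rng(seed)
        lam = rng.beta(alpha, alpha)            # mixing weight in [0, 1]
        perm = rng.permutation(len(x))          # random pairing of examples
        x_mix = lam * x + (1 - lam) * x[perm]   # interpolate inputs
        y_mix = lam * y + (1 - lam) * y[perm]   # interpolate labels the same way
        return x_mix, y_mix

Training then proceeds as usual on the mixed batch, which encourages the network to behave linearly between training examples.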
Convolutional Sequence to Sequence Learning
TLDR: The prevalent approach to sequence-to-sequence learning maps an input sequence to a variable-length output sequence via recurrent neural networks; this work replaces the recurrence with an entirely convolutional architecture.
  • 1,669 citations · 235 highly influential · PDF available
Theano: A Python framework for fast computation of mathematical expressions
TLDR: Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. (A small example follows below.)
  • 1,847 citations · 140 highly influential · PDF available
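Since the summary names Theano's core workflow (define, optimize, evaluate symbolic expressions), a tiny example of that classic API may help; Theano is no longer maintained, so this reflects its historical interface:

    import theano
    import theano.tensor as T

    x = T.dvector('x')                 # symbolic vector of doubles
    y = T.sum(x ** 2)                  # symbolic expression: sum of squares
    gy = T.grad(y, x)                  # gradient derived symbolically
    f = theano.function([x], [y, gy])  # compile to optimized native code
    print(f([1.0, 2.0, 3.0]))          # ~> [14.0, [2. 4. 6.]]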
Language Modeling with Gated Convolutional Networks
TLDR: In this paper we develop a finite-context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. (A sketch of the gating mechanism follows below.)
  • 827 citations · 133 highly influential · PDF available
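The gating the paper uses is the gated linear unit, h(X) = (X*W + b) ⊗ σ(X*V + c), an elementwise product of a linear path with a sigmoid gate, applied over a causal (left-padded) convolution so each position sees only a finite window of past tokens. A simplified NumPy sketch, with all shapes and names illustrative:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def glu_conv_layer(x, W, V, b, c):
        """Gated linear unit over a causal 1-D convolution.

        x: (T, d_in) token embeddings; W, V: (k, d_in, d_out) filters;
        b, c: (d_out,) biases.
        """
        k, _, d_out = W.shape
        # left-pad so position t only sees tokens <= t (finite context)
        xp = np.concatenate([np.zeros((k - 1, x.shape[1])), x], axis=0)
        a = np.zeros((x.shape[0], d_out))
        g = np.zeros((x.shape[0], d_out))
        for t in range(x.shape[0]):
            window = xp[t:t + k]                          # (k, d_in) context
            a[t] = np.einsum('kd,kdo->o', window, W) + b  # linear path
            g[t] = np.einsum('kd,kdo->o', window, V) + c  # gate path
        return a * sigmoid(g)                             # gated output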
Hierarchical Neural Story Generation
TLDR: We tackle the challenges of story-telling with a hierarchical model, which first generates a sentence called the prompt describing the topic for the story, and then conditions on this prompt when generating the story.
  • 265 citations · 62 highly influential · PDF available
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
TLDR: We propose a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods. (A one-step sketch follows below.)
  • 839 citations · 61 highly influential · PDF available
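The key trick is to precondition the gradient by |H|, the Hessian with its eigenvalues replaced by their absolute values: negative-curvature directions then become descent directions, so saddles repel the iterate instead of trapping it. A NumPy sketch of one step (the damping constant is illustrative):

    import numpy as np

    def saddle_free_newton_step(grad, hess, damping=1e-3):
        """One saddle-free Newton step: move by -|H|^{-1} g."""
        eigval, eigvec = np.linalg.eigh(hess)  # H = V diag(lam) V^T
        abs_h = eigvec @ np.diag(np.abs(eigval) + damping) @ eigvec.T
        return -np.linalg.solve(abs_h, grad)

On the toy saddle f(x, y) = x^2 - y^2 at the point (1, 1), this step is roughly (-1, +1): it descends along x but moves away from the saddle along the negative-curvature direction y, where a plain Newton step would move toward the saddle.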
Parseval Networks: Improving Robustness to Adversarial Examples
TLDR: We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional, and aggregation layers is constrained to be smaller than 1. (A sketch of the weight constraint follows below.)
  • 388 citations · 54 highly influential · PDF available
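To my recollection, the constraint is maintained by nudging each weight matrix toward a Parseval tight frame (W Wᵀ ≈ I) after gradient updates, which keeps the layer's spectral norm, and hence its Lipschitz constant, near 1. A sketch of such a retraction step (the beta value is illustrative):

    import numpy as np

    def parseval_retraction(W, beta=0.0003):
        """One step pulling W toward a Parseval tight frame (W W^T ~ I).

        Keeping W W^T close to the identity bounds the layer's spectral
        norm, and therefore its Lipschitz constant, at roughly 1.
        """
        return (1 + beta) * W - beta * W @ W.T @ W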
Pay Less Attention with Lightweight and Dynamic Convolutions
TLDR: We introduce dynamic convolutions, which are simpler and more efficient than self-attention. (A simplified sketch follows below.)
  • 185 citations · 44 highly influential · PDF available
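In this approach a small linear layer predicts a softmax-normalized convolution kernel from the current token alone, so each position applies its own local filter and cost grows linearly with sequence length, unlike the quadratic pairwise comparisons of self-attention. The sketch below collapses the paper's multi-head, channel-shared kernels into a single kernel per position; names and shapes are illustrative:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def dynamic_conv(x, Wk, k=3):
        """Per-position depthwise-style convolution with predicted kernels.

        x: (T, d) float sequence; Wk: (d, k) map from a token to its kernel.
        """
        T_, d = x.shape
        xp = np.concatenate([np.zeros((k - 1, d)), x], axis=0)  # causal pad
        out = np.zeros_like(x)
        for t in range(T_):
            kernel = softmax(x[t] @ Wk)  # (k,) kernel from current token only
            window = xp[t:t + k]         # (k, d) window of past tokens
            out[t] = kernel @ window     # weighted sum over the window
        return out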
Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding
TLDR: We propose the use of recurrent neural networks for the SLU slot-filling task, and present several novel architectures designed to efficiently model past and future temporal dependencies. (A bidirectional sketch follows below.)
  • 389 citations · 38 highly influential · PDF available
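Slot filling labels every token in an utterance (e.g. tagging "Boston" as a departure city), and a bidirectional RNN is a natural way to expose each token to both past and future context, as the summary describes. A minimal Elman-style NumPy sketch, with all weight shapes illustrative:

    import numpy as np

    def rnn_pass(x, Wx, Wh, reverse=False):
        """Run a simple tanh RNN over x: (T, d_in) -> (T, d_h) states."""
        h = np.zeros(Wh.shape[0])
        out = np.zeros((x.shape[0], Wh.shape[0]))
        order = reversed(range(x.shape[0])) if reverse else range(x.shape[0])
        for t in order:
            h = np.tanh(x[t] @ Wx + h @ Wh)  # update hidden state
            out[t] = h
        return out

    def birnn_slot_tagger(x, Wx_f, Wh_f, Wx_b, Wh_b, Wo):
        """Predict one slot label per token using past AND future context."""
        h_f = rnn_pass(x, Wx_f, Wh_f)                # left-to-right states
        h_b = rnn_pass(x, Wx_b, Wh_b, reverse=True)  # right-to-left states
        logits = np.concatenate([h_f, h_b], axis=1) @ Wo
        return logits.argmax(axis=1)                 # one slot label per token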
Deal or No Deal? End-to-End Learning of Negotiation Dialogues
TLDR: We gather a large dataset of human-human negotiations on a multi-issue bargaining task, where agents who cannot observe each other's reward functions must reach an agreement (or a deal) via natural-language dialogue.
  • 172 citations · 29 highly influential · PDF available