Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
TLDR
Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used and task/dataset.
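The interpolated NB-SVM variant studied in this paper is easy to reproduce; below is a minimal sketch over binarized bag-of-words features using scikit-learn's LinearSVC. The interpolation weight and the bigram features the paper also evaluates are omitted, so this is an illustration of the idea, not the full method.

```python
# Minimal NB-SVM sketch: a linear SVM trained on features scaled by the
# Naive Bayes log-count ratio (binary bag-of-words features assumed).
import numpy as np
from sklearn.svm import LinearSVC

def log_count_ratio(X, y, alpha=1.0):
    # r = log((p / ||p||_1) / (q / ||q||_1)), with add-alpha smoothing.
    p = alpha + X[y == 1].sum(axis=0)
    q = alpha + X[y == 0].sum(axis=0)
    return np.log(p / p.sum()) - np.log(q / q.sum())

def fit_nbsvm(X, y, C=0.1):
    r = log_count_ratio(X, y)
    clf = LinearSVC(C=C).fit(X * r, y)  # SVM over NB-scaled features
    return r, clf

# Prediction on new documents uses the same scaling:
#   y_hat = clf.predict(X_test * r)
```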
Dropout Training as Adaptive Regularization
TLDR
We develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer for generalized linear models, and show that it consistently boosts the performance of dropout training.
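For logistic regression, the quadratic approximation analyzed in this line of work turns dropout into an explicit, data-dependent L2-style penalty; a minimal numpy sketch of that penalty is below. The semi-supervised extension, which estimates the same penalty on unlabeled examples, is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dropout_penalty(X, beta, delta):
    # Quadratic approximation of the dropout regularizer for logistic
    # regression: 0.5 * sum_i p_i (1 - p_i) * Var[x_i~ . beta], where
    # features are dropped w.p. delta and rescaled by 1 / (1 - delta).
    p = sigmoid(X @ beta)
    var = (delta / (1.0 - delta)) * ((X ** 2) @ (beta ** 2))
    return 0.5 * np.sum(p * (1.0 - p) * var)
```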
Fast dropout training
TLDR
We show how to do fast dropout training for classification, regression, and multilayer neural networks by sampling from or integrating a Gaussian approximation, instead of doing Monte Carlo optimization.
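The core trick is that the dropped-out pre-activation is a sum of many independent terms, so by the central limit theorem it is approximately Gaussian; one can then integrate the nonlinearity against that Gaussian instead of sampling dropout masks. A minimal sketch for a single logistic unit follows; the probit-style closed form used here is one common approximation to the Gaussian integral, not necessarily the paper's exact formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fast_dropout_logistic(x, w, keep_prob):
    # Pre-activation s = sum_j z_j * w_j * x_j, z_j ~ Bernoulli(keep_prob),
    # is approximately Gaussian (CLT): match its mean and variance, then
    # approximate E[sigmoid(s)] in closed form instead of Monte Carlo.
    wx = w * x
    mu = keep_prob * wx.sum()
    var = keep_prob * (1.0 - keep_prob) * (wx ** 2).sum()
    # Probit-style approximation to the Gaussian expectation of a sigmoid.
    return sigmoid(mu / np.sqrt(1.0 + np.pi * var / 8.0))
```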
Simple Recurrent Units for Highly Parallelizable Recurrence
TLDR
We propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability.
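The SRU parallelizes well because every matrix multiplication depends only on the input x_t, so all of them can be batched across time steps; only cheap elementwise gate updates remain sequential. Below is a numpy sketch of one published variant of the recurrence, assuming matching input and hidden sizes (the highway connection requires this); real implementations fuse the elementwise loop into a CUDA kernel.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_layer(X, W, Wf, bf, Wr, br):
    # X: (T, d) inputs. All matrix products below are computed for every
    # time step at once; only the elementwise recurrence is sequential.
    U = X @ W.T
    F = sigmoid(X @ Wf.T + bf)   # forget gates, all steps at once
    R = sigmoid(X @ Wr.T + br)   # reset gates, all steps at once
    c = np.zeros(W.shape[0])
    H = np.empty_like(U)
    for t in range(X.shape[0]):
        c = F[t] * c + (1.0 - F[t]) * U[t]              # light recurrence
        H[t] = R[t] * np.tanh(c) + (1.0 - R[t]) * X[t]  # highway connection
    return H
```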
Data Noising as Smoothing in Neural Network Language Models
TLDR
We derive a connection between input noising in neural network language models and smoothing.
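As one concrete instance, the paper's unigram noising scheme replaces each context token with a draw from the unigram distribution with some probability gamma, which the analysis relates to interpolation-style smoothing. A minimal sketch, assuming tokens are integer ids:

```python
import numpy as np

def unigram_noise(tokens, unigram_probs, gamma, seed=None):
    # With probability gamma, swap each token for a sample from the
    # unigram distribution (unigram_probs must sum to 1).
    rng = np.random.default_rng(seed)
    tokens = np.asarray(tokens).copy()
    mask = rng.random(tokens.shape) < gamma
    tokens[mask] = rng.choice(len(unigram_probs), size=mask.sum(),
                              p=unigram_probs)
    return tokens
```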
Learning Language Games through Interaction
TLDR
We introduce a new language learning setting relevant to building adaptive natural language interfaces.
Altitude Training: Strong Bounds for Single-Layer Dropout
TLDR
We show that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization.
Naturalizing a Programming Language via Interactive Learning
TLDR
We start with a core language and allow users to "naturalize" the core language incrementally by defining alternative, more natural syntax and increasingly complex concepts in terms of compositions of simpler ones.
Fast and Adaptive Online Training of Feature-Rich Translation Models
TLDR
We present a fast and scalable online method for tuning statistical machine translation models with large feature sets that is effective yet comparatively easy to implement.
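The TLDR does not spell out the update rule; a generic sketch of the kind of adaptive online update commonly used for large sparse feature sets in this setting (an AdaGrad step with an L1, FOBOS-style shrink) is below. This illustrates the setting, not the paper's exact algorithm.

```python
import numpy as np

def adagrad_l1_step(w, g, G, eta=0.02, lam=1e-6, eps=1e-8):
    # One adaptive-gradient step with a soft-thresholding L1 shrink:
    # per-coordinate step sizes adapt to accumulated squared gradients,
    # and the shrink keeps most of a large, sparse feature set at zero.
    G += g ** 2
    step = eta / (np.sqrt(G) + eps)
    w = w - step * g
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w, G
```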
Feature Noising for Log-Linear Structured Prediction
TLDR
We reinterpret feature noising as an explicit regularizer, and approximate it with a second-order formula that can be used during training without actually generating fake data.
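For a multiclass log-linear model, a second-order approximation of this kind reduces to a weighted quadratic penalty, roughly 0.5 * sum_y p(y|x)(1 - p(y|x)) * Var[theta_y . f~(x, y)]. A minimal sketch for additive Gaussian feature noise is below; the structured-prediction case in the paper replaces these per-class quantities with dynamic programs over cliques.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def noising_penalty(Theta, x, tau2):
    # Second-order approximation of the expected increase in log-loss
    # when each feature is perturbed by independent N(0, tau2) noise:
    #   R ~= 0.5 * sum_y p_y (1 - p_y) * Var[theta_y . (x + eps)]
    p = softmax(Theta @ x)
    var = tau2 * (Theta ** 2).sum(axis=1)  # Var[theta_y . noisy features]
    return 0.5 * np.sum(p * (1.0 - p) * var)
```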