Rectifier Nonlinearities Improve Neural Network Acoustic Models
This work explores the use of deep rectifier networks as acoustic models for the 300-hour Switchboard conversational speech recognition task, and analyzes hidden layer representations to quantify differences in how ReLU units encode inputs as compared to sigmoidal units.
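The rectifier variants compared in this line of work can be sketched with scalar activation functions; the leaky slope of 0.01 is one illustrative setting, not the only one the paper evaluates:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes input into (0, 1); saturates for large |x|."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Rectified linear unit: identity for positive inputs, exactly zero otherwise."""
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    """Leaky rectifier: keeps a small non-zero slope for negative inputs,
    so the unit still passes gradient when it is 'inactive'."""
    return x if x > 0 else slope * x

# For a negative input the sigmoid saturates near zero, the ReLU is
# exactly zero, and the leaky ReLU passes a small negative signal.
for f in (sigmoid, relu, leaky_relu):
    print(f.__name__, f(-5.0))
```

The difference on the negative side is what the hidden-layer analysis probes: standard ReLUs produce truly sparse (exactly zero) codes, while sigmoids merely saturate.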
Maximum Entropy Inverse Reinforcement Learning
A probabilistic approach based on the principle of maximum entropy is developed that provides a well-defined, globally normalized distribution over decision sequences while matching the performance guarantees of existing methods.
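In the maximum-entropy formulation, a trajectory $\zeta$ with feature counts $f_\zeta$ and reward weights $\theta$ receives probability proportional to the exponentiated reward, with a global partition function (notation follows common presentations of the method):

```latex
P(\zeta \mid \theta) = \frac{1}{Z(\theta)} \exp\left(\theta^{\top} f_{\zeta}\right),
\qquad
Z(\theta) = \sum_{\zeta'} \exp\left(\theta^{\top} f_{\zeta'}\right)
```

The global normalizer $Z(\theta)$ is what makes the distribution well-defined over whole decision sequences, avoiding the label-bias issues of locally normalized models.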
Learning Word Vectors for Sentiment Analysis
This work presents a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term–document information as well as rich sentiment content, and finds it outperforms several previously introduced methods for sentiment classification.
Recurrent Neural Networks for Noise Reduction in Robust ASR
This work introduces a model that uses a deep recurrent autoencoder neural network to denoise input features for robust ASR, demonstrates that the model is competitive with existing feature denoising approaches on the Aurora2 task, and shows it outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
Lexicon-Free Conversational Speech Recognition with Neural Networks
An approach to speech recognition is presented that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure, making it possible to train a speech recognizer directly using errors generated by spoken language understanding tasks.
Navigate like a cabbie: probabilistic reasoning from observed context-aware behavior
We present PROCAB, an efficient method for Probabilistically Reasoning from Observed Context-Aware Behavior. It models the context-dependent utilities and underlying reasons that people take…
First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs
This paper demonstrates that a straightforward recurrent neural network architecture can achieve a high level of accuracy, and proposes and evaluates a modified prefix-search decoding algorithm that enables first-pass speech recognition with a language model, completely unaided by the cumbersome infrastructure of HMM-based systems.
Building DNN acoustic models for large vocabulary speech recognition
An empirical investigation of which aspects of DNN acoustic model design are most important for speech recognition system performance, suggesting that a relatively simple DNN architecture and optimization technique produces strong results.
A Probabilistic Model for Semantic Word Vectors
Vector representations of words capture relationships in words' functions and meanings. Many existing techniques for inducing such representations from data use a pipeline of hand-coded processing…
Spectral Chinese Restaurant Processes: Nonparametric Clustering Based on Similarities
A new nonparametric clustering model is presented that combines the recently proposed distance-dependent Chinese restaurant process (dd-CRP) with non-linear, spectral methods for dimensionality reduction, and improves the performance of the dd-CRP in spectral space by incorporating the original similarity matrix in its prior.
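The dd-CRP prior referenced here links each point $i$ to another point $j$ with probability governed by a decay function $f$ of their distance $d_{ij}$, or to itself with concentration $\alpha$ (standard dd-CRP notation; the spectral variant above modifies the distances used):

```latex
p(c_i = j \mid D, \alpha) \propto
\begin{cases}
f(d_{ij}) & j \neq i \\
\alpha & j = i
\end{cases}
```

Clusters are then the connected components of these customer-to-customer links, which is what lets similarity (rather than exchangeability) drive the partition.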