Learn More
We introduce DropConnect, a generalization of Dropout (Hinton et al., 2012), for regularizing large fully-connected layers within neural networks. When training with Dropout, a randomly selected subset of activations is set to zero within each layer. DropConnect instead sets a randomly selected subset of weights within the network to zero. Each unit …
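To make the contrast concrete, here is a minimal numpy sketch of the two training-time masking schemes; the function names, shapes, and drop probability `p` are illustrative assumptions, and the rescaling or averaging both methods need at inference time is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, W, p=0.5):
    # Dropout: zero a random subset of *activations* (the layer's outputs).
    a = x @ W
    mask = rng.random(a.shape) > p      # keep each activation with prob 1 - p
    return a * mask

def dropconnect(x, W, p=0.5):
    # DropConnect: zero a random subset of *weights* (the connections).
    mask = rng.random(W.shape) > p      # keep each weight with prob 1 - p
    return x @ (W * mask)

x = rng.standard_normal((4, 16))        # batch of 4 illustrative inputs
W = rng.standard_normal((16, 8))        # fully-connected weight matrix
print(dropout(x, W).shape, dropconnect(x, W).shape)
```

The only difference is where the mask lives: Dropout's mask has the shape of the layer's output, while DropConnect's has the shape of its weight matrix.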
We introduce a simple and effective method for regularizing large convolutional neural networks. We replace the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution, given by the activities within the pooling region. The approach is …
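A sketch of the sampling step as described, under the assumptions that activations are non-negative (e.g. post-rectification) and pooling regions are non-overlapping; only training-time sampling is shown, and all names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_pool(fmap, k=2):
    # Pool non-overlapping k x k regions by *sampling* one activation per region,
    # with probability proportional to its activity (a multinomial over the region).
    h, w = fmap.shape
    out = np.empty((h // k, w // k))
    for i in range(0, h, k):
        for j in range(0, w, k):
            region = fmap[i:i + k, j:j + k].ravel()
            total = region.sum()
            if total <= 0:                    # all-zero region: nothing to sample
                out[i // k, j // k] = 0.0
            else:
                out[i // k, j // k] = rng.choice(region, p=region / total)
    return out

fmap = np.abs(rng.standard_normal((4, 4)))    # non-negative activations
print(stochastic_pool(fmap))
```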
We present a hierarchical model that learns image decompositions via alternating layers of convolutional sparse coding and max pooling. When trained on natural images, the layers of our model capture image information in a variety of forms: low-level edges, mid-level edge junctions, high-level object parts and complete objects. To build our model we rely …
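As a rough, heavily simplified sketch of one such stage: sparse feature maps are inferred by a few ISTA iterations of convolutional sparse coding and then max pooled. The filters below are random rather than learned, multi-layer stacking is omitted, and the step size, penalty, and names are all assumptions of this sketch.

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

rng = np.random.default_rng(0)

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def csc_layer(image, filters, lam=0.05, step=0.01, iters=50):
    # Infer sparse maps z_k such that sum_k conv(z_k, f_k) ~ image (ISTA).
    z = [np.zeros_like(image) for _ in filters]
    for _ in range(iters):
        recon = sum(convolve2d(zk, fk, mode="same") for zk, fk in zip(z, filters))
        resid = image - recon
        # Gradient step uses correlation, the adjoint of convolution; a small,
        # conservative step size keeps the iteration stable with random filters.
        z = [soft_threshold(zk + step * correlate2d(resid, fk, mode="same"), lam)
             for zk, fk in zip(z, filters)]
    return z

def max_pool(fmap, k=2):
    h, w = fmap.shape
    return fmap[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).max(axis=(1, 3))

image = rng.standard_normal((16, 16))
filters = [f / np.linalg.norm(f) for f in rng.standard_normal((4, 5, 5))]
pooled = [max_pool(zk) for zk in csc_layer(image, filters)]
print(pooled[0].shape)   # (8, 8)
```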
In order to illuminate a light signaling a correct response, adult humans had to space their button presses according to a range of time requirements. In some conditions, the spacing needed only to exceed a minimum duration; in others, it had to fall between lower and upper bounds. Mean interresponse times always exceeded the lower limit, and decreased the …
Deep neural networks have recently become the gold standard for acoustic modeling in speech recognition systems. The key computational unit of a deep network is a linear projection followed by a point-wise non-linearity, which is typically a logistic function. In this work, we show that we can improve generalization and make training of deep networks faster …
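A minimal numpy sketch of the unit described, a linear projection followed by a point-wise logistic non-linearity, stacked into a small network; the layer sizes and 40-dimensional input frames are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(x, W, b):
    # The key computational unit: linear projection + point-wise non-linearity.
    return logistic(x @ W + b)

x = rng.standard_normal((32, 40))            # 32 frames of 40-dim features (illustrative)
sizes = [40, 128, 128, 10]
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    W = 0.1 * rng.standard_normal((n_in, n_out))
    x = layer(x, W, np.zeros(n_out))
print(x.shape)                               # (32, 10)
```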
The peak procedure was used to study temporal control in pigeons exposed to seven fixed-interval schedules ranging from 7.5 to 480 s. The focus was on behavior in individual intervals. Quantitative properties of temporal control depended on whether the aspect of behavior considered was initial pause duration, the point of maximum acceleration in responding, …
We present a type of Temporal Restricted Boltzmann Machine that defines a probability distribution over an output sequence conditional on an input sequence. It shares the desirable properties of RBMs: efficient exact inference, an exponentially more expressive latent state than HMMs, and the ability to model nonlinear structure and dynamics. We apply our …
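As a hedged, per-frame sketch of one way an RBM's distribution over an output frame can be conditioned on an input frame, via input-dependent ("dynamic") biases. This deliberately ignores the temporal links between frames that the full model would carry, samples by Gibbs steps rather than performing inference, and every matrix, dimension, and name below is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

n_u, n_v, n_h = 6, 5, 8                      # input, output, hidden sizes (illustrative)
W = 0.1 * rng.standard_normal((n_v, n_h))    # RBM weights between output and hidden units
A = 0.1 * rng.standard_normal((n_u, n_v))    # input -> dynamic visible bias
B = 0.1 * rng.standard_normal((n_u, n_h))    # input -> dynamic hidden bias

def sample_output(u, gibbs_steps=20):
    # Sample an output frame v from an RBM whose biases depend on the input u.
    a, c = u @ A, u @ B                      # dynamic biases for this frame
    v = (rng.random(n_v) < logistic(a)).astype(float)
    for _ in range(gibbs_steps):
        h = (rng.random(n_h) < logistic(c + v @ W)).astype(float)
        v = (rng.random(n_v) < logistic(a + h @ W.T)).astype(float)
    return v

u_seq = rng.standard_normal((4, n_u))        # a short input sequence
print(np.stack([sample_output(u) for u in u_seq]))
```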