Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
By optimizing the PAC-Bayes bound directly, this work extends the approach of Langford and Caruana (2001) and obtains nonvacuous generalization bounds for deep stochastic neural network classifiers with millions of parameters trained on only tens of thousands of examples.
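PAC-Bayes bounds of the "kl" form constrain the binary KL divergence between the empirical error and the true error of a stochastic classifier, so turning one into a numerical error bound requires inverting that divergence. The sketch below does the inversion by bisection. It is an illustration only: the function names are hypothetical, and the budget `c` stands in for the complexity term of the bound (a KL divergence between posterior and prior plus a confidence term, divided by roughly the sample size), whose exact form is not given in this summary.

```python
import math


def binary_kl(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p), in nats."""
    eps = 1e-12  # clamp away from 0 and 1 to keep the logarithms finite
    q = min(max(q, eps), 1.0 - eps)
    p = min(max(p, eps), 1.0 - eps)
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))


def kl_inverse(q, c, tol=1e-9):
    """Largest p in [q, 1] with binary_kl(q, p) <= c, found by bisection.

    q is the empirical error; c is the (assumed) complexity budget.
    """
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(q, mid) > c:
            hi = mid  # mid violates the budget, so the bound lies below it
        else:
            lo = mid  # mid is still feasible, so the bound lies at or above it
    return lo


# Example: empirical error 0.1 with a complexity budget of 0.05 nats.
error_bound = kl_inverse(0.1, 0.05)
```

A bound is nonvacuous when the resulting value is below the error of trivial guessing; with a zero budget the inversion returns the empirical error itself.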
Training generative neural networks via Maximum Mean Discrepancy optimization
This work considers training a deep neural network to generate samples from an unknown distribution given i.i.d. data, frames learning as an optimization problem that minimizes a two-sample test statistic (the maximum mean discrepancy, MMD), and proves bounds on the generalization error incurred by optimizing the empirical MMD.
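For readers unfamiliar with the statistic, a minimal sketch of an empirical squared-MMD estimate between two samples follows. The Gaussian kernel, the bandwidth value, and the function names are assumptions made for illustration; the summary does not say which kernel or estimator the paper uses.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as equal-length tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples xs and ys."""

    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Two small one-dimensional samples that are far apart.
data_sample = [(0.0,), (1.0,)]
generated_sample = [(5.0,), (6.0,)]
gap = mmd_squared(data_sample, generated_sample)
```

In a training loop, a generator would be updated to reduce this quantity between real data and generated samples; the estimate is zero when the two samples coincide.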
Linear Mode Connectivity and the Lottery Ticket Hypothesis
This work studies whether a network optimizes to the same, linearly connected minimum under different samples of SGD noise. It finds that standard vision models become stable to SGD noise in this sense early in training, and uses this technique to study iterative magnitude pruning (IMP), the procedure used by work on the lottery ticket hypothesis to identify subnetworks that could have trained in isolation to full accuracy.
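Linear mode connectivity can be probed by evaluating a loss along the straight line between two trained weight vectors. The sketch below is a simplified illustration, not the paper's code: weights are flat lists of floats, `loss_fn` is a hypothetical callable, and the barrier is taken as the highest loss on the path minus the mean of the two endpoint losses, which is an assumed definition here.

```python
def interpolate(w_a, w_b, alpha):
    """Point on the line segment between w_a (alpha=0) and w_b (alpha=1)."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(w_a, w_b)]


def loss_barrier(w_a, w_b, loss_fn, steps=11):
    """Highest loss along the linear path minus the mean of the endpoint losses."""
    alphas = [i / (steps - 1) for i in range(steps)]
    path_losses = [loss_fn(interpolate(w_a, w_b, alpha)) for alpha in alphas]
    endpoint_mean = 0.5 * (path_losses[0] + path_losses[-1])
    return max(path_losses) - endpoint_mean


def bowl(w):
    """Single-minimum toy loss: no barrier between symmetric points."""
    return sum(v ** 2 for v in w)


def double_well(w):
    """Toy loss with two minima at -1 and +1 separated by a bump at 0."""
    return sum((v ** 2 - 1.0) ** 2 for v in w)


barrier_bowl = loss_barrier([-1.0], [1.0], bowl)
barrier_wells = loss_barrier([-1.0], [1.0], double_well)
```

A barrier near zero indicates that the two solutions sit in the same linearly connected region; the double-well toy loss shows the opposite case, where the midpoint is worse than either endpoint.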
Stabilizing the Lottery Ticket Hypothesis
This paper modifies IMP to search for subnetworks that could have been obtained by pruning early in training rather than at iteration 0, and studies subnetwork "stability," finding that, as accuracy improves in this fashion, IMP subnetworks train to parameters closer to those of the full network and do so with improved consistency in the face of gradient noise.
A study of the effect of JPG compression on adversarial images
It is found that JPG compression often reverses the drop in classification accuracy to a large extent, but not always, and as the magnitude of the perturbations increases, JPG recompression alone is insufficient to reverse the effect.
Neural Network Matrix Factorization
This work replaces the inner product of the matrix factorization framework by a multi-layer feed-forward neural network, and learns by alternating between optimizing the network for fixed latent features, and optimizing the latent features for a fixed network.
Pruning Neural Networks at Initialization: Why are We Missing the Mark?
It is shown that, unlike pruning after training, accuracy is the same or higher when randomly shuffling which weights these methods prune within each layer or sampling new initial values, undermining the claimed justifications for these methods and suggesting broader challenges with the underlying pruning heuristics.
Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms
This work studies the proposal to reason about the generalization error of a learning algorithm by introducing a super sample that contains the training sample as a random subset and computing mutual information conditional on the super sample, and introduces yet tighter bounds based on the conditional mutual information.
Entropy-SGD optimizes the prior of a PAC-Bayes bound: Data-dependent PAC-Bayes priors via differential privacy
We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier
Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
This work improves upon the stepwise analysis of noisy iterative learning algorithms and obtains significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics (SGLD) via data-dependent estimates, using a variational characterization of mutual information.