Publications
Equality of Opportunity in Supervised Learning
This work proposes a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict a target from available features, and shows how to optimally adjust any learned predictor so as to remove discrimination according to this definition.
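The criterion can be checked directly from predictions: it requires equal true positive rates across groups defined by the sensitive attribute. A minimal sketch (function and variable names are my own, not from the paper):

```python
import numpy as np

def true_positive_rates(y_true, y_pred, group):
    """Per-group true positive rate P(Yhat = 1 | Y = 1, A = a).
    'Equality of opportunity' asks these rates to match across groups."""
    return {a: y_pred[(group == a) & (y_true == 1)].mean()
            for a in np.unique(group)}
```

A predictor satisfying the criterion yields (approximately) equal values across the returned dictionary; the paper's contribution is a post-processing step that adjusts any predictor to equalize them optimally.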
Pegasos: primal estimated sub-gradient solver for SVM
A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines, which is particularly well suited for large text classification problems, and demonstrates an order-of-magnitude speedup over previous SVM learning methods.
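The core update is simple enough to sketch: at step t, draw one example, use step size 1/(λt), and apply the sub-gradient of the regularized hinge loss. A minimal NumPy version (hyperparameters are illustrative, and the optional projection step from the paper is omitted):

```python
import numpy as np

def pegasos(X, y, lam=0.1, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient descent for the SVM objective
    (lam/2)||w||^2 + (1/n) sum_i max(0, 1 - y_i <w, x_i>).
    One randomly ordered example per step, step size 1/(lam * t)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)
            w *= (1.0 - eta * lam)            # shrink from the regularizer
            if y[i] * (X[i] @ w) < 1:         # margin violated: hinge sub-gradient
                w += eta * y[i] * X[i]
    return w
```

Each step costs O(d) for a d-dimensional example, which is what makes the method attractive for large sparse text problems.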
Maximum-Margin Matrix Factorization
A novel approach to collaborative prediction is presented, using low-norm instead of low-rank factorizations, which is inspired by, and has strong connections to, large-margin linear discrimination.
Fast maximum margin matrix factorization for collaborative prediction
This work investigates a direct gradient-based optimization method for MMMF, finds that MMMF substantially outperforms the nine methods tested, and demonstrates it on large collaborative prediction problems.
The Marginal Value of Adaptive Gradient Methods in Machine Learning
It is observed that the solutions found by adaptive methods generalize worse (often significantly worse) than SGD, even when these solutions have better training performance, suggesting that practitioners should reconsider the use of adaptive methods to train neural networks.
The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard-margin SVM) solution.
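The phenomenon is easy to observe numerically. A small demo (toy data and hyperparameters are my own): run full-batch gradient descent on the logistic loss over a separable dataset whose hard-margin SVM direction is (1, 1)/√2, and watch the normalized iterate align with it:

```python
import numpy as np

# Toy separable data; the max-margin direction is (1, 1)/sqrt(2),
# with support vectors (1, 1) and (-1, -1).
X = np.array([[1.0, 1.0], [3.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)
lr = 0.5
for _ in range(50_000):                      # full-batch gradient descent
    margins = y * (X @ w)
    w += lr * (y * sigmoid(-margins)) @ X    # -grad of sum log(1 + e^{-margin})

direction = w / np.linalg.norm(w)
max_margin = np.array([1.0, 1.0]) / np.sqrt(2)
alignment = direction @ max_margin           # approaches 1 as training continues
```

The norm of w grows without bound (only logarithmically), but its direction converges, which is the paper's point: the implicit bias of the optimizer, not any explicit regularizer, selects the max-margin solution.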
Weighted Low-Rank Approximations
This work provides a simple and efficient algorithm for solving weighted low-rank approximation problems, which, unlike their unweighted version, do not admit a closed-form solution in general.
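An EM-style procedure for this problem can be sketched as follows (a simplified version assuming weights in [0, 1]; parameter names are mine): alternate between filling in the target matrix with the current estimate, weighted by W, and taking an ordinary truncated SVD, which solves the unweighted subproblem in closed form.

```python
import numpy as np

def weighted_lowrank(A, W, rank, iters=200):
    """EM-style weighted low-rank approximation: repeatedly fill in the
    target with the current estimate (weights W in [0, 1]) and take an
    ordinary rank-k truncated SVD of the filled-in matrix."""
    X = np.zeros_like(A)
    for _ in range(iters):
        filled = W * A + (1.0 - W) * X                # weighted fill-in
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]      # rank-k truncation
    return X
```

With all weights equal to 1 this reduces to a single truncated SVD, the closed-form unweighted solution; non-uniform weights are what force the iteration.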
Exploring Generalization in Deep Learning
This work considers several recently suggested explanations for what drives generalization in deep networks, including norm-based control, sharpness and robustness, and investigates how these measures explain different observed phenomena. Expand
Communication-Efficient Distributed Optimization using an Approximate Newton-type Method
A novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems, and enjoys a linear convergence rate that provably improves with the data size.
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
A generalization bound for feedforward neural networks is presented in terms of the product of the spectral norms of the layers and the Frobenius norms of the weights, using a PAC-Bayes analysis.
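Schematically, up to constants and logarithmic factors (with B a bound on the input norm, d the depth, h the width, m the sample size, and γ the margin), the bound has the form:

```latex
L_0(f) \;\le\; \hat{L}_\gamma(f)
  \;+\; \tilde{O}\!\left(
    \sqrt{\frac{B^2\, d^2\, h \ln(dh)\;
      \prod_{i=1}^{d}\lVert W_i\rVert_2^2\;
      \sum_{i=1}^{d}\frac{\lVert W_i\rVert_F^2}{\lVert W_i\rVert_2^2}}
    {\gamma^2\, m}}
  \right)
```

The product of squared spectral norms controls the network's Lipschitz scale, while the sum of Frobenius-to-spectral ratios plays the role of an effective number of parameters.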