This work proposes a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features and shows how to optimally adjust any learned predictor so as to remove discrimination according to this definition.Expand

A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines, which is particularly well suited for large text classification problems, and demonstrates an order-of-magnitude speedup over previous SVM learning methods.Expand

A novel approach to collaborative prediction is presented, using low-norm instead of low-rank factorizations, inspired by, and has strong connections to, large-margin linear discrimination.Expand

This work investigates a direct gradient-based optimization method for MMMF and finds that MMMf substantially outperforms all nine methods he tested and demonstrates it on large collaborative prediction problems.Expand

It is observed that the solutions found by adaptive methods generalize worse (often significantly worse) than SGD, even when these solutions have better training performance, suggesting that practitioners should reconsider the use of adaptive methods to train neural networks.Expand

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the… Expand

This work provides a simple and efficient algorithm for solving weighted low-rank approximation problems, which, unlike their unweighted version, do not admit a closed-form solution in general.Expand

This work considers several recently suggested explanations for what drives generalization in deep networks, including norm-based control, sharpness and robustness, and investigates how these measures explain different observed phenomena.Expand

A novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems, and which enjoys a linear rate of convergence which provably improves with the data size.Expand

A generalization bound for feedforward neural networks is presented in terms of the product of the spectral norm of the layers and the Frobeniusnorm of the weights using a PAC-Bayes analysis.Expand