Reexamining Low Rank Matrix Factorization for Trace Norm Regularization

@article{Ciliberto2022ReexaminingLR,
  title={Reexamining Low Rank Matrix Factorization for Trace Norm Regularization},
  author={Carlo Ciliberto and Dimitris Stamos and Massimiliano Pontil},
  journal={ArXiv},
  year={2017},
  volume={abs/1706.08934}
}
Trace norm regularization is a widely used approach for learning low rank matrices. A standard optimization strategy is based on formulating the problem as one of low rank matrix factorization which, however, leads to a non-convex problem. In practice this approach works well, and it is often computationally faster than standard convex solvers such as proximal gradient methods. Nevertheless, it is not guaranteed to converge to a global optimum, and the optimization can be trapped at poor… 
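For context (a standard identity in this literature, stated here as background rather than quoted from the truncated abstract), the factorization strategy rests on the variational characterization of the trace norm:

```latex
% Trace norm regularized problem over W in R^{m x n}:
%   \min_W  L(W) + \lambda \|W\|_*
% Factored (non-convex) reformulation, valid once the inner dimension r
% is at least the rank of an optimal W, via the identity
%   \|W\|_* = \min_{U V^\top = W} \tfrac{1}{2}\left(\|U\|_F^2 + \|V\|_F^2\right):
\min_{U \in \mathbb{R}^{m \times r},\; V \in \mathbb{R}^{n \times r}}
    L(U V^\top) + \frac{\lambda}{2}\left(\|U\|_F^2 + \|V\|_F^2\right)
```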

Learning Fair and Transferable Representations with Theoretical Guarantees

This work argues that the goal of imposing demographic parity can be substantially facilitated within a multi-task learning setting and derives learning bounds establishing that the learned representation transfers well to novel tasks both in terms of prediction performance and fairness metrics.

On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features

Under a simplified unconstrained feature model, this work provides the first global landscape analysis for the vanilla nonconvex MSE loss and shows that the only global minimizers are neural collapse solutions, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions.

Factorisable Multitask Quantile Regression

A multivariate quantile regression model with a factor structure is proposed to study data with multivariate responses and covariates, and a nonasymptotic bound on the Frobenius risk and prediction risk is established.

Scalable Incremental Nonconvex Optimization Approach for Phase Retrieval

Extensive numerical tests show that the proposed approach outperforms other state-of-the-art methods, including convex relaxation by semidefinite programming (SDP), achieving the sharpest phase transition of perfect recovery for the Gaussian model and the best reconstruction quality for other non-Gaussian models, in particular Fourier phase retrieval.

Learning Fair and Transferable Representations

This work argues that the goal of imposing demographic parity can be substantially facilitated within a multitask learning setting and derives learning bounds establishing that the learned representation transfers well to novel tasks both in terms of prediction performance and fairness metrics.

A Geometric Analysis of Neural Collapse with Unconstrained Features

It is shown that the classical cross-entropy loss with weight decay has a benign global landscape, in the sense that the only global minimizers are the Simplex ETFs, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions.

Trace norm regularization and faster inference for embedded speech recognition RNNs

A trace norm regularization technique for training low rank factored versions of matrix multiplications is introduced, and it is shown that this method leads to good accuracy versus number-of-parameters trade-offs and can be used to speed up training of large models.
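As a minimal illustration of the parameter trade-off such factorizations target (a generic truncated-SVD sketch with a hypothetical helper name, not the training-time regularizer the paper introduces):

```python
import numpy as np

def factor_low_rank(W: np.ndarray, r: int):
    """Split W (m x n) into factors U (m x r), V (n x r) with W ~ U @ V.T,
    keeping the top-r singular directions. A hypothetical helper for
    illustrating the parameter trade-off, not the paper's method."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(s[:r])
    return U[:, :r] * sqrt_s, Vt[:r].T * sqrt_s

# A 1024 x 1024 dense layer holds ~1.05M parameters; at rank 64 the
# factored form U @ V.T holds only 2 * 1024 * 64 = 131,072.
W = np.random.randn(1024, 1024)
U, V = factor_low_rank(W, r=64)
print(U.shape, V.shape)  # (1024, 64) (1024, 64)
```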

Low Rank Communication for Federated Learning

Low rank communication (FedLR) is proposed to compress the whole neural network in the client reporting phase, and two measures are introduced to compensate for the accuracy loss caused by truncation: training a low rank parameter matrix and using iterative averaging.

References

Showing 1-10 of 27 references

Global Optimality in Tensor Factorization, Deep Learning, and Beyond

This framework derives sufficient conditions to guarantee that a local minimum of the non-convex optimization problem is a global minimum, and shows that if the size of the factorized variables is large enough, then from any initialization it is possible to find a global minimizer using a purely local descent algorithm.

Nuclear Norm Minimization via Active Subspace Selection

This is the first paper to scale nuclear norm solvers to the Yahoo-Music dataset, and the first time in the literature that the efficiency of nuclear norm solvers can be compared with, and even compete against, non-convex solvers like Alternating Least Squares.

Lifted coordinate descent for learning with trace-norm regularization

This work lifts the non-smooth convex problem into an infinite-dimensional smooth problem and applies coordinate descent to solve it, proving that the approach converges to the optimum and is competitive with or outperforms the state of the art.

Matrix completion and low-rank SVD via fast alternating least squares

This article develops a software package, softImpute in R, for implementing the two approaches for large matrix factorization and completion, and develops a distributed version for very large matrices using the Spark cluster programming environment.
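A minimal sketch of one iteration in the soft-impute family (generic, with assumed names; not the softImpute R package's API): fill missing entries from the current estimate, then soft-threshold the singular values.

```python
import numpy as np

def soft_impute_step(X, mask, Z, lam):
    """One soft-impute-style iteration: impute the unobserved entries of X
    from the current estimate Z, then shrink singular values by lam.
    A generic sketch of the algorithm family, not the package's API."""
    filled = np.where(mask, X, Z)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

# Usage: iterate from Z = 0, with `mask` marking the observed entries.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 8)) @ rng.standard_normal((8, 40))
mask = rng.random(X.shape) < 0.5
Z = np.zeros_like(X)
for _ in range(100):
    Z = soft_impute_step(X, mask, Z, lam=1.0)
```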

Matrix Completion has No Spurious Local Minimum

It is proved that the commonly used non-convex objective function for positive semidefinite matrix completion has no spurious local minima: all local minima must also be global.

Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Lojasiewicz Inequality

A convergent proximal reweighted l1 algorithm for compressive sensing and an application to rank reduction problems are provided; convergence depends on the geometrical properties of the function L around its critical points.

Convex multi-task feature learning

It is proved that the method for learning sparse representations shared across multiple tasks is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution.

Fast maximum margin matrix factorization for collaborative prediction

This work investigates a direct gradient-based optimization method for MMMF, finds that MMMF substantially outperforms all nine methods tested, and demonstrates it on large collaborative prediction problems.

A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization

This work presents a general approach for collaborative filtering using spectral regularization to learn linear operators mapping a set of "users" to a set of possibly desired "objects", and provides novel representer theorems that are used to develop new estimation methods.

Large-scale image classification with trace-norm regularization

This work introduces a new scalable learning algorithm for large-scale multi-class image classification, based on the multinomial logistic loss and the trace-norm regularization penalty, and proposes a simple and provably efficient accelerated coordinate descent algorithm.
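Solvers for trace-norm penalties of this kind are typically built on singular value thresholding, the proximal operator of the trace norm. A minimal sketch of that standard step (generic background, not the accelerated coordinate descent algorithm the cited paper proposes):

```python
import numpy as np

def svt(W, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*,
    the building block of proximal gradient methods for trace-norm
    regularized problems. Generic background, not the cited algorithm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```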