# Reexamining Low Rank Matrix Factorization for Trace Norm Regularization

```bibtex
@article{Ciliberto2022ReexaminingLR,
  title   = {Reexamining Low Rank Matrix Factorization for Trace Norm Regularization},
  author  = {Carlo Ciliberto and Dimitris Stamos and Massimiliano Pontil},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/1706.08934}
}
```

Trace norm regularization is a widely used approach for learning low rank matrices. A standard optimization strategy is based on formulating the problem as one of low rank matrix factorization which, however, leads to a non-convex problem. In practice this approach works well, and it is often computationally faster than standard convex solvers such as proximal gradient methods. Nevertheless, it is not guaranteed to converge to a global optimum, and the optimization can be trapped at poor…

## 13 Citations

### Online Schatten quasi-norm minimization for robust principal component analysis

- Computer Science, Inf. Sci.
- 2019

### Learning Fair and Transferable Representations with Theoretical Guarantees

- Computer Science, 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA)
- 2020

This work argues that the goal of imposing demographic parity can be substantially facilitated within a multi-task learning setting and derives learning bounds establishing that the learned representation transfers well to novel tasks both in terms of prediction performance and fairness metrics.

### On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features

- Computer Science, ICML
- 2022

Under a simplified unconstrained feature model, this work provides the first global landscape analysis for the vanilla nonconvex MSE loss and shows that the (only!) global minimizers are neural collapse solutions, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions.

### FACTORISABLE MULTITASK QUANTILE REGRESSION

- Computer Science, Econometric Theory
- 2020

A multivariate quantile regression model with a factor structure is proposed to study data with multivariate responses and covariates, and a nonasymptotic bound on the Frobenius risk and prediction risk is established.

### Scalable Incremental Nonconvex Optimization Approach for Phase Retrieval

- Computer Science, J. Sci. Comput.
- 2021

Extensive numerical tests show that, compared with the convex relaxation semidefinite programming (SDP) approach and other state-of-the-art methods, the proposed method achieves the sharpest phase transition of perfect recovery for the Gaussian model and the best reconstruction quality for other non-Gaussian models, in particular Fourier phase retrieval.

### Learning Fair and Transferable Representations

- Computer Science, ArXiv
- 2019

This work argues that the goal of imposing demographic parity can be substantially facilitated within a multitask learning setting and derives learning bounds establishing that the learned representation transfers well to novel tasks both in terms of prediction performance and fairness metrics.

### A Geometric Analysis of Neural Collapse with Unconstrained Features

- Computer Science, NeurIPS
- 2021

It is shown that the classical cross-entropy loss with weight decay has a benign global landscape, in the sense that the only global minimizers are the Simplex ETFs, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions.

### Trace norm regularization and faster inference for embedded speech recognition RNNs

- Computer Science, ArXiv
- 2017

A trace norm regularization technique for training low rank factored versions of matrix multiplications is introduced and it is shown that this method leads to good accuracy versus number of parameter trade-offs and can be used to speed up training of large models.

### Low Rank Communication for Federated Learning

- Computer Science, DASFAA
- 2020

Low rank communication (Fedlr) is proposed to compress the whole neural network in the clients' reporting phase, and two measures are introduced to make up for the accuracy loss caused by truncation: training a low rank parameter matrix and using iterative averaging.

## References

Showing 1-10 of 27 references.

### Global Optimality in Tensor Factorization, Deep Learning, and Beyond

- Computer Science, ArXiv
- 2015

This framework derives sufficient conditions to guarantee that a local minimum of the non-convex optimization problem is a global minimum and shows that if the size of the factorized variables is large enough then from any initialization it is possible to find a global minimizer using a purely local descent algorithm.

### Nuclear Norm Minimization via Active Subspace Selection

- Computer Science, ICML
- 2014

This is the first paper to scale nuclear norm solvers to the Yahoo-Music dataset, and the first time in the literature that the efficiency of nuclear norm solvers can be compared with, and even compete with, non-convex solvers like Alternating Least Squares.

### Lifted coordinate descent for learning with trace-norm regularization

- Computer Science, AISTATS
- 2012

This work lifts the non-smooth convex problem into an infinite-dimensional smooth problem and applies coordinate descent to solve it, proving that the approach converges to the optimum and is competitive with or outperforms the state of the art.

### Matrix completion and low-rank SVD via fast alternating least squares

- Computer Science, J. Mach. Learn. Res.
- 2015

This article develops a software package, softImpute in R, implementing the two approaches for large matrix factorization and completion, and develops a distributed version for very large matrices using the Spark cluster programming environment.

### Matrix Completion has No Spurious Local Minimum

- Computer Science, NIPS
- 2016

It is proved that the commonly used non-convex objective function for positive semidefinite matrix completion has no spurious local minima: all local minima must also be global.

### Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Lojasiewicz Inequality

- Mathematics, Math. Oper. Res.
- 2010

A convergent proximal reweighted l1 algorithm for compressive sensing and an application to rank reduction problems are provided; convergence depends on the geometrical properties of the function L around its critical points.

### Convex multi-task feature learning

- Computer Science, Machine Learning
- 2007

It is proved that the method for learning sparse representations shared across multiple tasks is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution.

### Fast maximum margin matrix factorization for collaborative prediction

- Computer Science, ICML
- 2005

This work investigates a direct gradient-based optimization method for MMMF, demonstrates it on large collaborative prediction problems, and finds that MMMF substantially outperforms all nine methods tested.

### A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization

- Computer Science, J. Mach. Learn. Res.
- 2009

This work presents a general approach for collaborative filtering using spectral regularization to learn linear operators mapping a set of "users" to a set of possibly desired "objects", and provides novel representer theorems that are used to develop new estimation methods.

### Large-scale image classification with trace-norm regularization

- Computer Science, 2012 IEEE Conference on Computer Vision and Pattern Recognition
- 2012

This work introduces a new scalable learning algorithm for large-scale multi-class image classification, based on the multinomial logistic loss and the trace-norm regularization penalty, and proposes a simple and provably efficient accelerated coordinate descent algorithm.