Sparsity and smoothness via the fused lasso

Robert Tibshirani, Michael A. Saunders, Saharon Rosset, Ji Zhu and Keith Knight.
Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Summary. The lasso penalizes a least squares regression by the sum of the absolute values (the L1-norm) of the coefficients. This form of penalty encourages sparse solutions, with many coefficients equal to 0. We propose the ‘fused lasso’, a generalization designed for problems with features that can be ordered in some meaningful way. The fused lasso penalizes the L1-norm of both the coefficients and their successive differences. Thus it encourages sparsity of the coefficients and also sparsity of their successive differences.
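The penalty described above can be sketched as a direct objective evaluation (a hedged illustration: the function name, the 1/2 factor on the squared error, and the NumPy formulation are choices of this sketch, not taken from the paper):

```python
import numpy as np

def fused_lasso_objective(beta, X, y, lam1, lam2):
    """Fused lasso objective in the spirit of Tibshirani et al. (2005):
    0.5 * ||y - X beta||^2 + lam1 * sum_j |beta_j|
                           + lam2 * sum_j |beta_{j+1} - beta_j|.
    The first penalty encourages sparse coefficients; the second acts on
    successive differences, encouraging locally constant coefficient
    profiles over the feature ordering."""
    resid = y - X @ beta
    sparsity = lam1 * np.abs(beta).sum()
    fusion = lam2 * np.abs(np.diff(beta)).sum()
    return 0.5 * resid @ resid + sparsity + fusion
```

For example, with an identity design and a perfectly fit beta = (1, 1, 0), the residual term vanishes and the value reduces to the two penalties: 2 (sparsity) + 1 (one jump in the profile) = 3.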
The Smooth-Lasso and other l1 + l2-penalized methods
We consider a linear regression problem in a high-dimensional setting where the number of covariates p can be much larger than the sample size n. In such a situation, one often assumes sparsity of the regression coefficients.
Localized Lasso for High-Dimensional Regression
The localized Lasso is introduced, which is suited to learning models that are both interpretable and highly predictive in problems with high dimensionality and small sample size, and a simple yet efficient iterative least-squares based optimization procedure is proposed.
Fused Lasso Screening Rules via the Monotonicity of Subdifferentials
Novel screening rules are proposed that quickly identify the adjacent features sharing the same coefficient in the fused Lasso, so that the problem size can be significantly reduced, leading to substantial savings in computational cost and memory usage.
A framework to efficiently smooth L1 penalties for linear regression
A unified framework is presented for computing closed-form smooth surrogates of a whole class of L1-penalized regression problems using Nesterov smoothing. It is proved that the resulting estimates can be made arbitrarily close to those of the original (unsmoothed) objective functions, and explicitly computable a priori error bounds on the accuracy of the estimates are provided.
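The kind of surrogate such a framework produces can be illustrated with the standard Nesterov smoothing of the absolute value, which yields a Huber-type function (a generic sketch; the paper's exact surrogate and notation may differ):

```python
import numpy as np

def smoothed_abs(x, mu):
    """Nesterov-smoothed |x|: the maximizer form
    max_{|u|<=1} (u*x - mu*u**2/2) evaluates to
    x**2/(2*mu) when |x| <= mu and |x| - mu/2 otherwise.
    It is differentiable everywhere and satisfies
    0 <= |x| - smoothed_abs(x, mu) <= mu/2, so the surrogate can be
    driven arbitrarily close to |x| by shrinking mu."""
    ax = np.abs(x)
    return np.where(ax <= mu, x**2 / (2.0 * mu), ax - mu / 2.0)
```

The uniform gap of at most mu/2 is what makes "arbitrarily close" estimates with explicit a priori error bounds plausible: the smoothing error is controlled by a single tunable parameter.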
Regularization with the Smooth-Lasso procedure
We consider the linear regression problem. We propose the S-Lasso procedure to estimate the unknown regression parameters. This estimator enjoys sparsity of the representation while taking into account correlations between successive covariates.
A Path Algorithm for the Fused Lasso Signal Approximator
The Lasso is a very well-known penalized regression model, which adds an L1 penalty with parameter λ1 on the coefficients to the squared error loss function. The Fused Lasso extends this model by additionally penalizing the L1 norm of successive coefficient differences.
Bayesian generalized fused lasso modeling via NEG distribution
The fused lasso penalizes a loss function by the L1 norm of both the regression coefficients and their successive differences, to encourage sparsity of both. In this paper, we propose a Bayesian generalized fused lasso model based on the normal-exponential-gamma (NEG) prior distribution.
Split Bregman method for large scale fused Lasso
This paper proposes an iterative algorithm based on the split Bregman method to solve a class of large-scale fused Lasso problems, including a generalized fused Lasso and a fused Lasso support vector classifier. The algorithm is derived using an augmented Lagrangian method and its convergence properties are proved.
An efficient algorithm for a class of fused lasso problems
This paper proposes an Efficient Fused Lasso Algorithm (EFLA) and designs a restart technique to accelerate the convergence of SFA, exploiting the special "structures" of both the original and the reformulated FLSA problems.
Efficient Sparse Semismooth Newton Methods for the Clustered Lasso Problem
An efficient procedure for computing the clustered lasso estimator is derived, and experiments show that the Ssnal algorithm substantially outperforms the best alternative algorithm for the clustered lasso problem.


Regression Shrinkage and Selection via the Lasso
SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant.
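In the equivalent penalized form 0.5*||y - X*beta||^2 + lam*||beta||_1, the lasso can be sketched with coordinate descent built on soft-thresholding (an illustrative solver, not the algorithm of the paper, which states the constrained form; the two formulations coincide for matched tuning parameters):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the basic building block of
    coordinate-wise lasso solvers."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for
    0.5*||y - X beta||^2 + lam*||beta||_1.
    Each coordinate update is an exact univariate lasso solve
    against the partial residual."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # per-column squared norms
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: remove the fit of all coordinates but j
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
    return beta
```

With an orthonormal design the solver reduces to one pass of soft-thresholding: for X = I and y = (3, 0.5, -2) with lam = 1 it returns (2, 0, -1).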
Boosting as a Regularized Path to a Maximum Margin Classifier
Building on recent work by Efron et al., it is shown that boosting approximately (and in some cases exactly) minimizes its loss criterion with an l1 constraint on the coefficient vector, and that as the constraint is relaxed the solution converges (in the separable case) to an "l1-optimal" separating hyperplane.
Asymptotics for lasso-type estimators
We consider the asymptotic behavior of regression estimators that minimize the residual sum of squares plus a penalty proportional to Σj |βj|^γ for some γ > 0. These estimators include the Lasso as a special case.
Atomic Decomposition by Basis Pursuit
Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions.
Ridge Regression: Biased Estimation for Nonorthogonal Problems
The ridge trace is introduced, a method for showing in two dimensions the effects of nonorthogonality, and it is shown how to augment X′X to obtain biased estimates with smaller mean squared error.
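The augmentation of X′X described above has a closed form (a minimal sketch of the classical ridge estimator; the function name is illustrative):

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimate beta = (X'X + k*I)^{-1} X'y (Hoerl-Kennard form).
    Adding k > 0 to the diagonal of X'X trades a little bias for a
    large reduction in variance when the predictors are nonorthogonal
    and X'X is nearly singular."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```

With an orthonormal design the estimator simply shrinks least squares by the factor 1/(1 + k): for X = I, y = (2, 4) and k = 1 it returns (1, 2).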
Ideal spatial adaptation by wavelet shrinkage
SUMMARY With ideal spatial adaptation, an oracle furnishes information about how best to adapt a spatially variable estimator, whether piecewise constant, piecewise polynomial, variable knot spline or variable bandwidth kernel.
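The shrinkage rule this line of work is known for can be sketched as soft-thresholding at Donoho and Johnstone's universal level sigma*sqrt(2*log n) (a simplified illustration applied directly to a coefficient vector, omitting the wavelet transform itself):

```python
import numpy as np

def visushrink(coeffs, sigma):
    """Soft-threshold a vector of (wavelet) coefficients at the
    universal threshold sigma * sqrt(2 * log n). Coefficients below
    the noise level are set exactly to zero; the rest are shrunk
    toward zero by the threshold amount."""
    n = len(coeffs)
    t = sigma * np.sqrt(2.0 * np.log(n))
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)
```

For n = 2 and sigma = 1 the threshold is sqrt(2 ln 2), roughly 1.18, so a coefficient of 0.5 is zeroed while a coefficient of 3 is shrunk by that amount.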
A training algorithm for optimal margin classifiers
A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classification functions, including perceptrons, polynomials and radial basis functions.
Soft Modelling by Latent Variables: The Non-Linear Iterative Partial Least Squares (NIPALS) Approach
The NIPALS approach is applied to the ‘soft’ type of model that has come to the fore in sociology and other social sciences in the last five or ten years, namely path models that involve latent variables.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
In the words of the authors, the goal of this book was to “bring together many of the important new ideas in learning, and explain them in a statistical framework.” The authors have been quite successful in achieving this goal.
1-norm Support Vector Machines
The standard 2-norm SVM is known for its good performance in two-class classification. In this paper, we consider the 1-norm SVM. We argue that the 1-norm SVM may have some advantage over the standard 2-norm SVM.