The Benefit of Group Sparsity

@article{Huang2009TheBO,
  title={The Benefit of Group Sparsity},
  author={Junzhou Huang and Tong Zhang},
  journal={Annals of Statistics},
  year={2010},
  volume={38},
  pages={1978-2004}
}
This paper develops a theory for group Lasso using a concept called strong group sparsity. Our result shows that group Lasso is superior to standard Lasso for strongly group-sparse signals. This provides a convincing theoretical justification for using group sparse regularization when the underlying group structure is consistent with the data. Moreover, the theory predicts some limitations of the group Lasso formulation that are confirmed by simulation studies. 
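As a concrete illustration of the estimator the paper studies, below is a minimal proximal-gradient (ISTA) sketch of the group Lasso on a strongly group-sparse signal, i.e., one where a few groups are fully active and the rest are zero. The group structure, regularization level, and data-generating choices are illustrative assumptions, not the paper's simulation setup.

```python
import numpy as np

def group_prox(beta, groups, thresh):
    # Block soft-thresholding: shrink each group's l2 norm by `thresh`,
    # zeroing the whole group when its norm falls below the threshold.
    out = beta.copy()
    for g in groups:
        norm = np.linalg.norm(beta[g])
        out[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * beta[g]
    return out

def group_lasso_ista(X, y, groups, lam, n_iter=1000):
    # Proximal gradient on (1/2n)||y - X beta||^2 + lam * sum_g ||beta_g||_2.
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = group_prox(beta - step * grad, groups, step * lam)
    return beta

# Strongly group-sparse ground truth: 2 of 10 groups active, each fully nonzero.
rng = np.random.default_rng(0)
groups = [np.arange(i, i + 4) for i in range(0, 40, 4)]
beta_true = np.zeros(40)
beta_true[groups[0]] = rng.normal(size=4)
beta_true[groups[3]] = rng.normal(size=4)
X = rng.normal(size=(100, 40))
y = X @ beta_true + 0.1 * rng.normal(size=100)
print(group_lasso_ista(X, y, groups, lam=0.05).round(2))
```

Swapping `group_prox` for coordinate-wise soft-thresholding recovers the standard Lasso, which is exactly the comparison the paper's theory formalizes.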


Group Lasso with Overlaps: the Latent Group Lasso approach
We study a norm for structured sparsity which leads to sparse linear predictors whose supports are unions of predefined overlapping groups of variables. We call the obtained formulation the latent group Lasso.
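The overlapping-groups norm can be stated in its standard latent form; the following is a reconstruction in common notation (the group weights $d_g$ are an assumption), not a quote from the paper.

```latex
% Latent group lasso norm for a family G of possibly overlapping groups:
% w is split into latent vectors v^g, each supported on its group g, and
% the decomposition minimizing the weighted sum of group norms is chosen.
\Omega(w) \;=\; \min_{\substack{(v^g)_{g\in\mathcal{G}}:\ \operatorname{supp}(v^g)\subseteq g \\ \sum_{g\in\mathcal{G}} v^g = w}}
\;\sum_{g\in\mathcal{G}} d_g\,\|v^g\|_2
```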
The Benefit of Group Sparsity in Group Inference with De-biased Scaled Group Lasso
We study confidence regions and approximate chi-squared tests for variable groups in high-dimensional linear regression. When the size of the group is small, low-dimensional projection estimators for …
Error Bounds for Generalized Group Sparsity
TLDR
This work considers a generalized version of the Sparse-Group Lasso which captures both element-wise and group-wise sparsity simultaneously, and identifies a generalized $\epsilon$-norm that provides a dual formulation for the double sparsity regularization.
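The Sparse-Group Lasso penalty that this generalizes combines an element-wise and a group-wise term, and (for non-overlapping groups) its proximal operator factors into two well-known steps. A minimal sketch, with illustrative names and tuning parameters:

```python
import numpy as np

def sparse_group_penalty(beta, groups, lam1, lam2):
    # lam1 * ||beta||_1 + lam2 * sum_g ||beta_g||_2 (non-overlapping groups).
    return lam1 * np.abs(beta).sum() + lam2 * sum(
        np.linalg.norm(beta[g]) for g in groups)

def sparse_group_prox(beta, groups, lam1, lam2):
    # Known composition: the prox of the combined penalty is group-wise
    # soft-thresholding applied after coordinate-wise soft-thresholding.
    z = np.sign(beta) * np.maximum(np.abs(beta) - lam1, 0.0)
    out = z.copy()
    for g in groups:
        norm = np.linalg.norm(z[g])
        out[g] = 0.0 if norm <= lam2 else (1.0 - lam2 / norm) * z[g]
    return out
```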
Group Sparse Additive Models
TLDR
A new method called group sparse additive models (GroupSpAM) is proposed to handle group sparsity in additive models; a novel thresholding condition for identifying functional sparsity at the group level is derived, and an efficient block coordinate descent algorithm for constructing the estimate is proposed.
A doubly sparse approach for group variable selection
We propose a new penalty called the doubly sparse (DS) penalty for variable selection in high-dimensional linear regression models when the covariates are naturally grouped. An advantage of the DS …
Characteristics of group LASSO in handling high correlated data
Highly correlated data in a linear regression cannot be handled directly by standard parameter estimation methods such as least squares (LS). The Lasso technique is a suitable method to …
Group sparse RLS algorithms
Group sparsity is one of the important signal priors for regularization of inverse problems. Sparsity with group structure is encountered in numerous applications. However, despite the …
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
We consider the problems of estimation and selection of parameters endowed with a known group structure, when the groups are assumed to be sign-coherent, that is, gathering either nonnegative, nonpositive, or null parameters.
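A minimal sketch of the sign-coherent idea, assuming the standard cooperative-Lasso form in which the positive and negative parts of each group are penalized by separate group norms (names here are illustrative):

```python
import numpy as np

def coop_lasso_penalty(beta, groups, lam):
    # Penalizing ||(beta_g)_+||_2 and ||(beta_g)_-||_2 separately favors
    # groups whose nonzero coefficients share a common sign.
    pos, neg = np.maximum(beta, 0.0), np.maximum(-beta, 0.0)
    return lam * sum(np.linalg.norm(pos[g]) + np.linalg.norm(neg[g])
                     for g in groups)
```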
Sparse Group Selection Through Co-Adaptive Penalties
Recent work has focused on the problem of conducting linear regression when the number of covariates is very large, potentially greater than the sample size. To facilitate this, one useful tool is to …
Trace regression model with simultaneously low rank and row(column) sparse parameter
TLDR
To estimate the parameter of the trace regression model with matrix covariates, a convex optimization problem with the nuclear norm and group Lasso penalties is formulated, and an alternating direction method of multipliers (ADMM) algorithm is proposed.
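An ADMM splitting of this kind typically alternates between the two proximal maps below: singular-value thresholding for the nuclear norm and row-wise group soft-thresholding for the row-sparsity penalty. This is a generic sketch of those two building blocks, not the paper's specific updates:

```python
import numpy as np

def prox_nuclear(B, tau):
    # Singular-value thresholding: prox of tau * ||B||_* .
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def prox_row_group(B, tau):
    # Row-wise group soft-thresholding: prox of tau * sum_i ||B_{i,:}||_2 .
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * B
```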

References

SHOWING 1-10 OF 20 REFERENCES
On the asymptotic properties of the group lasso estimator for linear models
We establish estimation and model selection consistency, prediction and estimation bounds, and persistence for the group-lasso estimator and model selector proposed by Yuan and Lin (2006) for least squares problems …
SIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR
We show that, under a sparsity scenario, the Lasso estimator and the Dantzig selector exhibit similar behavior. For both methods, we derive, in parallel, oracle inequalities for the prediction risk
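For orientation, the two estimators compared in this reference can be written in standard notation (the scaling of $\lambda$ varies across papers):

```latex
% Lasso: penalized least squares.
\hat{\beta}^{\mathrm{Lasso}} \in \arg\min_{\beta}\; \tfrac{1}{n}\,\|y - X\beta\|_2^2 + \lambda\,\|\beta\|_1
% Dantzig selector: minimal l1 norm under a correlation constraint.
\hat{\beta}^{\mathrm{Dantzig}} \in \arg\min_{\beta}\; \|\beta\|_1
\quad \text{s.t.} \quad \tfrac{1}{n}\,\|X^{\top}(y - X\beta)\|_{\infty} \le \lambda
```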
Consistency of the group Lasso and multiple kernel learning
  • F. Bach
  • Computer Science, Mathematics
    J. Mach. Learn. Res.
  • 2008
TLDR
This paper derives necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, and proposes an adaptive scheme to obtain a consistent model estimate even when the necessary condition required for the non-adaptive scheme is not satisfied.
Union support recovery in high-dimensional multivariate regression
TLDR
This sparsity-overlap function reveals that, if the design is uncorrelated on the active rows, block $\ell_1/\ell_2$ regularization for multivariate regression never harms performance relative to an ordinary Lasso approach, and can yield substantial improvements in sample complexity when the regression vectors are suitably orthogonal.
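In standard form, the block $\ell_1/\ell_2$ estimator referred to here penalizes each row of the coefficient matrix jointly across the K regression tasks (the notation below is the common convention, not quoted from the paper):

```latex
% Multivariate group lasso: row B_{j.} collects covariate j's effects
% across all K tasks, so covariates are selected jointly.
\hat{B} \in \arg\min_{B \in \mathbb{R}^{p \times K}}\;
\tfrac{1}{2n}\,\|Y - XB\|_F^2 \;+\; \lambda \sum_{j=1}^{p} \|B_{j\cdot}\|_2
```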
Taking Advantage of Sparsity in Multi-Task Learning
TLDR
The Group Lasso is considered as a candidate estimation method, and it is shown that this estimator enjoys nice sparsity oracle inequalities and variable selection properties; the results extend to more general noise distributions, of which only a finite variance is required.
The sparsity and bias of the Lasso selection in high-dimensional linear regression
Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436-1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent …
Sparse Recovery in Large Ensembles of Kernel Machines
TLDR
Oracle inequalities on excess risk of such estimators are proved showing that the method is adaptive to unknown degree of “sparsity” of the target function.
Some sharp performance bounds for least squares regression with L1 regularization
We derive sharp performance bounds for least squares regression with L1 regularization from parameter estimation accuracy and feature selection quality perspectives. The main result proved for L1 …
An Empirical Bayesian Strategy for Solving the Simultaneous Sparse Approximation Problem
  • D. Wipf, B. Rao
  • Computer Science, Mathematics
    IEEE Transactions on Signal Processing
  • 2007
TLDR
Based on the concept of automatic relevance determination, this paper uses an empirical Bayesian prior to estimate a convenient posterior distribution over candidate basis vectors and consistently places its prominent posterior mass on the appropriate region of weight-space necessary for simultaneous sparse recovery.
Model selection and estimation in regression with grouped variables
We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations, with the multifactor analysis-of-variance problem as the most important and well-known example.