Hierarchical Sparse Modeling: A Choice of Two Group Lasso Formulations

  title={Hierarchical Sparse Modeling: A Choice of Two Group Lasso Formulations},
  author={Xiaohan Yan and Jacob Bien},
  journal={Statistical Science},
Demanding sparsity in estimated models has become a routine practice in statistics. In many situations, we wish to require that the sparsity patterns attained honor certain problem-specific constraints. Hierarchical sparse modeling (HSM) refers to situations in which these constraints specify that one set of parameters be set to zero whenever another is set to zero. In recent years, numerous papers have developed convex regularizers for this form of sparsity structure, which arises in… 

Learning Local Dependence In Ordered Data

This work proposes a framework for learning local dependence based on estimating the inverse of the Cholesky factor of the covariance matrix, which yields a simple regression interpretation for local dependence in which variables are predicted by their neighbors.

A First-Order Optimization Algorithm for Statistical Learning with Hierarchical Sparsity Structure

A computationally efficient optimization algorithm to evaluate the proximal operator of a nonsmooth hierarchical sparsity-inducing regularizer and establishes its convergence properties is proposed and performance is demonstrated on two statistical learning applications related to topic modeling and breast cancer classification.

Graph-Guided Banding of the Covariance Matrix

  • J. Bien
  • Mathematics
    Journal of the American Statistical Association
  • 2018
This work proposes a generalization of the notion of bandedness that greatly expands the range of problems in which banded estimators apply, and develops convex regularizers occupying the broad middle ground between the former approach of “patternless sparsity” and the latter reliance on having a known ordering.

A Pliable Lasso

A generalization of the lasso that allows the model coefficients to vary as a function of a general set of some prespecified modifying variables, which might be variables such as gender, age, or time is proposed.

LassoNet: Neural Networks with Feature Sparsity

This work introduces LassoNet, a neural network framework with global feature selection that uses a modified objective function with constraints, and so integrates feature selection with the parameter learning directly, and delivers an entire regularization path of solutions with a range of feature sparsity.

Learning With Subquadratic Regularization : A Primal-Dual Approach

This paper describes the efficiency of the algorithms developed in the context of tree-structured sparsity, where they comprehensively outperform relevant baselines and achieve a proven rate of convergence of O(1/T ) after T iterations.

Learning Hierarchical Interactions at Scale: A Convex Optimization Approach

A convex relaxation which enforces strong hierarchy is studied and a highly scalable algorithm based on proximal gradient descent is developed and a specialized active-set strategy with gradient screening for avoiding costly gradient computations is introduced.

Sparse Identification and Estimation of Large-Scale Vector AutoRegressive Moving Averages

Novel theoretical machinery includes non-asymptotic analysis of infinite-order VAR, elastic net estimation under a singular covariance structure of regressors, and new concentration inequalities for quadratic forms of random variables from Gaussian time series.

Learning with Structured Sparsity: From Discrete to Convex and Back

A new class of structured sparsity models, able to capture a large range of structures, which admit tight convex relaxations amenable to efficient optimization are introduced, and an in-depth study of the geometric and statistical properties of convex Relaxations of general combinatorial structures is presented.




A precise characterization of the effect of this hierarchy constraint is given, a bound on this estimate reveals the amount of fitting "saved" by the hierarchy constraint, and it is proved that hierarchy holds with probability one.

A Sparse-Group Lasso

A regularized model for linear regression with ℓ1 andℓ2 penalties is introduced and it is shown that it has the desired effect of group-wise and within group sparsity.

Proximal methods for the latent group lasso penalty

This paper considers a regularized least squares problem, with regularization by structured sparsity-inducing norms, which extend the usual ℓ1 and the group lasso penalty, by allowing the subsets to overlap, and devise a new active set strategy that allows to deal with high dimensional problems without pre-processing for dimensionality reduction.

Group Lasso with Overlaps: the Latent Group Lasso approach

We study a norm for structured sparsity which leads to sparse linear predictors whose supports are unions of prede ned overlapping groups of variables. We call the obtained formulation latent group

Structured sparsity through convex optimization

It is shown that the $\ell_1$-norm can be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures.

Group Regularized Estimation Under Structural Hierarchy

This work investigates a new class of estimators that make use of multiple group penalties to capture structural parsimony and shows that the proposed estimators enjoy sharp rate oracle inequalities, and give the minimax lower bounds in strong and weak hierarchical variable selection.

Sparse estimation of large covariance matrices via a nested Lasso penalty

The paper proposes a new covariance estimator for large covariance matrices when the variables have a natural ordering, and imposes a banded structure on the Cholesky factor, using a novel penalty called nested Lasso, which results in a sparse estimators for the inverse of the covariance matrix.

The composite absolute penalties family for grouped and hierarchical variable selection

CAP is shown to improve on the predictive performance of the LASSO in a series of simulated experiments, including cases with $p\gg n$ and possibly mis-specified groupings, and iCAP is seen to be parsimonious in the experiments.

Variable Selection Using Adaptive Nonlinear Interaction Structures in High Dimensions

This work introduces a new approach, “Variable selection using Adaptive Nonlinear Interaction Structures in High dimensions” (VANISH), that is based on a penalized least squares criterion and is designed for high dimensional nonlinear problems and suggests that VANISH should outperform certain natural competitors when the true interaction structure is sufficiently sparse.

Convex Structure Learning in Log-Linear Models: Beyond Pairwise Potentials

This work uses a spectral projected gradient method as a subroutine for solving the overlapping group ‘1regularization problem, and makes use of a sparse version of Dykstra’s algorithm to compute the projection.