# c-lasso - a Python package for constrained sparse and robust regression and classification

```bibtex
@article{Simpson2021classoA,
  title   = {c-lasso - a Python package for constrained sparse and robust regression and classification},
  author  = {L{\'e}o Simpson and Patrick L. Combettes and Christian L. M{\"u}ller},
  journal = {J. Open Source Softw.},
  year    = {2021},
  volume  = {6},
  pages   = {2844}
}
```

We introduce c-lasso, a Python package that enables sparse and robust linear regression and classification with linear equality constraints. The underlying statistical forward model is assumed to be of the following form:

\[ y = X \beta + \sigma \epsilon \qquad \textrm{subject to} \qquad C\beta = 0 \]

Here, $X \in \mathbb{R}^{n\times d}$ is a given design matrix and the vector $y \in \mathbb{R}^{n}$ is a continuous or binary response vector. The matrix $C$ is a general constraint matrix. The…
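To make the forward model concrete, the sketch below solves the constrained lasso instance of it, $\min_\beta \tfrac12\|y - X\beta\|^2 + \lambda\|\beta\|_1$ subject to $C\beta = 0$, with a generic ADMM splitting in plain numpy. This is an illustrative solver, not the algorithm c-lasso itself uses; the function name, penalty level, and the zero-sum constraint $C = \mathbf{1}^\top$ (typical for compositional data) are all choices made here for the example.

```python
import numpy as np

def constrained_lasso_admm(X, y, C, lam=0.5, rho=10.0, n_iter=500):
    """Illustrative ADMM sketch for
    min 0.5*||y - X b||^2 + lam*||b||_1  subject to  C b = 0.
    Splitting: b carries the data fit and the equality constraint,
    z carries the l1 penalty, u is the scaled dual variable."""
    n, d = X.shape
    k = C.shape[0]
    z = np.zeros(d)
    u = np.zeros(d)
    # KKT matrix for the equality-constrained least-squares b-update
    K = np.block([[X.T @ X + rho * np.eye(d), C.T],
                  [C, np.zeros((k, k))]])
    Xty = X.T @ y
    for _ in range(n_iter):
        rhs = np.concatenate([Xty + rho * (z - u), np.zeros(k)])
        beta = np.linalg.solve(K, rhs)[:d]      # satisfies C beta = 0 exactly
        z = np.sign(beta + u) * np.maximum(np.abs(beta + u) - lam / rho, 0.0)
        u = u + beta - z                        # dual ascent step
    return z  # sparse iterate; C z ~= 0 once beta and z have converged

# Tiny example with a zero-sum constraint C = [1 1 1 1 1]
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
beta_true = np.array([1.0, -1.0, 0.0, 0.0, 0.0])  # sums to zero
y = X @ beta_true + 0.01 * rng.standard_normal(50)
C = np.ones((1, 5))
beta_hat = constrained_lasso_admm(X, y, C)
print(np.round(beta_hat, 2))
```

The b-update is an equality-constrained least-squares problem solved via its KKT system, while the z-update is coordinate-wise soft-thresholding; alternating the two handles the constraint and the sparsity penalty separately.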

## 5 Citations

Supervised Learning and Model Analysis with Compositional Data

- Computer Science
- 2022

KernelBiome is a kernel-based nonparametric regression and classification framework for compositional data that captures complex signals, including in the zero-structure, while automatically adapting model complexity, and is able to incorporate prior knowledge such as phylogenetic structure.

Bayesian Knockoff Generators for Robust Inference Under Complex Data Structure

- Computer Science
- 2021

This work proposes Bayesian models for generating high quality knockoff copies that utilize available knowledge about the data structure, thus improving the resolution of prognostic features.

CR-Sparse: Hardware accelerated functional algorithms for sparse signal processing in Python using JAX

- Computer Science, J. Open Source Softw.
- 2021

We introduce CR-Sparse, a Python library that enables efficient solution of a wide variety of sparse-representation-based signal processing problems. It is a cohesive collection of sublibraries…

A causal view on compositional data

- Computer Science, ArXiv
- 2021

This work provides a causal view on compositional data in an instrumental variable setting where the composition acts as the cause and advocates for multivariate alternatives using statistical data transformations and regression techniques that take the special structure of the compositional sample space into account.

Tree-aggregated predictive modeling of microbiome data

- Biology, Computer Science, bioRxiv
- 2020

A data-driven, parameter-free, and scalable tree-guided aggregation framework to associate microbial subcompositions with response variables of interest and posit that the inferred aggregation levels provide highly interpretable taxon groupings that can help microbial ecologists gain insights into the structure and functioning of the underlying ecosystem of interest.

## References

Showing 1-10 of 22 references

Algorithms for Fitting the Constrained Lasso

- Computer Science, Journal of Computational and Graphical Statistics
- 2018

This work employs the alternating direction method of multipliers (ADMM) and also derives an efficient solution-path algorithm for the constrained lasso problem, and shows that, for an arbitrary penalty matrix, the generalized lasso can be transformed into a constrained lasso, while the converse is not true.

Piecewise linear regularized solution paths

- Mathematics
- 2007

We consider the generic regularized optimization problem $\hat{\beta}(\lambda) = \operatorname{argmin}_{\beta} \, L(y, X\beta) + \lambda J(\beta)$. Efron, Hastie, Johnstone and Tibshirani [Ann. Statist. 32 (2004) 407-499] have shown that for the LASSO-that…
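The piecewise-linear structure of the lasso path is easiest to see in the orthonormal-design special case, where the solution is an explicit soft-thresholding of the least-squares estimate. The snippet below is a minimal sketch of that special case (the vector `b_ols` and the grid of penalties are invented for illustration):

```python
import numpy as np

def soft_threshold(b, lam):
    """Closed-form lasso solution when X^T X = I: shrink the
    least-squares coefficients toward zero by lam, clipping at zero."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

b_ols = np.array([3.0, -1.5, 0.5])        # hypothetical OLS estimates
lams = [0.0, 0.5, 1.0, 2.0, 4.0]
path = np.array([soft_threshold(b_ols, lam) for lam in lams])
print(path)
# Each coefficient moves linearly toward zero as lam grows,
# then stays exactly at zero: a piecewise-linear path in lam.
```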

Regression Models for Compositional Data: General Log-Contrast Formulations, Proximal Optimization, and Microbiome Data Applications

- Computer Science
- 2019

A general convex optimization model for linear log-contrast regression which includes many previous proposals as special cases is proposed and a proximal algorithm is introduced that solves the resulting constrained optimization problem exactly with rigorous convergence guarantees.
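A basic building block of such proximal algorithms is the proximity operator of the constraint set $\{b : Cb = 0\}$, which is the Euclidean projection onto that subspace. A minimal numpy sketch (assuming $C$ has full row rank; the zero-sum constraint shown is the usual log-contrast choice, picked here for illustration):

```python
import numpy as np

def project_nullspace(beta, C):
    """Euclidean projection onto {b : C b = 0}, i.e. the prox of the
    indicator function of the constraint set (C has full row rank)."""
    correction = C.T @ np.linalg.solve(C @ C.T, C @ beta)
    return beta - correction

C = np.ones((1, 4))                     # zero-sum (log-contrast) constraint
beta = np.array([2.0, 1.0, -1.0, 0.0])
p = project_nullspace(beta, C)
print(p)       # each component shifted by the mean 0.5
print(C @ p)   # constraint now satisfied
```

For the zero-sum constraint the projection reduces to centering the coefficient vector, which is why it is cheap to embed inside an iterative proximal scheme.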

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition

- Computer Science, Springer Series in Statistics
- 2001

This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering.

Perspective maximum likelihood-type estimation via proximal decomposition

- Computer Science, Mathematics
- 2018

We introduce an optimization model for maximum likelihood-type estimation (M-estimation) that generalizes a large class of existing statistical models, including Huber's concomitant M-estimator,…

Penalized and Constrained Optimization: An Application to High-Dimensional Website Advertising

- Computer Science, Journal of the American Statistical Association
- 2019

The Penalized and Constrained optimization method (PaC) is developed to compute the solution path for high-dimensional, linearly constrained criteria and is applied to a proprietary dataset in an exemplar Internet advertising case study and demonstrates its superiority over existing methods in this practical setting.

Variable selection in regression with compositional covariates

- Computer Science
- 2014

An $\ell_1$-regularization method for the linear log-contrast model that respects the unique features of compositional data is proposed, and its usefulness is illustrated by an application to a microbiome study relating human body mass index to gut microbiome composition.
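In the (unpenalized) log-contrast model underlying this line of work, compositional covariates are log-transformed and the coefficients are constrained to sum to zero. The sketch below sets up simulated simplex data and solves the resulting equality-constrained least-squares problem through its KKT system; the Dirichlet data, noise level, and coefficient vector are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
# Compositional covariates: each row lies on the probability simplex
W = rng.dirichlet(np.ones(4), size=60)
Z = np.log(W)                                 # log-contrast features
b_true = np.array([1.0, -0.5, -0.5, 0.0])     # zero-sum coefficients
y = Z @ b_true + 0.01 * rng.standard_normal(60)

# Equality-constrained least squares, min ||y - Z b||^2 s.t. 1^T b = 0,
# solved via its KKT system:
#   [Z^T Z  1] [b ]   [Z^T y]
#   [1^T    0] [nu] = [  0  ]
d = Z.shape[1]
K = np.block([[Z.T @ Z, np.ones((d, 1))],
              [np.ones((1, d)), np.zeros((1, 1))]])
rhs = np.concatenate([Z.T @ y, [0.0]])
b_hat = np.linalg.solve(K, rhs)[:d]
print(np.round(b_hat, 3))   # recovers b_true; coefficients sum to zero
```

The $\ell_1$-penalized variant discussed in the reference adds a sparsity term on top of this constrained least-squares core.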

Regression Analysis for Microbiome Compositional Data

- Mathematics
- 2016

One important problem in microbiome analysis is to identify the bacterial taxa that are associated with a response, where the microbiome data are summarized as the composition of the bacterial taxa…

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

- Computer Science
- 2004

This book is a valuable resource, both for the statistician needing an introduction to machine learning and related fields and for the computer scientist wishing to learn more about statistics; statisticians will especially appreciate that it is written in their own language.