Stability selection was recently introduced by Meinshausen and Bühlmann as a very general technique designed to improve the performance of a variable selection algorithm. It is based on aggregating the results of applying a selection procedure to subsamples of the data. We introduce a variant, called complementary pairs stability selection, and derive …
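The complementary-pairs subsampling idea can be sketched as follows. This is a minimal illustration, not the paper's procedure: the base selector here is simple correlation screening (a stand-in for, say, the Lasso), and the toy data, subsample scheme, and number of splits are all assumptions.

```python
import numpy as np

def cps_stability_selection(X, y, top_k=2, B=50, seed=0):
    """Sketch of complementary pairs stability selection.

    For each of B random splits, the data are divided into two
    complementary halves; a base selector (here: pick the top_k
    variables by absolute correlation with y) is applied to each
    half, and selection frequencies are aggregated over all 2*B
    subsamples.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(B):
        perm = rng.permutation(n)
        for half in (perm[: n // 2], perm[n // 2 : 2 * (n // 2)]):
            Xc = X[half] - X[half].mean(axis=0)
            yc = y[half] - y[half].mean()
            score = np.abs(Xc.T @ yc)            # screening statistic
            counts[np.argsort(score)[-top_k:]] += 1
    return counts / (2 * B)

# Toy data: only the first two of ten variables drive the response.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(200)
freq = cps_stability_selection(X, y)
```

Variables whose selection frequency exceeds a chosen threshold are retained; in this toy setting the two signal variables are selected in nearly every subsample, while the noise variables are not.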
Finding interactions between variables in large and high-dimensional data sets is often a serious computational challenge. Most approaches build up interaction sets incrementally, adding variables in a greedy fashion. The drawback is that potentially informative high-order interactions may be overlooked. Here, we propose an alternative approach for …
Spheres are widely used as the basis for the design of multiparticulate drug delivery systems. Although the extrusion and spheronization processes are frequently used to produce such spheres, there is a lack of basic understanding of these processes and of the requisite properties of excipients and formulations. It is hypothesized that the rheological or …
PURPOSE Current diagnostic tests for diffuse large B-cell lymphoma use the updated WHO criteria based on biologic, morphologic, and clinical heterogeneity. We propose a refined classification system based on subset-specific B-cell-associated gene signatures (BAGS) in the normal B-cell hierarchy, hypothesizing that it can provide new biologic insight and …
We study the problem of high-dimensional regression when there may be interacting variables. Approaches using sparsity-inducing penalty functions such as the Lasso can be useful for producing interpretable models. However, when the number of variables runs into the thousands, so that even two-way interactions number in the millions, these methods may become …
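As a concrete illustration of a sparsity-inducing penalty on main effects alone, here is a minimal coordinate-descent Lasso sketch; the objective scaling, the toy data, and the fixed penalty level are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for the Lasso (sketch).

    Minimises (1/2n)||y - Xb||^2 + lam * ||b||_1 by repeatedly
    soft-thresholding each coefficient against its partial residual.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]       # partial residual
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

# Toy data: only the first of five variables matters.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(100)
b = lasso_cd(X, y, lam=0.3)
```

The soft-thresholding step sets the noise coefficients exactly to zero, which is what makes the fitted model interpretable.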
We would like to begin by congratulating the authors on their fine paper. Handling highly correlated variables is one of the most important issues facing practitioners in high-dimensional regression problems, and in some ways it is surprising that it has not received more attention up to this point. The authors have made substantial progress towards …
Large-scale regression problems where both the number of variables, p, and the number of observations, n, may be large and on the order of millions or more, are becoming increasingly common. Typically the data are sparse: only a fraction of a percent of the entries in the design matrix are non-zero. Nevertheless, often the only computationally feasible …
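A minimal sketch of how such sparsity can be exploited, assuming SciPy's sparse matrix formats and the iterative solver `scipy.sparse.linalg.lsqr`; the toy dimensions, the 1% density, and the noise level are illustrative choices, not from the paper.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

# Toy design matrix with ~1% non-zero entries, held in CSR form so that
# storage and matrix-vector products scale with the non-zeros, not n*p.
rng = np.random.default_rng(0)
n, p = 2000, 100
X = sparse.random(n, p, density=0.01, format="csr", random_state=0)
beta = np.zeros(p)
beta[:5] = 1.0
y = X @ beta + 0.001 * rng.standard_normal(n)

# lsqr only ever asks X for matrix-vector products, so the design
# matrix is never densified.
beta_hat = lsqr(X, y)[0]
```

Iterative solvers of this kind are often the only feasible option at this scale, precisely because they touch the design matrix only through sparse matrix-vector products.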
We study large-scale regression analysis where both the number of variables, p, and the number of observations, n, may be large and on the order of millions or more. This is very different from the now well-studied high-dimensional regression context of "large p, small n". For example, in our "large p, large n" setting, an ordinary least squares …
How would you try to solve a linear system of equations with more unknowns than equations? Of course, there are infinitely many solutions, and yet this is the sort of problem statisticians face with many modern datasets, arising in genetics, imaging, finance and many other fields. What’s worse, our equations are often corrupted by noisy measurements! In …
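To make the underdetermined setting concrete, consider a hypothetical system of two equations in four unknowns. Among the infinitely many exact solutions, `numpy.linalg.lstsq` (and equivalently the Moore–Penrose pseudoinverse) returns the one of minimum Euclidean norm:

```python
import numpy as np

# Hypothetical system: 2 equations, 4 unknowns.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0, 1.0]])
b = np.array([3.0, 5.0])

# For an underdetermined system, lstsq returns the minimum-norm solution.
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# The pseudoinverse picks out the same point.
x_pinv = np.linalg.pinv(A) @ b
```

Both satisfy A x = b exactly and coincide, since each computes the minimum-Euclidean-norm solution; singling out one solution requires an extra criterion of exactly this kind.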
When performing regression on a dataset with p variables, it is often of interest to go beyond using main linear effects and include interactions as products between individual variables. For small-scale problems, these interactions can be computed explicitly but this leads to a computational complexity of at least O(p²) if done naively. This cost can be …
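The naive O(p²) construction can be written down directly; the function name and the toy matrix here are illustrative:

```python
import numpy as np

def all_pairwise_interactions(X):
    """Explicitly form all p*(p-1)/2 pairwise interaction columns.

    Time and memory cost is O(n * p^2), which is why the explicit
    approach becomes infeasible once p reaches the thousands.
    """
    n, p = X.shape
    cols = [X[:, i] * X[:, j] for i in range(p) for j in range(i + 1, p)]
    return np.column_stack(cols)

X = np.arange(12.0).reshape(4, 3)
Z = all_pairwise_interactions(X)   # columns: x0*x1, x0*x2, x1*x2
```

Already at p = 1,000 this produces roughly half a million interaction columns, which is the quadratic blow-up the text refers to.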