Learn More
In the words of the authors, the goal of this book was to "bring together many of the important new ideas in learning, and explain them in a statistical framework." The authors have been quite successful in achieving this objective, and their work will be a welcome addition to the statistics and learning literatures. Statistics has always been an …
Boosting (Freund & Schapire 1996; Schapire & Singer 1998) is one of the most important recent developments in classification methodology. The performance of many classification algorithms can often be dramatically improved by sequentially applying them to reweighted versions of the input data, and taking a weighted majority vote of the sequence of classifiers …
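Below is a minimal AdaBoost-style sketch of the reweighting-and-weighted-vote idea this abstract describes, using scikit-learn decision stumps as the base learner. It is illustrative only, not the authors' reference implementation; all function names and the choice of stumps are assumptions.

```python
# Minimal AdaBoost-style sketch: reweight the data after each round, then
# classify by a weighted majority vote of the fitted base learners.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """y must be coded as -1/+1. Returns (stumps, alphas)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # start with uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this classifier
        w *= np.exp(-alpha * y * pred)          # upweight misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(votes)                       # weighted majority vote
```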
We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems, while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression), and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, …
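A rough sketch of the cyclical coordinate-descent update for the Gaussian elastic net follows, assuming standardized predictors and a centered response. This is not the glmnet software itself, which adds warm starts, active sets, and the logistic and multinomial cases mentioned above; the function names and the objective scaling are assumptions.

```python
# Cyclical coordinate descent for (1/(2n))||y - Xb||^2 + lam*(alpha*||b||_1
# + (1-alpha)/2*||b||_2^2), assuming columns of X have mean 0 and variance 1.
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def elastic_net_cd(X, y, lam, alpha=0.5, n_sweeps=100):
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta                        # current residual
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]          # remove coordinate j's contribution
            zj = X[:, j] @ r / n            # inner product with partial residual
            beta[j] = soft_threshold(zj, lam * alpha) / (1.0 + lam * (1 - alpha))
            r -= X[:, j] * beta[j]          # add the updated contribution back
    return beta
```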
We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm, the graphical lasso, that is remarkably fast: it solves a 1000-node problem (approximately 500,000 parameters) in at most a minute and is 30-4000 times faster …
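A short usage sketch of the same idea, calling scikit-learn's GraphicalLasso rather than the authors' original code: fit a sparse precision (inverse covariance) matrix and read graph edges off its nonzero entries. The penalty value alpha=0.1 and the random data are arbitrary illustrative choices.

```python
# Sparse inverse-covariance estimation with an off-the-shelf graphical lasso.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))        # 200 samples, 20 variables

model = GraphicalLasso(alpha=0.1).fit(X)
precision = model.precision_              # estimated sparse inverse covariance
edges = np.abs(precision) > 1e-8          # nonzero entries define graph edges
print(edges.sum())
```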
We consider " one-at-a-time " coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L 1-penalized regression (lasso) in the lter-ature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show(More)
The classification problem is considered in which an output variable y assumes discrete values with respective probabilities that depend upon the simultaneous values of a set of input variables x = {x_1, ..., x_n}. At issue is how error in the estimates of these probabilities affects classification error when the estimates are used in a classification rule. These …
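A tiny numerical illustration of the distinction raised above, under the assumption of a two-class rule that predicts class 1 whenever the estimated probability exceeds 1/2: a badly biased probability estimate can still produce the same decision as the true probability, so probability error and classification error need not move together.

```python
# Biased probability estimate, yet the 0/1 decision is unchanged: the rule
# only cares which side of 1/2 the estimate falls on.
true_p = 0.9        # true P(y = 1 | x)
estimate = 0.55     # heavily biased estimate of the same probability

bayes_label = int(true_p > 0.5)
predicted_label = int(estimate > 0.5)
print(abs(true_p - estimate))            # large probability error (0.35)
print(bayes_label == predicted_label)    # True: no extra classification error
```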
An algorithm and data structure are presented for searching a file containing N records, each described by k real-valued keys, for the m closest matches or nearest neighbors to a given query record. The computation required to organize the file is proportional to kN log N. The expected number of records examined in each search is independent of the …
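A usage sketch of this k-d tree style of search via SciPy's cKDTree, which is a descendant of this family of data structures rather than the paper's original code: build the tree once over the file of records, then query the m nearest neighbors of a record. The sizes and seed are arbitrary.

```python
# Organize N records with k real-valued keys, then find the m closest matches.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
records = rng.random((10_000, 5))      # N = 10,000 records, k = 5 keys each
tree = cKDTree(records)                # organize the file (roughly kN log N work)

query = rng.random(5)
dists, idx = tree.query(query, k=3)    # m = 3 nearest neighbors
print(idx, dists)
```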
Lazy learning algorithms, exemplified by nearest-neighbor algorithms, do not induce a concise hypothesis from a given training set; the inductive process is delayed until a test instance is given. Algorithms for constructing decision trees, such as C4.5, ID3, and CART, create a single "best" decision tree during the training phase, and this tree is then used …
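A small sketch contrasting the two paradigms with scikit-learn stand-ins: a lazy nearest-neighbor learner that merely stores the training set and defers work to query time, versus an eager learner that commits to a single tree up front. DecisionTreeClassifier is a CART-style approximation, not C4.5 or ID3; the dataset choice is illustrative.

```python
# Lazy (nearest neighbor) vs. eager (single decision tree) learning.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lazy = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)   # just stores the data
eager = DecisionTreeClassifier().fit(X_tr, y_tr)             # induces one tree up front

print("nearest neighbor:", lazy.score(X_te, y_te))
print("decision tree:   ", eager.score(X_te, y_te))
```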
We consider the group lasso penalty for the linear model. We note that the standard algorithm for solving the problem assumes that the model matrices in each group are orthonormal. Here we consider a more general penalty that blends the lasso (L1) with the group lasso ("two-norm"). This penalty yields solutions that are sparse at both the group and …
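A sketch of the shrinkage (proximal) step for such a blended penalty, lam1*||b||_1 + lam2*sum_g ||b_g||_2: an elementwise soft-threshold for the lasso part followed by a groupwise two-norm shrinkage, which zeros whole groups while also sparsifying within groups. The surrounding optimization loop, step sizes, and all names below are assumptions for illustration.

```python
# Proximal step for the blended lasso / group-lasso ("two-norm") penalty.
import numpy as np

def sparse_group_prox(beta, groups, lam1, lam2):
    """groups: list of index arrays partitioning the coefficient vector."""
    # elementwise soft-threshold (lasso part)
    b = np.sign(beta) * np.maximum(np.abs(beta) - lam1, 0.0)
    # groupwise soft-threshold (group-lasso part)
    for g in groups:
        norm = np.linalg.norm(b[g])
        b[g] = 0.0 if norm <= lam2 else b[g] * (1 - lam2 / norm)
    return b

beta = np.array([0.05, 2.0, -1.5, 0.02, 0.03, 0.8])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
print(sparse_group_prox(beta, groups, lam1=0.1, lam2=0.5))
```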