• Corpus ID: 235352609

On Ensembling vs Merging: Least Squares and Random Forests under Covariate Shift

Maya Ramchandran, Rajarshi Mukherjee
It has been postulated and observed in practice that for prediction problems in which covariate data can be naturally partitioned into clusters, ensembling algorithms based on suitably aggregating models trained on individual clusters often perform substantially better than methods that ignore the clustering structure in the data. In this paper, we provide theoretical support to these empirical observations by asymptotically analyzing linear least squares and random forest regressions under a… 
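The contrast the abstract draws can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's procedure: "merging" here means one least squares fit on the pooled data, "ensembling" means a weighted average of per-cluster least squares fits, and the default equal weights, the helper names (`fit_ols`, `merged_predict`, `ensemble_predict`), and the simulated clusters are all hypothetical choices made for this example.

```python
import numpy as np

def fit_ols(X, y):
    """Least squares coefficients (intercept first) via numpy.linalg.lstsq."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    """Apply fitted coefficients (intercept first) to new covariates."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def merged_predict(clusters, X_new):
    """Merging: pool all clusters and fit a single model."""
    X = np.vstack([Xc for Xc, _ in clusters])
    y = np.concatenate([yc for _, yc in clusters])
    return predict_ols(fit_ols(X, y), X_new)

def ensemble_predict(clusters, X_new, weights=None):
    """Ensembling: fit one model per cluster, then average the predictions.

    Equal weights are an assumption of this sketch; other weighting
    schemes (for example, stacking) could be substituted.
    """
    preds = np.stack([predict_ols(fit_ols(Xc, yc), X_new) for Xc, yc in clusters])
    if weights is None:
        weights = np.full(len(clusters), 1.0 / len(clusters))
    return np.asarray(weights) @ preds

# Simulated data: three clusters whose covariates are centered differently
# (covariate shift) but share one linear outcome model.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
clusters = []
for center in (-3.0, 0.0, 3.0):
    Xc = rng.normal(loc=center, scale=1.0, size=(50, 3))
    yc = Xc @ beta_true + rng.normal(scale=0.1, size=50)
    clusters.append((Xc, yc))

X_new = rng.normal(size=(5, 3))
merged = merged_predict(clusters, X_new)
ensembled = ensemble_predict(clusters, X_new)
```

Comparing `merged` and `ensembled` against `X_new @ beta_true` gives a quick sense of how the two strategies behave on a particular simulated draw; it says nothing about the asymptotic results the paper derives.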
1 Citation

Cross-Cluster Weighted Forests
It is found that constructing ensembles of forests trained on clusters determined by algorithms such as k-means results in significant improvements in accuracy and generalizability over the traditional Random Forest algorithm.


Analysis of a Random Forests Model
  • G. Biau
  • Computer Science
    J. Mach. Learn. Res.
  • 2012
An in-depth analysis of a random forests model suggested by Breiman (2004), which is very close to the original algorithm, and shows in particular that the procedure is consistent and adapts to sparsity, in the sense that its rate of convergence depends only on the number of strong features and not on how many noise variables are present.
Classification and Regression by randomForest
Random forests are proposed, which add an additional layer of randomness to bagging and are robust against overfitting; the randomForest package provides an R interface to the Fortran programs by Breiman and Cutler.
A framework for simultaneous co-clustering and learning from complex data
A model-based co-clustering (meta)-algorithm that interleaves clustering and construction of prediction models to iteratively improve both cluster assignment and fit of the models is presented.
Prediction models for clustered data: comparison of a random intercept and standard regression model
The models with a random intercept discriminate better than the standard model only if the cluster effect is used for predictions, and the prediction model with a random intercept had good calibration within clusters.
Surprises in High-Dimensional Ridgeless Least Squares Interpolation
This paper recovers, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
Spectral analysis of the Gram matrix of mixture models
This text is devoted to the asymptotic study of some spectral properties of the Gram matrix $W^{\sf T} W$ built upon a collection $w_1, \ldots, w_n\in \mathbb{R}^p$ of random vectors (the columns of
Stacked regressions
Stacking regressions is a method for forming linear combinations of different predictors to give improved prediction accuracy. The idea is to use cross-validation data and least squares under
On the principal components of sample covariance matrices
We introduce a class of $M \times M$ sample covariance matrices $\mathcal{Q}$ which subsumes and generalizes several previous models. The associated population covariance matrix $\Sigma =
The limiting spectral distribution of large sample covariance matrices is derived under dependence conditions. As applications, we obtain the limiting spectral distributions of Spearman's rank
Spectral Analysis of Large Dimensional Random Matrices
Wigner Matrices and Semicircular Law.- Sample Covariance Matrices and the Marčenko-Pastur Law.- Product of Two Random Matrices.- Limits of Extreme Eigenvalues.- Spectrum Separation.-