The Effects of Targeting Predictors in a Random Forest Regression Model

Daniel D. Borup, Bent Jesper Christensen, Nicolaj Norgaard Muhlbach, Mikkel Slot Nielsen
Random forest regression (RF) has become an extremely popular tool for analyzing high-dimensional data. Nonetheless, it has been argued that its benefits are lessened in sparse high-dimensional settings due to the presence of weak predictors, and that an initial dimension-reduction (targeting) step prior to estimation is required. We show theoretically that, in high-dimensional settings with limited signal, proper targeting is an important complement to RF's feature sampling by controlling the…
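The targeting idea described above can be sketched in a few lines: pre-screen predictors before fitting the forest so that the feature sampling draws mostly from informative variables. The screening rule below (ranking by absolute marginal correlation with the outcome), the cutoff of 10 retained predictors, and the synthetic data are all illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p, k = 200, 100, 5          # n observations, p predictors, only k informative
X = rng.standard_normal((n, p))
y = X[:, :k].sum(axis=1) + 0.5 * rng.standard_normal(n)

# Targeting step (illustrative): rank predictors by absolute marginal
# correlation with y and retain the strongest ones before fitting the forest.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
keep = np.argsort(corr)[-10:]  # keep the 10 most correlated predictors

# Fit the random forest on the targeted subset only.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X[:, keep], y)
```

With only 10 of the 100 columns surviving the screen, each tree's random feature draws are far more likely to hit true signal, which is the complementarity the abstract alludes to.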
3 Citations

The Macroeconomy as a Random Forest

Over the last decades, an impressive number of non-linearities have been proposed to reconcile reduced-form macroeconomic models with the data. Many of them boil down to having linear regression…

In Search of a Job: Forecasting Employment Growth Using Google Trends

It is shown that Google search activity on relevant terms is a strong out-of-sample predictor of future US employment growth over the period 2004–2018 at both short and long horizons; when the Google Trends panel is exploited using a non-linear model, it fully encompasses the macroeconomic forecasts and provides significant information beyond them.

A New Random Forest Algorithm Based on Learning Automata

A method based on learning automata is presented that adds adaptability to the problem space, as well as independence from the data domain, to the random forest in order to increase its efficiency.



References

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

  • Stefan Wager, Susan Athey
  • Mathematics, Computer Science
    Journal of the American Statistical Association
  • 2018
This is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference; the method is found to be substantially more powerful than classical methods based on nearest-neighbor matching.

Adaptive Concentration of Regression Trees, with Application to Random Forests

This approach breaks tree training into a model-selection phase, in which the splits are chosen, followed by a model-fitting phase, in which the best regression model consistent with those splits is found; it is shown that the fitted regression tree concentrates around the optimal predictor with the same splits.

Forecasting Inflation in a Data-Rich Environment: The Benefits of Machine Learning Methods

It is shown that ML models with a large number of covariates are systematically more accurate than the benchmarks, and that the ML method deserving the most attention is the random forest, which dominates all other models.

Sparse Signals in the Cross-Section of Returns

A penalized regression known as the LASSO is used to identify rare, short-lived, “sparse” signals in the cross-section of returns, which boosts out-of-sample predictability in one-minute returns by 23% relative to OLS regressions.

Viewpoint: Boosting Recessions (Prévoir les récessions)

This paper explores the effectiveness of boosting, often regarded as the state-of-the-art classification tool, in giving warning signals of recessions 3, 6, and 12 months ahead. Boosting is used to…
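Boosting as a recession-warning classifier can be sketched roughly as follows: fit a gradient-boosted ensemble on lagged indicators and predict a binary recession label. The synthetic indicators, the label-generating rule, and the hyperparameters below are all illustrative assumptions and do not reproduce the paper's data or specification.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
n, p = 300, 10
X = rng.standard_normal((n, p))   # stand-in for p lagged macro indicators

# Illustrative "recession" label: driven by two of the indicators plus noise,
# producing a minority class roughly like recession months in a long sample.
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.standard_normal(n) < -1.0).astype(int)

# Gradient boosting builds shallow trees sequentially, each one fitting the
# residual errors of the ensemble so far.
clf = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)
acc = clf.score(X, y)
```

In practice one would evaluate such a model with a pseudo out-of-sample design at each forecast horizon (3, 6, and 12 months), not with in-sample accuracy as shown here.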

Regression Shrinkage and Selection via the Lasso

A new method for estimation in linear models, called the lasso, is proposed; it minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant.

Macroeconomic forecast accuracy in a data‐rich environment

The performance of six classes of models in forecasting different types of economic series is evaluated in an extensive pseudo out-of-sample exercise. One of these forecasting models, regularized…