Modeling of time series using random forests: theoretical developments

Richard A. Davis and Mikkel Slot Nielsen
In this paper we study asymptotic properties of random forests within the framework of nonlinear time series modeling. While random forests have been successfully applied in various fields, the theoretical justification has not been considered for their use in a time series setting. Under mild conditions, we prove a uniform concentration inequality for regression trees built on nonlinear autoregressive processes and, subsequently, we use this result to prove consistency for a large class of… 


Machine Learning Advances for Time Series Forecasting
The most recent advances in supervised machine learning and high-dimensional models for time series forecasting are surveyed, and ensemble and hybrid models that combine ingredients from different alternatives are considered.
The Effects of Targeting Predictors in a Random Forest Regression Model
It is shown theoretically that, in high-dimensional settings with limited signal, proper targeting is an important complement to RF's feature sampling by controlling the probability of placing splits along strong predictors, and this is supported by simulations with representative finite samples.
Conducting Causal Analysis by Means of Approximating Probabilistic Truths
It is shown that the suggested measure-theoretic approaches lead not only to better predictive models but also to more plausible, parsimonious descriptions of possible causal flows.
Tree-based synthetic control methods: Consequences of relocating the US embassy
This work recasts synthetic controls for evaluating policies as a counterfactual prediction problem, replaces the linear regression with a nonparametric model inspired by machine learning, and applies the method to a highly debated policy: the relocation of the US embassy to Jerusalem.
Machine learning techniques for forecasting agricultural prices: A case of brinjal in Odisha, India
An attempt is made to explore efficient ML algorithms, e.g. Generalized Regression Neural Network (GRNN), Support Vector Regression (SVR), Random Forest (RF) and Gradient Boosting Machine (GBM), for forecasting the wholesale price of brinjal in seventeen major markets of Odisha, India; GRNN is observed to perform better in most cases.
An approach to building a financial model for the purposes of planning resource products of the corporate segment in commercial banks
Subject. The paper considers planning of resource products in the corporate business segment of a commercial bank. Objectives. The aim is to develop a model for planning financial results generated
Targeting predictors in random forest regression
Random forest regression (RF) is an extremely popular tool for the analysis of high-dimensional data. Nonetheless, its benefits may be lessened in sparse settings, due to weak predictors, and a


A Review of Nonparametric Time Series Analysis
Various features of a given time series may be analyzed by nonparametric techniques. Generally the characteristic of interest is allowed to have a general form which is approximated increasingly
Analysis of a Random Forests Model
  • G. Biau
  • Computer Science
    J. Mach. Learn. Res.
  • 2012
An in-depth analysis of a random forests model suggested by Breiman (2004), which is very close to the original algorithm, shows in particular that the procedure is consistent and adapts to sparsity, in the sense that its rate of convergence depends only on the number of strong features and not on how many noise variables are present.
Adaptive Concentration of Regression Trees, with Application to Random Forests
This approach breaks tree training into a model selection phase, followed by a model fitting phase where the best regression model consistent with these splits is found, and shows that the fitted regression tree concentrates around the optimal predictor with the same splits.
Quantile Regression Forests
It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean, and that the approach is competitive in terms of predictive power.
Covariance inequalities for strongly mixing processes
Let $X$ and $Y$ be two real-valued random variables. Let $\alpha$ denote the strong mixing coefficient between the two $\sigma$-fields generated respectively by $X$ and $Y$, and $Q_X(u) = \inf\{t : P(|X| > t) \le u\}$ be the
Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
  • Stefan Wager, S. Athey
  • Mathematics, Computer Science
    Journal of the American Statistical Association
  • 2018
This is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference and is found to be substantially more powerful than classical methods based on nearest-neighbor matching.
Mixing: Properties and Examples
Mixing is concerned with the analysis of dependence between sigma-fields defined on the same underlying probability space. It provides an important tool of analysis for random fields, Markov
Consistency of Random Forests
A step forward in forest exploration is taken by proving a consistency result for Breiman's original algorithm in the context of additive regression models, which sheds an interesting light on how random forests can nicely adapt to sparsity.
Random Forests
Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Forecasting Stock Index Movement: A Comparison of Support Vector Machines and Random Forest
Empirical experimentation suggests that the SVM outperforms the other classification methods in predicting the direction of stock market movement, and that the random forest method outperforms the neural network, discriminant analysis and logit models used in this study.