• Corpus ID: 190000032

(f)RFCDE: Random Forests for Conditional Density Estimation and Functional Data

Taylor Pospisil and Ann B. Lee
arXiv: Computation
Random forests are a common non-parametric regression technique that performs well with mixed-type unordered data and irrelevant features while being robust to monotonic variable transformations. Standard random forests, however, do not efficiently handle functional data and run into a curse of dimensionality when presented with high-resolution curves and surfaces. Furthermore, in settings with heteroskedasticity or multimodality, a regression point estimate with standard errors does not fully…

Wasserstein Random Forests and Applications in Heterogeneous Treatment Effects: Supplementary Materials

  • Computer Science
  • 2021
First, to illustrate the good quality of the estimation provided by WRF, an individual G∗ is randomly selected such that the associated CATE function is 0, a case in which CATE-based inference cannot provide sufficient insight into causality.

Wasserstein Random Forests and Applications in Heterogeneous Treatment Effects

This reformulation of Breiman's original splitting criterion in terms of Wasserstein distances between empirical measures indicates that Random Forests are well adapted to estimate conditional distributions and provides a natural extension of the algorithm to multivariate outputs.

Generative Quantile Regression with Variability Penalty

A deep learning generative model for joint quantile estimation called Penalized Generative Quantile Regression (PGQR), which simultaneously generates samples from many random quantile levels, allowing it to infer the conditional distribution of a response variable given a set of covariates.

Evaluating Aleatoric Uncertainty via Conditional Generative Models

Two metrics are introduced to measure the discrepancy between two conditional distributions that suit these models and can be easily and unbiasedly computed via Monte Carlo simulation of the conditional generative models, thus facilitating their evaluation and training.

CD-split: efficient conformal regions in high dimensions

It is shown that CD-split converges asymptotically to the oracle highest density set and satisfies local and asymptotic conditional validity, and that it has better conditional coverage and yields smaller prediction regions than other methods.

Conditional Density Estimation of Service Metrics for Networked Services

While mixture models provide a general and elegant solution, they incur a very high overhead related to hyper-parameter search and neural network training, whereas histogram models allow for efficient training but require adjustment to the specific use case.

Mathematical optimization in classification and regression trees

It is illustrated how these powerful formulations enhance the flexibility of tree models, being better suited to incorporate desirable properties such as cost-sensitivity, explainability, and fairness, and to deal with complex data, such as functional data.

Predictive Inference of a Wildfire Risk Pipeline in the United States Proposal Track

Wildfires are rare catastrophic events that are influenced by global climate change and present ongoing threats to life and property. The August 2019 IPCC report on climate change [1] notes that

Distribution-free conditional predictive bands using density estimators

Two conformal methods based on conditional density estimators that do not depend on this type of assumption to obtain asymptotic conditional coverage are introduced: Dist-split and CD-split.



Random forests for functional covariates

The predictive performance of the proposed functional random forests is compared with commonly used parametric and nonparametric functional methods and with a nonfunctional random forest using the single measurements of the curve as covariates.

Converting High-Dimensional Regression to High-Dimensional Conditional Density Estimation

FlexCode is proposed, a fully nonparametric approach to conditional density estimation that reformulates CDE as a non-parametric orthogonal series problem where the expansion coefficients are estimated by regression.

Quantile Regression Forests

It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean, and numerical examples suggest that the resulting algorithm is competitive in terms of predictive power.

ABC–CDE: Toward Approximate Bayesian Computation With Complex High-Dimensional Data and Limited Simulations

It is shown how a nonparametric conditional density estimation (CDE) framework, referred to as ABC–CDE, helps address three nontrivial challenges in ABC, including how to efficiently estimate the posterior distribution with limited simulations and different types of data.

Random Forests

Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

Nonparametric Econometrics: Theory and Practice

Nonparametric Econometrics covers all the material necessary to understand and apply nonparametric methods for real-world problems and is the ideal introduction for graduate students and an indispensable resource for researchers.

GalSim: The modular galaxy image simulation toolkit

Transformation Forests

A novel approach based on a parametric family of distributions characterised by their transformation function is proposed, which allows broad inference procedures, such as the model-based bootstrap, to be applied in a straightforward way.

An assessment of photometric redshift PDF performance in the context of LSST