Corpus ID: 219721397

FREEtree: A Tree-based Approach for High Dimensional Longitudinal Data With Correlated Features

Yuancheng Xu, Athanasse Zafirov, R. Michael Alvarez, Dan Kojis, Min Tan, Christina M. Ramirez
This paper proposes FREEtree, a tree-based method for high-dimensional longitudinal data with correlated features. Popular machine learning approaches commonly used for variable selection, such as Random Forests, do not perform well when features are correlated and do not account for data observed over time. FREEtree handles longitudinal data by using a piecewise random effects model. It also exploits the network structure of the features by first clustering them using weighted…


Tree-Structured Mixed-Effects Regression Modeling for Longitudinal Data
This work proposes a tree algorithm that combines the merits of a tree-based model and a mixed-effects model for longitudinal data. It uses residual analysis to alleviate the variable selection bias that exhaustive search approaches suffer from.
Unbiased regression trees for longitudinal and clustered data
A new version of the RE-EM regression tree method for longitudinal and clustered data is presented, which corrects for the tendency of CART to split on variables with more possible split points at the expense of those with fewer split points.
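The split-selection bias mentioned above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the RE-EM method or any package's code: the response is pure noise, one candidate predictor is continuous (many possible cut points) and one is binary (a single cut point), and the function names `best_sse_reduction` and `simulate` are hypothetical. An exhaustive search that picks the split with the largest squared-error reduction should favor the continuous predictor far more often, even though neither predictor is informative.

```python
import random

def best_sse_reduction(x, y):
    """Largest squared-error reduction over all binary splits of the form x <= c."""
    n = len(y)
    total = sum(y)
    pairs = sorted(zip(x, y))
    best = 0.0
    left_sum = 0.0
    for i in range(n - 1):
        left_sum += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no valid cut point between tied values
        n_left = i + 1
        n_right = n - n_left
        right_sum = total - left_sum
        # Between-group sum of squares equals the SSE reduction of this split.
        gain = left_sum ** 2 / n_left + right_sum ** 2 / n_right - total ** 2 / n
        best = max(best, gain)
    return best

def simulate(trials=200, n=40, seed=0):
    """Count how often each uninformative predictor wins the exhaustive search."""
    rng = random.Random(seed)
    wins = {"continuous": 0, "binary": 0}
    for _ in range(trials):
        y = [rng.gauss(0, 1) for _ in range(n)]        # pure-noise response
        x_cont = [rng.random() for _ in range(n)]      # up to n - 1 cut points
        x_bin = [rng.randint(0, 1) for _ in range(n)]  # at most 1 cut point
        g_cont = best_sse_reduction(x_cont, y)
        g_bin = best_sse_reduction(x_bin, y)
        wins["continuous" if g_cont > g_bin else "binary"] += 1
    return wins

wins = simulate()
print(wins)
```

Because the continuous predictor gets many more chances to produce a large reduction by luck, it is selected in most trials, which is the behavior an unbiased splitting procedure is meant to avoid.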
RE-EM trees: a data mining approach for longitudinal and clustered data
This paper presents a methodology that combines the structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods. It applies the resulting estimation method to pricing in online transactions, showing that the RE-EM tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects.
Tree-Structured Methods for Longitudinal Data
The thrust of tree techniques is the extraction of meaningful subgroups characterized by common covariate values and homogeneous outcome. For longitudinal data, this homogeneity can pertain…
Fuzzy Forests: Extending Random Forests for Correlated, High-Dimensional Data
Author(s): Conn, Daniel; Ngun, Tuck; Li, Gang; Ramirez, Christina | Abstract: In this paper we introduce fuzzy forests, a novel machine learning algorithm for ranking the importance of features in…
Unbiased Recursive Partitioning: A Conditional Inference Framework
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of exhaustive search procedures usually applied to fit such models have been known for a long time:…
Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees
The generalized linear mixed-effects model tree (GLMM tree) algorithm is proposed, which allows for the detection of treatment-subgroup interactions while accounting for the clustered structure of a dataset.
Classification and regression trees
  • W. Loh
  • Computer Science
  • Wiley Interdiscip. Rev. Data Min. Knowl. Discov.
  • 2011
This article gives an introduction to the subject of classification and regression trees by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples.
WGCNA: an R package for weighted correlation network analysis
The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis, including functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software.
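The idea of building a weighted correlation network and grouping features into modules can be sketched as follows. This is an illustrative Python sketch only, not the WGCNA package's API or its module-detection procedure: it assumes an unsigned soft-threshold adjacency of the form |cor(x_i, x_j)|^beta, and it stands in a simple cutoff plus connected components for module detection. The names `soft_adjacency` and `modules`, the exponent `beta=6`, the `cutoff=0.5`, and the toy data are all assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def soft_adjacency(features, beta=6):
    """Weighted adjacency |cor|^beta for every ordered pair of distinct features."""
    names = list(features)
    return {
        (p, q): abs(pearson(features[p], features[q])) ** beta
        for p in names for q in names if p != q
    }

def modules(features, beta=6, cutoff=0.5):
    """Group features whose adjacency reaches the cutoff (connected components).

    This is a simplified stand-in for module detection, chosen for brevity.
    """
    adj = soft_adjacency(features, beta)
    names = list(features)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    for (p, q), weight in adj.items():
        if weight >= cutoff:
            parent[find(p)] = find(q)  # merge the two components

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())

# Toy data (hypothetical): f1 and f2 are nearly collinear, f3 is unrelated.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [1.1, 2.0, 2.9, 4.2, 5.1],
    "f3": [2.0, -1.0, 0.5, 3.0, -2.0],
}
print(modules(features))
```

Raising the absolute correlation to a power shrinks weak correlations toward zero while leaving strong ones close to one, so the two nearly collinear features end up in one group and the unrelated feature in its own.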
Gene selection and classification of microarray data using random forest
It is shown that random forest has comparable performance to other classification methods, including DLDA, KNN, and SVM, and that the new gene selection procedure yields very small sets of genes (often smaller than alternative methods) while preserving predictive accuracy.