Corpus ID: 37463377

Learning Decision Trees from Histogram Data

Ram B. Gurung, Tony Lindgren, Henrik Boström. DMIN 2015.
When applying learning algorithms to histogram data, bins of such variables are normally treated as separate independent variables. However, this may lead to a loss of information as the underlying ... 
Learning Random Forest from Histogram Data Using Split Specific Axis Rotation
An adapted version of the random forest algorithm is proposed to be applied to data containing histogram variables and it is shown that this algorithm can be used to solve the classification problem of histogram variable replacement.
Learning Decision Trees and Random Forests from Histogram Data: An application to component failure prediction for heavy duty trucks
A large volume of data has become commonplace in many domains these days. Machine learning algorithms can be trained to look for any useful hidden patterns in such data. Sometimes, these big data m...
Learning Decision Trees from Histogram Data Using Multiple Subsets of Bins
A sliding window approach to select subsets of the bins to be considered simultaneously while partitioning examples significantly reduces the number of possible splits to consider, allowing for substantially larger histograms to be handled.
Predicting NOx sensor failure in heavy duty trucks using histogram-based random forests
Being able to accurately predict the impending failures of truck components is often associated with a significant amount of cost savings, customer satisfaction and flexibility in maintenance service...
Planning Flexible Maintenance for Heavy Trucks using Machine Learning Models, Constraint Programming, and Route Optimization
Maintenance planning of trucks at Scania has previously been done using static cyclic plans with fixed sets of maintenance tasks, determined by mileage, calendar time, and some data driven physica...


Classification and Regression Trees
This chapter discusses tree classification in the context of medicine, where right Sized Trees and Honest Estimates are considered and Bayes Rules and Partitions are used as guides to optimal pruning.
Classification and multivariate analysis for complex data structures
This paper presents a meta-analysis of data mining, classification and discrimination in terms of categorical data and Latent Class approach, and the results show clear trends in both spatial and temporal data mining and classification.
An Incremental Method for Finding Multivariate Splits for Decision Trees
The PT2 algorithm, which searches for a multivariate split at each node, is presented, which is incremental, handles ordered and unordered variables, and estimates missing values.
Principal Component Analysis for Categorical Histogram Data: Some Open Directions of Research
In recent years, the analysis of symbolic data where the units are categories, classes or concepts described by interval, distributions, sets of categories and the like becomes a challenging task...
A New Wasserstein Based Distance for the Hierarchical Clustering of Histogram Symbolic Data
A new distance is presented, based on the Wasserstein metric, in order to cluster a set of data described by distributions with finite continue support, or, as called in SDA, by “histograms”, a measure of inertia of data with respect to a barycenter that satisfies the Huygens theorem of decomposition of inertia.
Distribution and Symmetric Distribution Regression Model for Histogram-Valued Variables
Histogram-valued variables are a particular kind of variables studied in Symbolic Data Analysis where to each entity under analysis corresponds a distribution that may be represented by a histogram...
Histogram PCA
An important attempt to analyze a symbolic data set for dimensionality reduction when the features are of histogram type and proposes basic arithmetic and definitions related to histogram data.
Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance
In this paper we present a linear regression model for modal symbolic data. The observed variables are histogram variables according to the definition given in Bock and Diday [1] and the parameters...
Design of multicategory multifeature split decision trees using perceptron learning
A new top-down decision tree design method is presented that generates compact trees of superior performance by using multifeature splits in place of single feature splits at successive stages of the tree development.
Symbolic Data Analysis: Conceptual Statistics and Data Mining (Wiley Series in Computational Statistics)
This chapter discusses Descriptive Statistics: Two or More Variates, which focuses on the part of the model concerned with Hierarchy-Divisive Clustering and Cluster Analysis.