Time series classification is an important task with many challenging applications. A nearest neighbor (NN) classifier with dynamic time warping (DTW) distance is a strong solution in this context. On the other hand, feature-based approaches have been proposed both as classifiers and as a means of providing insight into the series, but these approaches have problems…
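For reference, a minimal 1-NN classifier with DTW distance might look like the sketch below. It assumes univariate series stored as Python lists, uses a squared pointwise cost, and applies no warping-window constraint; the function names are illustrative, not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) DTW with squared pointwise cost and no window."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])

def nn_dtw_classify(train: List[Tuple[Sequence[float], str]],
                    query: Sequence[float]) -> Optional[str]:
    """Return the label of the training series closest to `query` under DTW."""
    best_label, best_dist = None, math.inf
    for series, label in train:
        dist = dtw_distance(series, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

For example, a query `[0, 1, 2, 1, 0, 0, 0]` is labeled `"peak"` against the training set `[([0, 0, 1, 2, 1, 0, 0], "peak"), ([0] * 7, "flat")]`, because warping aligns the shifted peak at zero cost.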
The regularized random forest (RRF) was recently proposed for feature selection by building only one ensemble. In RRF, features are evaluated on only part of the training data at each tree node. We derive an upper bound for the number of distinct Gini information gain values in a node, and show that many features can share the same information gain at a…
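The bound itself is not given in the abstract, so the sketch below does not reproduce it. Instead it enumerates, by brute force and with exact fractions, every Gini gain that any binary split of a node with given class counts can produce. Because each feature's split at that node must land on one of these values, a node with more candidate features than distinct values is guaranteed to have ties. The function names and example counts are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Sequence, Set

def gini(counts: Sequence[int]) -> Fraction:
    """Exact Gini impurity of a node with the given per-class counts."""
    total = sum(counts)
    return 1 - sum(Fraction(c, total) ** 2 for c in counts)

def possible_gini_gains(class_counts: Sequence[int]) -> Set[Fraction]:
    """All distinct Gini gains reachable by any binary split of this node."""
    n = sum(class_counts)
    parent = gini(class_counts)
    gains: Set[Fraction] = set()
    # left[c] = how many instances of class c go to the left child
    for left in product(*(range(c + 1) for c in class_counts)):
        n_left = sum(left)
        if n_left == 0 or n_left == n:
            continue  # skip splits that leave one child empty
        right = [c - l for c, l in zip(class_counts, left)]
        weighted = (Fraction(n_left, n) * gini(left)
                    + Fraction(n - n_left, n) * gini(right))
        gains.add(parent - weighted)
    return gains
```

For a node with five instances of each of two classes, the 34 non-trivial splits collapse to at most 17 distinct gains, since mirroring a split (swapping the left and right children) leaves the gain unchanged.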
In this paper, we propose a procedure, based on statistical design of experiments and gradient descent, that finds effective settings for the parameters of heuristics. We develop our procedure using four experiments. We use our procedure and a small subset of problems to find parameter settings for two new vehicle routing heuristics. We then set the…
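The abstract does not spell out the procedure's steps. As a rough sketch of how a designed experiment can feed a gradient step, the code below estimates main effects from a two-level full factorial design around the current parameter setting, converts them to slopes, and moves the setting downhill. The objective `f` is a hypothetical stand-in for a heuristic's average cost on a training subset of problems, and `delta`, `step`, and `iters` are illustrative choices, not values from the paper.

```python
from itertools import product
from typing import Callable, List

Objective = Callable[[List[float]], float]

def factorial_gradient(f: Objective, center: List[float], delta: float) -> List[float]:
    """Estimate the gradient of f at `center` from a 2^k full factorial design.

    Each factor is run at center - delta (coded -1) and center + delta (coded +1).
    """
    k = len(center)
    runs = list(product((-1, 1), repeat=k))
    totals = [0.0] * k
    for signs in runs:
        y = f([c + s * delta for c, s in zip(center, signs)])
        for i, s in enumerate(signs):
            totals[i] += s * y  # accumulates (sum at +1) - (sum at -1)
    half = len(runs) / 2
    # main effect = mean(+1) - mean(-1); dividing by the 2*delta span gives a slope
    return [t / half / (2 * delta) for t in totals]

def tune(f: Objective, start: List[float], delta: float = 0.5,
         step: float = 0.25, iters: int = 50) -> List[float]:
    """Hypothetical tuner: repeatedly step against the design-estimated gradient."""
    x = list(start)
    for _ in range(iters):
        g = factorial_gradient(f, x, delta)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x
```

On the stand-in cost `lambda p: (p[0] - 3) ** 2 + (p[1] + 1) ** 2`, `tune(..., [0.0, 0.0])` approaches `[3, -1]`. With a real heuristic, each evaluation would be expensive and noisy, so the design size and step length would need more care than this fixed-step loop gives them.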
Predictive models benefit from a compact, non-redundant subset of features that improves interpretability and generalization. Modern data sets are wide, dirty, mixed with both numerical and categorical predictors, and may contain interactive effects that require complex models. This is a challenge for filters, wrappers, and embedded feature selection…
A tree-ensemble method, referred to as time series forest (TSF), is proposed for time series classification. TSF employs a combination of entropy gain and a distance measure, referred to as the Entrance (entropy and distance) gain, for evaluating the splits. Experimental studies show that the Entrance gain improves the accuracy of TSF. TSF randomly samples…
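The abstract does not give the exact form of the Entrance gain. The sketch below assumes an additive score, entropy gain plus `alpha` times a margin, where the margin is the distance from the candidate threshold to the nearest feature value; both the weighting and the margin definition are assumptions for illustration. With a small `alpha`, entropy gain dominates and the margin mainly separates candidates whose entropy gains are equal or nearly so.

```python
import math
from collections import Counter
from typing import List, Tuple

def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values: List[float], labels: List[str],
               alpha: float = 0.01) -> Tuple[float, float]:
    """Return (threshold, score) maximizing entropy gain + alpha * margin for one feature."""
    n = len(values)
    parent = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (math.nan, -math.inf)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        threshold = (lo + hi) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = parent - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        # the threshold sits midway between neighbors, so the nearest value is lo or hi
        margin = (hi - lo) / 2
        score = gain + alpha * margin
        if score > best[1]:
            best = (threshold, score)
    return best
```

For values `[1, 2, 10, 11]` with labels `a, a, b, b`, the best threshold is 6.0 with score 1.04 (entropy gain 1.0 plus 0.01 × margin 4.0). Note that the margin is in the feature's own units, so in practice features may need comparable scales.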
Multivariate time series (MTS) classification has gained importance with the increase in the number of temporal datasets in different domains (e.g., medicine, finance, and multimedia). Similarity-based approaches, such as nearest-neighbor classifiers, are often used for univariate time series, but MTS are characterized not only by individual attributes,…
The threshold bootstrap (TB) is a promising new method of inference for a single autocorrelated data series, such as the output of a discrete-event simulation. The method works by resampling runs of data created when the series crosses a threshold level, such as the series mean. We performed a Monte Carlo evaluation of the TB using three types of data: white…
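The run-resampling idea described above can be sketched as follows. This is a simplified version under stated assumptions: the threshold is the sample mean, runs are maximal stretches on one side of it, and each replicate alternates between runs drawn at random from the above-threshold and below-threshold pools until it reaches the original length. The published TB may differ in details such as how runs are paired and trimmed, and the function names are hypothetical.

```python
import random
import statistics
from typing import List

def split_runs(series: List[float], threshold: float) -> List[List[float]]:
    """Cut the series into maximal runs that stay on one side of the threshold."""
    runs = [[series[0]]]
    for prev, x in zip(series, series[1:]):
        if (x >= threshold) == (prev >= threshold):
            runs[-1].append(x)  # same side: extend the current run
        else:
            runs.append([x])  # crossed the threshold: start a new run
    return runs

def threshold_bootstrap_means(series: List[float], n_boot: int = 1000,
                              seed: int = 0) -> List[float]:
    """Return the mean of each of n_boot run-resampled replicates of the series."""
    rng = random.Random(seed)
    threshold = statistics.mean(series)  # assumption: threshold = sample mean
    runs = split_runs(series, threshold)
    high = [r for r in runs if r[0] >= threshold]
    low = [r for r in runs if r[0] < threshold]
    if not high or not low:
        raise ValueError("series never crosses its threshold")
    means = []
    for _ in range(n_boot):
        replicate: List[float] = []
        use_high = rng.random() < 0.5  # random starting side
        while len(replicate) < len(series):
            replicate.extend(rng.choice(high if use_high else low))
            use_high = not use_high  # alternate sides, as the original series does
        means.append(statistics.mean(replicate[:len(series)]))  # trim to original length
    return means
```

The spread of the returned means, e.g. `statistics.stdev(means)`, serves as a bootstrap estimate of the standard error of the sample mean. Because whole runs are copied intact, some of the series' autocorrelation is preserved inside each replicate, which an ordinary point-by-point bootstrap would destroy.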
Time series data mining has received growing interest as temporal data sets from domains such as medicine, finance, and multimedia have become more common. Representations are important for reducing dimensionality and for generating useful similarity measures. High-level representations such as Fourier transforms, wavelets, piecewise polynomial…
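As one concrete example of a piecewise representation, the sketch below computes the Piecewise Aggregate Approximation (PAA), which replaces each of `segments` equal-width frames with its mean, i.e., a degree-zero piecewise polynomial. PAA is swapped in here for illustration; the abstract does not say which representation the paper builds on.

```python
from typing import List, Sequence

def paa(series: Sequence[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each of `segments` equal-width frames."""
    n = len(series)
    if not 0 < segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    out = []
    for k in range(segments):
        # integer frame boundaries; every frame is non-empty when segments <= n
        start = k * n // segments
        end = (k + 1) * n // segments
        frame = series[start:end]
        out.append(sum(frame) / len(frame))
    return out
```

For instance, `paa([1, 1, 3, 3, 5, 5, 7, 7], 4)` reduces eight points to `[1.0, 3.0, 5.0, 7.0]`, and similarity can then be computed on the shorter vectors.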
We propose a tree regularization framework that enables many tree models to perform feature selection efficiently. The key idea of the framework is to penalize selecting a new feature for splitting when its gain (e.g., information gain) is similar to that of the features used in previous splits. The regularization framework is applied to random…
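A minimal sketch of the penalty described above, assuming a multiplicative form: a feature not yet in the shared set `used` has its gain scaled by a coefficient `lam` in (0, 1], while features already used keep their full gain. The names and the default `lam` are illustrative.

```python
from typing import Dict, Optional, Set

def regularized_choice(gains: Dict[str, float], used: Set[str],
                       lam: float = 0.8) -> Optional[str]:
    """Pick the split feature after shrinking the gain of features not in `used` by `lam`."""
    best, best_score = None, float("-inf")
    for feature, gain in gains.items():
        # new features must beat used ones by a margin set by lam
        score = gain if feature in used else lam * gain
        if score > best_score:
            best, best_score = feature, score
    return best
```

If `used` is shared across every node (and, in an ensemble, every tree), a feature enters it only when its penalized gain still beats the already-selected features, and the final contents of `used` form the selected subset. A smaller `lam` imposes a stronger penalty and typically yields a smaller subset.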