The Boundary Forest Algorithm for Online Supervised and Unsupervised Learning

@article{Mathy2015TheBF,
  title={The Boundary Forest Algorithm for Online Supervised and Unsupervised Learning},
  author={Charles Mathy and Nate Derbinsky and Jos{\'e} Bento and Jonathan Rosenthal and Jonathan S. Yedidia},
  journal={ArXiv},
  year={2015},
  volume={abs/1505.02867}
}
We describe a new instance-based learning algorithm called the Boundary Forest (BF) algorithm, which can be used for supervised and unsupervised learning. The algorithm builds a forest of trees whose nodes store previously seen examples. It can be shown data points one at a time and updates itself incrementally, hence it is naturally online. Few instance-based algorithms have this property while being simultaneously fast, which the BF is. This is crucial for applications where one…
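
Since the abstract only sketches the mechanism, here is a minimal Python sketch of a single boundary tree and a small voting forest, reconstructed from the description above: a query descends greedily toward the stored example closest to it, and a training point is stored as a new child only when the tree's current answer is wrong. The class names, the max_children cap, the Euclidean metric, the root-assignment simplification, and the majority-vote combination are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np


class BoundaryTree:
    """One boundary tree: each node stores a seen example [point, label, children]."""

    def __init__(self, max_children=50, metric=None):
        self.max_children = max_children  # branching cap (assumed default)
        self.metric = metric if metric is not None else (
            lambda a, b: np.linalg.norm(a - b))
        self.nodes = []  # node i: [point, label, list of child indices]

    def _traverse(self, x):
        """Greedy descent: repeatedly move to whichever candidate is closest to x."""
        current = 0
        while True:
            children = self.nodes[current][2]
            candidates = list(children)
            if len(children) < self.max_children:
                candidates.append(current)  # a non-full node competes with its children
            best = min(candidates, key=lambda i: self.metric(self.nodes[i][0], x))
            if best == current:
                return current
            current = best

    def query(self, x):
        """Predict with the label of the node the greedy traversal stops at."""
        return self.nodes[self._traverse(x)][1]

    def train(self, x, y):
        """Online update: store (x, y) as a child only if the tree errs on x."""
        if not self.nodes:
            self.nodes.append([x, y, []])  # first example becomes the root
            return
        closest = self._traverse(x)
        if self.nodes[closest][1] != y:  # misclassified, so keep the example
            self.nodes.append([x, y, []])
            self.nodes[closest][2].append(len(self.nodes) - 1)


class BoundaryForest:
    """Several boundary trees; predictions are combined by majority vote."""

    def __init__(self, n_trees=4, **tree_kwargs):
        self.trees = [BoundaryTree(**tree_kwargs) for _ in range(n_trees)]
        self._seen = 0

    def train(self, x, y):
        for i, tree in enumerate(self.trees):
            # Simplification so the trees are not identical copies:
            # tree i stays empty until the i-th point of the stream roots it.
            if not tree.nodes and i > self._seen:
                continue
            tree.train(x, y)
        self._seen += 1

    def query(self, x):
        votes = [tree.query(x) for tree in self.trees if tree.nodes]
        return max(set(votes), key=votes.count) if votes else None
```

A usage sketch on a made-up two-dimensional stream:

```python
rng = np.random.default_rng(0)
forest = BoundaryForest(n_trees=4)
for _ in range(500):                       # examples arrive one at a time
    x = rng.normal(size=2)
    forest.train(x, int(x[0] + x[1] > 0))  # label: which side of the line x+y=0
print(forest.query(np.array([1.0, 1.0])))  # expected output: 1
```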

Citations

Efficient learning of neighbor representations for boundary trees and forests
TLDR
Introduces Differentiable Boundary Sets, an algorithm that overcomes the computational issues of the differentiable boundary tree scheme and also improves its classification accuracy and data representability.
Alternating optimization of decision trees, with application to learning sparse oblique trees
TLDR
An algorithm is given that, given an input tree, produces a new tree with the same or smaller structure but new parameter values that provably lower or leave unchanged the misclassification error; it can also handle a sparsity penalty.
Learning data representations for robust neighbour-based inference
TLDR
Introduces Differentiable Boundary Sets, an algorithm that overcomes the computational issues of the differentiable boundary tree (DBT) scheme, improves its classification accuracy and data representability, and offers a significant reduction in training time.
Learning Deep Nearest Neighbor Representations Using Differentiable Boundary Trees
TLDR
Introduces a new method, the differentiable boundary tree, which enables learning deep kNN representations; by modelling traversals in the tree as stochastic events, it allows for very efficient trees with a clearly interpretable structure (a sketch of this idea follows the list below).
Latent source models for nonparametric inference
TLDR
This thesis bridges the gap between theory and practice for nearest-neighbor inference methods by deriving theoretical performance guarantees in three case studies: time series classification, online collaborative filtering, and patch-based image segmentation.
k-Nearest Neighbors by Means of Sequence to Sequence Deep Neural Networks and Memory Networks
TLDR
Proposes two families of models, built on a sequence-to-sequence model and a memory network model, that mimic the k-Nearest Neighbors model: they generate a sequence of labels, a sequence of out-of-sample feature vectors, and a final label for classification, and thus could also function as oversamplers.
Interpretable Synthetic Reduced Nearest Neighbor: An Expectation Maximization Approach
TLDR
A novel optimization of Synthetic Reduced Nearest Neighbor based on Expectation Maximization (EM-SRNN) is provided that always converges while monotonically decreasing the objective function.
Q-learning with Nearest Neighbors
TLDR
This work considers model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, and establishes a lower bound arguing that the dependence $\tilde{\Omega}\big(1/\varepsilon^{d+2}\big)$ is necessary.
Distributed Nearest Neighbor Classification
TLDR
This work replaces majority voting with a weighted voting scheme, and provides sharp theoretical upper bounds on the number of subsamples needed for the distributed nearest neighbor classifier to reach the optimal convergence rate.
Explaining the Success of Nearest Neighbor Methods in Prediction
TLDR
This monograph aims to explain the success of nearest neighbor prediction methods, and covers recent theoretical guarantees on nearest neighbor prediction in three case studies: time series forecasting, recommending products to people over time, and delineating human organs in medical images by looking at image patches.
...
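
Several of the citing papers above revolve around making the boundary-tree traversal differentiable so that a deep representation can be trained through it. The snippet below is only a hedged illustration of that core idea, not any cited paper's implementation: the hard argmin over candidate nodes is replaced by a softmax over negative distances, giving traversal probabilities that gradients can flow through. The function name and the temperature parameter are assumptions.

```python
import numpy as np


def soft_transition_probs(x, node_point, child_points, temperature=1.0):
    """Softmax over negative distances to each candidate node; the last entry
    is the probability of stopping at the current node rather than descending."""
    candidates = np.vstack(child_points + [node_point])
    logits = -np.linalg.norm(candidates - x, axis=1) / temperature
    logits -= logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

As the temperature goes to zero, this recovers the hard greedy step used in the boundary tree sketch above.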
