Nonparametric Density Estimation: Toward Computational Tractability

Alexander G. Gray and Andrew W. Moore
Density estimation is a core operation of virtually all probabilistic learning methods (as opposed to discriminative methods). Approaches to density estimation can be divided into two principal classes: parametric methods, such as Bayesian networks, and nonparametric methods, such as kernel density estimation and smoothing splines. While neither choice should be universally preferred for all situations, a well-known benefit of nonparametric methods is their ability to achieve estimation…
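For orientation, kernel density estimation, one of the nonparametric methods named in the abstract, can be sketched in a few lines. The sketch below is a naive one-dimensional Gaussian version and is not the authors' algorithm; the function name `gaussian_kde`, the sample data, and the bandwidth value are illustrative assumptions. It evaluates every query point against every data point, so it costs O(MN) kernel evaluations for M queries and N data points, which is the kind of expense that fast methods aim to reduce.

```python
import math

def gaussian_kde(query_points, data, bandwidth):
    """Naive 1-D Gaussian kernel density estimate (hypothetical helper).

    Returns the estimated density at each query point. Every query is
    compared with every data point, so the cost is O(M * N).
    """
    # Normalizing constant: 1 / (N * h * sqrt(2 * pi))
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    estimates = []
    for x in query_points:
        total = 0.0
        for xi in data:
            u = (x - xi) / bandwidth
            total += math.exp(-0.5 * u * u)
        estimates.append(norm * total)
    return estimates

# Illustrative data and bandwidth (assumed values, not from the paper).
data = [-1.2, -0.4, 0.0, 0.3, 1.1]
grid = [i * 0.01 - 8.0 for i in range(1601)]  # evenly spaced points on [-8, 8]
density = gaussian_kde(grid, data, bandwidth=0.5)
```

Summing `density` times the grid spacing should come out close to 1, which is a quick sanity check that the estimate behaves like a probability density.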

Density Estimation with Adaptive Sparse Grids for Large Data Sets
This work presents an adaptive sparse-grid-based density estimation method which discretizes the estimated density function on basis functions centered at grid points rather than on kernels centered at the data points, so that the cost of evaluating the estimated density function is independent of the number of data points.
Smooth densities and generative modeling with unsupervised random forests
A new method based on unsupervised random forests for estimating smooth densities in arbitrary dimensions without parametric constraints is proposed, achieving superior performance in a range of benchmark trials while executing about two orders of magnitude faster on average.
Converting High-Dimensional Regression to High-Dimensional Conditional Density Estimation
FlexCode is proposed, a fully nonparametric approach to conditional density estimation that reformulates CDE as a non-parametric orthogonal series problem where the expansion coefficients are estimated by regression.
Fast kernel conditional density estimation: A dual-tree Monte Carlo approach
Very fast optimal bandwidth selection for univariate kernel density estimation
A computationally efficient ε-exact approximation algorithm for univariate Gaussian kernel-based density derivative estimation that reduces the computational complexity from O(MN) to linear O(N + M).
Density estimation trees
DETs empirically exhibit the interpretability, adaptability and feature selection properties of supervised decision trees while incurring slight loss in accuracy over other nonparametric density estimators, suggesting they might be able to avoid the curse of dimensionality if the true density is sparse in dimensions.
The ed Method for Nonparametric Density Estimation and Diagnostic Checking
The ed method of density estimation for a univariate x takes a model building approach: an estimation method that can accurately fit many density patterns in data, and leads to diagnostic visual
Rapid Evaluation of Multiple Density Models
KDE, the most widely used and studied nonparametric density estimation method and thus the focus here, can be shown to converge to the true underlying density with probability as more data are observed, with no distribution assumptions at all.
Fast Kernel Density Estimation Using Gaussian Filter Approximation
This work uses techniques from digital signal processing in the context of estimation theory, to allow rapid computations of kernel density estimates, and outperforms other state of the art solutions, due to a fully linear complexity and a negligible overhead, even for small sample sets.
Nonparametric Conditional Density Estimation in a High-Dimensional Regression Setting
In some applications (e.g., in cosmology and economics), the regression is not adequate to represent the association between a predictor x and a response Z because of multi-modality and asymmetry of


Nonparametric Density Estimation
This chapter describes the background material on nonparametric density estimation, considering only the univariate case; extending the results to more than one variable, however, is often straightforward.
'N-Body' Problems in Statistical Learning
A suite of new geometric techniques is presented which are applicable in principle to any 'N-body' computation, including large-scale mixtures of Gaussians, RBF neural networks, and HMMs.
A Brief Survey of Bandwidth Selection for Density Estimation
There has been major progress in recent years in data-based bandwidth selection for kernel density estimation. Some “second generation” methods, including plug-in and smoothed bootstrap
The Anchors Hierarchy: Using the Triangle Inequality to Survive High Dimensional Data
This paper defines the anchors hierarchy--a fast data structure and algorithm for localizing data based only on a triangle-inequality-obeying distance metric and shows how this structure, decorated with cached sufficient statistics, allows a wide variety of statistical learning algorithms to be accelerated even in thousands of dimensions.
Fast Implementations of Nonparametric Curve Estimators
Speed tests show that the fast methods for kernel-based nonparametric curve estimators are as fast as or somewhat faster than methods traditionally considered very fast, such as LOWESS and smoothing splines.
Fast implementations of nonparametric curve estimators
The main ideas behind two different approaches of kernel based nonparametric curve estimators are made clear and the fast methods are seen to be somewhat better than methods traditionally considered very fast, such as LOWESS and smoothing splines.
Nonparametric density estimation: the L1 view
Differentiation of Integrals; Consistency; Lower Bounds for Rates of Convergence; Rates of Convergence in L1; The Automatic Kernel Estimate: L1 and Pointwise Convergence; Estimates Related to the Kernel
Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets
A very sparse data structure, the ADtree, is provided to minimize memory use and it is empirically demonstrated that tractably-sized data structures can be produced for large real-world datasets by using a sparse tree structure that never allocates memory for counts of zero.
A comparative study of some kernel-based nonparametric density estimators
Some practical approaches to the problem of choosing parameters which control the smoothness of kernel-based density estimators are investigated. Fixed and variable kernels are considered, and
Fast Computation of Multivariate Kernel Estimators
Multivariate extensions of binning techniques for fast computation of kernel estimators are described and examined. Several questions arising from this multivariate extension are addressed.