Statistical Efficiency of Compositional Nonparametric Prediction
@inproceedings{Xu2018StatisticalEO,
  title     = {Statistical Efficiency of Compositional Nonparametric Prediction},
  author    = {Yixi Xu and Jean Honorio and Xiao Wang},
  booktitle = {AISTATS},
  year      = {2018}
}
In this paper, we propose a compositional nonparametric method in which a model is expressed as a labeled binary tree of $2k+1$ nodes, where each node is either a summation, a multiplication, or the application of one of the $q$ basis functions to one of the $p$ covariates. We show that in order to recover a labeled binary tree from a given dataset, the sufficient number of samples is $O(k\log(pq)+\log(k!))$, and the necessary number of samples is $\Omega(k\log (pq)-\log(k!))$. We further…
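To make the model class concrete, here is a minimal sketch (not code from the paper) of evaluating such a labeled binary tree, where internal nodes are summations or multiplications and each leaf applies one of the $q$ basis functions to one of the $p$ covariates; the basis dictionary and the node-count convention below are illustrative assumptions.

import numpy as np

# Illustrative sketch of the compositional model class: internal nodes are '+' or '*',
# leaves apply basis function b_j to covariate x_i. The basis dictionary is an
# assumption made for this example, not the paper's choice.
class Leaf:
    def __init__(self, basis_idx, cov_idx):
        self.basis_idx = basis_idx   # which of the q basis functions
        self.cov_idx = cov_idx       # which of the p covariates

class Node:
    def __init__(self, op, left, right):
        self.op = op                 # '+' (summation) or '*' (multiplication)
        self.left, self.right = left, right

BASIS = [lambda t: t, lambda t: t ** 2, np.sin, np.cos]   # q = 4 illustrative basis functions

def evaluate(tree, x):
    """Evaluate a labeled binary tree at a single covariate vector x of length p."""
    if isinstance(tree, Leaf):
        return BASIS[tree.basis_idx](x[tree.cov_idx])
    left, right = evaluate(tree.left, x), evaluate(tree.right, x)
    return left + right if tree.op == '+' else left * right

# Example tree for f(x) = sin(x_0) + x_1^2 * x_2: 2 operation nodes and 3 leaves,
# i.e. 2k + 1 = 5 nodes if k counts the operation nodes (an assumption about the convention).
tree = Node('+', Leaf(2, 0), Node('*', Leaf(1, 1), Leaf(0, 2)))
print(evaluate(tree, np.array([0.5, 2.0, 3.0])))   # sin(0.5) + 4 * 3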
3 Citations
Compositional inductive biases in function learning
- Computer Science, Cognitive Psychology
- 2017
The view that the human intuitive theory of functions is inherently compositional is supported, using a grammar over Gaussian process kernels to formalize this idea within the framework of Bayesian regression.
Compositional Inductive Biases in Function Learning
- Computer Science
- 2016
The view that the human intuitive theory of functions is inherently compositional is supported, using a grammar over Gaussian process kernels to formalize this idea within the framework of Bayesian regression.
Towards a unifying theory of generalization
- Biology, Psychology
- 2017
Gaussian Process regression is put forward and assessed as a model of human function learning that can unify several psychological theories of generalization, and it is found that the vast majority of subjects are best predicted by a Gaussian Process function learning model combined with an upper confidence bound sampling strategy.
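As a rough illustration of that combination, the sketch below fits a Gaussian Process with scikit-learn and picks the next query point by an upper confidence bound; the data, kernel, and exploration weight are assumptions made for the example, not details of the cited study.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy 1-D function-learning data
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(8, 1))
y_train = np.sin(X_train).ravel() + 0.1 * rng.standard_normal(8)

# Gaussian Process regression model of the unknown function
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gp.fit(X_train, y_train)

# Upper-confidence-bound sampling: query where mean + beta * std is largest
X_grid = np.linspace(0, 10, 200).reshape(-1, 1)
mean, std = gp.predict(X_grid, return_std=True)
beta = 2.0   # illustrative exploration weight, not a value from the cited work
next_x = X_grid[np.argmax(mean + beta * std)]
print(next_x)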
References
Showing 1-10 of 22 references
Structure Discovery in Nonparametric Regression through Compositional Kernel Search
- Computer Science, ICML
- 2013
This work defines a space of kernel structures which are built compositionally by adding and multiplying a small number of base kernels, and presents a method for searching over this space of structures which mirrors the scientific discovery process.
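To make the compositional space concrete, here is a small sketch using scikit-learn's kernel objects, where '+' and '*' build sums and products of base kernels; the particular structure is only an example, not one produced by the cited search procedure.

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, DotProduct

# Base kernels (squared-exponential, linear, periodic) combined by multiplication
# and addition, mirroring the compositional kernel space described above.
smooth_trend = RBF(length_scale=2.0) * DotProduct()              # product of base kernels
seasonal = ExpSineSquared(length_scale=1.0, periodicity=1.0)     # periodic base kernel
composite = smooth_trend + seasonal                               # sum of sub-structures

gp = GaussianProcessRegressor(kernel=composite)                   # ready to fit to data
print(composite)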
Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness
- Computer Science, Mathematics, NIPS
- 2009
The main result is a lower bound on the minimax rate that scales as $\max\big(s\log(p/s)/n,\ s\,\epsilon_n^2(\mathcal{H})\big)$.
Tensor decompositions for learning latent variable models
- Computer Science, Mathematics, ArXiv
- 2012
A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices; this implies a robust and computationally tractable estimation approach for several popular latent variable models.
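A bare-bones sketch of the core update behind a tensor power method is shown below, written with numpy; deflation, random restarts, and the robustness analysis are omitted, and the rank-one test tensor is purely illustrative.

import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Estimate one eigenpair of a symmetric 3rd-order tensor T via the power
    update u <- T(I, u, u) / ||T(I, u, u)|| (a simplified, non-robust sketch)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v = np.einsum('ijk,j,k->i', T, u, u)    # contract T along two modes with u
        u = v / np.linalg.norm(v)
    lam = np.einsum('ijk,i,j,k->', T, u, u, u)  # eigenvalue estimate
    return lam, u

# Illustrative symmetric rank-one tensor: T = 2 * a (x) a (x) a, with ||a|| = 1
a = np.array([0.6, 0.8, 0.0])
T = 2.0 * np.einsum('i,j,k->ijk', a, a, a)
print(tensor_power_iteration(T))                # recovers roughly (2.0, a)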
SpAM: Sparse Additive Models
- Computer Science, NIPS
- 2007
A statistical analysis of the properties of SpAM is given, and empirical results on synthetic and real data show that SpAM can be effective in fitting sparse nonparametric models to high-dimensional data.
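The sketch below is a simplified sparse-backfitting loop in the spirit of SpAM, pairing a crude Nadaraya-Watson smoother with a soft-thresholding step; the smoother, bandwidth, and penalty level are illustrative assumptions rather than the cited algorithm's exact choices.

import numpy as np

def kernel_smooth(x, r, bandwidth=0.5):
    """Crude Nadaraya-Watson smoother of residuals r against a single covariate x."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ r) / w.sum(axis=1)

def spam_backfit(X, y, lam=0.1, n_passes=10):
    """Simplified SpAM-style backfitting: smooth each partial residual, then
    soft-threshold whole component functions toward zero to induce sparsity."""
    n, p = X.shape
    f = np.zeros((n, p))                        # fitted components f_j evaluated at the data
    for _ in range(n_passes):
        for j in range(p):
            partial = y - y.mean() - f.sum(axis=1) + f[:, j]
            pj = kernel_smooth(X[:, j], partial)
            sj = np.sqrt(np.mean(pj ** 2))      # estimated norm of component j
            shrink = max(0.0, 1.0 - lam / sj) if sj > 0 else 0.0
            f[:, j] = shrink * pj
            f[:, j] -= f[:, j].mean()           # keep each component centered
    return f

# Toy check: only covariates 0 and 1 matter, so the other component norms should shrink.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 6))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(200)
f_hat = spam_backfit(X, y)
print(np.round(np.sqrt((f_hat ** 2).mean(axis=0)), 3))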
Sum-product networks: A new deep architecture
- Computer Science, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops)
- 2011
The key limiting factor in graphical model inference and learning is the complexity of the partition function. We thus ask the question: what are the most general conditions under which the partition…
Information-Theoretic Limits of Selecting Binary Graphical Models in High Dimensions
- Computer Science, Mathematics, IEEE Transactions on Information Theory
- 2012
The information-theoretic limitations of graph selection for binary Markov random fields are analyzed under high-dimensional scaling, in which the graph size p and the number of edges k, and/or the maximal node degree d, are allowed to increase to infinity as a function of the sample size n.
Information-theoretic bounds on model selection for Gaussian Markov random fields
- Computer Science, Mathematics, 2010 IEEE International Symposium on Information Theory
- 2010
The first result establishes a set of necessary conditions on n(p, d) for any recovery method to consistently estimate the underlying graph, and the second result provides necessary conditions for any decoder to produce an estimate Θ̂ of the true inverse covariance matrix Θ* satisfying ‖Θ̂ − Θ*‖ < δ in the elementwise ℓ∞-norm.
On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization
- Computer Science, NIPS
- 2008
This work characterizes the generalization ability of algorithms whose predictions are linear in the input vector. To this end, we provide sharp bounds for Rademacher and Gaussian complexities of…
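For context, a standard bound of this type (stated here from memory, not quoted from the cited paper) says that the empirical Rademacher complexity of $\ell_2$-bounded linear predictors satisfies $\hat{\mathfrak{R}}_n\big(\{x \mapsto \langle w, x\rangle : \|w\|_2 \le W\}\big) \le W \max_i \|x_i\|_2 / \sqrt{n}$, with analogous bounds (up to logarithmic factors in the dimension) for $\ell_1$-constrained weights; results of this flavor yield the margin-based risk bounds referred to above.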
Learning Non-Linear Combinations of Kernels
- Computer Science, NIPS
- 2009
A projection-based gradient descent algorithm is given for solving the optimization problem of learning kernels based on a polynomial combination of base kernels, and it is proved that the global solution of this problem always lies on the boundary.
Rademacher and Gaussian Complexities: Risk Bounds and Structural Results
- Computer Science, J. Mach. Learn. Res.
- 2001
This work investigates the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities, and proves general risk bounds in terms of these complexities in a decision-theoretic setting.
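For reference, one common form of the empirical Rademacher complexity underlying these data-dependent bounds (notation chosen here, not necessarily the cited paper's) is $\hat{\mathfrak{R}}_n(\mathcal{F}) = \mathbb{E}_{\sigma}\big[\sup_{f \in \mathcal{F}} \tfrac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i)\big]$, where the $\sigma_i$ are independent uniform $\pm 1$ signs; the Gaussian complexity replaces the $\sigma_i$ with standard normal variables.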