Corpus ID: 4871682

Statistical Efficiency of Compositional Nonparametric Prediction

Yixi Xu, Jean Honorio, Xiao Wang
In this paper, we propose a compositional nonparametric method in which a model is expressed as a labeled binary tree of $2k+1$ nodes, where each node is either a summation, a multiplication, or the application of one of the $q$ basis functions to one of the $p$ covariates. We show that in order to recover a labeled binary tree from a given dataset, the sufficient number of samples is $O(k\log(pq)+\log(k!))$, and the necessary number of samples is $\Omega(k\log (pq)-\log(k!))$. We further… 
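To make the model class in the abstract concrete, the following is a minimal sketch of a labeled binary tree with $k$ operator nodes and $k+1$ leaves ($2k+1$ nodes in total). It is an illustration under stated assumptions, not the authors' implementation: the names `Leaf`, `Op`, `BASIS`, `evaluate`, `count_nodes`, and `sufficient_samples_order` are hypothetical, and the three basis functions are chosen arbitrarily because the abstract does not specify a basis. The last helper only evaluates the expression inside the $O(\cdot)$ bound, so it reports a growth order with unknown constants, not an actual sample count.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Union

# Hypothetical basis (q = 3); the abstract does not say which basis functions are used.
BASIS = {
    "identity": lambda t: t,
    "sin": math.sin,
    "square": lambda t: t * t,
}


@dataclass(frozen=True)
class Leaf:
    """Leaf node: one of the q basis functions applied to one of the p covariates."""
    basis: str       # key into BASIS
    covariate: int   # index into the input vector x


@dataclass(frozen=True)
class Op:
    """Internal node: a summation ("+") or a multiplication ("*") of two subtrees."""
    kind: str
    left: "Node"
    right: "Node"


Node = Union[Leaf, Op]


def evaluate(node: Node, x: Sequence[float]) -> float:
    """Recursively evaluate the tree at a single input point x."""
    if isinstance(node, Leaf):
        return BASIS[node.basis](x[node.covariate])
    left, right = evaluate(node.left, x), evaluate(node.right, x)
    if node.kind == "+":
        return left + right
    if node.kind == "*":
        return left * right
    raise ValueError(f"unknown operator: {node.kind!r}")


def count_nodes(node: Node) -> int:
    """Total number of nodes; equals 2k + 1 for a tree with k operator nodes."""
    if isinstance(node, Leaf):
        return 1
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def sufficient_samples_order(k: int, p: int, q: int) -> float:
    """Value of k*log(pq) + log(k!), the expression inside the O(.) bound.

    Constants are unknown, so this is a growth order, not a sample count.
    """
    return k * math.log(p * q) + math.lgamma(k + 1)  # lgamma(k + 1) = log(k!)


# k = 2 operators, hence 2k + 1 = 5 nodes: f(x) = sin(x0) * x1 + x2^2
tree = Op("+", Op("*", Leaf("sin", 0), Leaf("identity", 1)), Leaf("square", 2))
```

For example, `evaluate(tree, [0.0, 3.0, 2.0])` computes $\sin(0)\cdot 3 + 2^2$, and `count_nodes(tree)` confirms the $2k+1$ node count for $k=2$.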


Compositional inductive biases in function learning
The view that the human intuitive theory of functions is inherently compositional is supported, using a grammar over Gaussian process kernels to formalize this idea within the framework of Bayesian regression.
Towards a unifying theory of generalization
Gaussian Process regression is put forward and assessed as a model of human function learning that can unify several psychological theories of generalization. The vast majority of subjects are found to be best predicted by a Gaussian Process function learning model combined with an upper confidence bound sampling strategy.


Structure Discovery in Nonparametric Regression through Compositional Kernel Search
This work defines a space of kernel structures which are built compositionally by adding and multiplying a small number of base kernels, and presents a method for searching over this space of structures which mirrors the scientific discovery process.
Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness
The main result is a lower bound on the minimax rate that scales as $\max\left(\frac{s\log(p/s)}{n},\; s\,\epsilon_n^2(\mathcal{H})\right)$.
Tensor decompositions for learning latent variable models
A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices, and implies a robust and computationally tractable estimation approach for several popular latent variable models.
SpAM: Sparse Additive Models
A statistical analysis of the properties of SpAM and empirical results on synthetic and real data show that SpAM can be effective in fitting sparse nonparametric models in high dimensional data.
Sum-product networks: A new deep architecture
The key limiting factor in graphical model inference and learning is the complexity of the partition function. We thus ask the question: what are the most general conditions under which the partition
Information-Theoretic Limits of Selecting Binary Graphical Models in High Dimensions
The information-theoretic limitations of the problem of graph selection for binary Markov random fields under high-dimensional scaling, in which the graph size and the number of edges k, and/or the maximal node degree d, are allowed to increase to infinity as a function of the sample size n, are analyzed.
Information-theoretic bounds on model selection for Gaussian Markov random fields
The first result establishes a set of necessary conditions on $n(p, d)$ for any recovery method to consistently estimate the underlying graph, and the second result provides necessary conditions for any decoder to produce an estimate $\hat{\Theta}$ of the true inverse covariance matrix $\Theta$ satisfying $\|\hat{\Theta} - \Theta\| < \delta$ in the elementwise $\ell_\infty$-norm.
On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization
This work characterizes the generalization ability of algorithms whose predictions are linear in the input vector. To this end, we provide sharp bounds for Rademacher and Gaussian complexities of
Learning Non-Linear Combinations of Kernels
A projection-based gradient descent algorithm is given for solving the optimization problem of learning kernels based on a polynomial combination of base kernels and it is proved that the global solution of this problem always lies on the boundary.
Rademacher and Gaussian Complexities: Risk Bounds and Structural Results
This work investigates the use of certain data-dependent estimates of the complexity of a function class called Rademacher and Gaussian complexities and proves general risk bounds in terms of these complexities in a decision theoretic setting.