Corpus ID: 201670525

Deep Learning and MARS: A Connection

Michael Kohler, Adam Krzyżak, Sophie Langer
We consider least squares regression estimates using deep neural networks. We show that these estimates satisfy an oracle inequality, which implies that (up to a logarithmic factor) the error of these estimates is at least as small as the optimal error bound one would expect for MARS if that procedure worked optimally. As a result, we show that our neural networks are able to achieve a dimensionality reduction in case the regression function locally has…
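To make the MARS side of the connection concrete, here is a minimal sketch of a MARS-style fit: least squares regression on a dictionary of hinge (truncated linear) basis functions max(0, x − t) and max(0, t − x). The one-dimensional setting, the fixed knot grid, and the test target are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.abs(x) + 0.05 * rng.standard_normal(200)  # target with a kink at 0

# Fixed knot grid (a real MARS run selects knots adaptively).
knots = np.linspace(-0.8, 0.8, 9)

# Design matrix: intercept plus a pair of hinge functions per knot.
B = np.column_stack(
    [np.ones_like(x)]
    + [np.maximum(0.0, x - t) for t in knots]
    + [np.maximum(0.0, t - x) for t in knots]
)

# Ordinary least squares fit over the hinge dictionary.
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
residual = y - B @ coef
print(np.sqrt(np.mean(residual**2)))  # RMSE near the noise level
```

Because piecewise-linear hinges can represent the kink at 0 exactly (0 is one of the knots), the least squares fit recovers the target up to the noise; this is the kind of basis the paper's error bound for MARS refers to.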

Tables from this paper

Analysis of the rate of convergence of neural network regression estimates which are easy to implement

This article introduces a new neural network regression estimate in which most of the weights are chosen independently of the data, motivated by recent approximation results for neural networks; the estimate is therefore easy to implement and achieves the one-dimensional rate of convergence.

On the rate of convergence of a neural network regression estimate learned by gradient descent

It is shown that the resulting estimate achieves (up to a logarithmic factor) the optimal rate of convergence in a projection pursuit model.

Over-parametrized deep neural networks do not generalize well

A lower bound is presented which proves that deep neural networks with the sigmoidal squasher activation function in a regression setting do not generalize well to new data, in the sense that they do not achieve the optimal minimax rate of convergence for estimation of smooth regression functions.

Discussion of: “Nonparametric regression using deep neural networks with ReLU activation function”

Kohler and Krzyżak (2017) extended this function class in the form of so-called generalized hierarchical interaction models.

On the rate of convergence of image classifiers based on convolutional neural networks

This work proves that in image classification it is possible to circumvent the curse of dimensionality by convolutional neural networks.

On deep learning as a remedy for the curse of dimensionality in nonparametric regression

It is shown that least squares estimates based on multilayer feedforward neural networks are able to circumvent the curse of dimensionality in nonparametric regression.

Nonparametric regression using deep neural networks with ReLU activation function

The theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural and the analysis gives some insights into why multilayer feedforward neural networks perform well in practice.

Deep Neural Networks Learn Non-Smooth Functions Effectively

It is shown that DNN estimators are nearly optimal for estimating non-smooth functions, while some popular models do not attain the optimal rate.

Local Dimensionality Reduction

This paper examines several techniques for local dimensionality reduction in the context of locally weighted linear regression and finds that locally weighted partial least squares regression offers the best average results, thus outperforming even factor analysis, the theoretically most appealing of the candidate techniques.
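Locally weighted linear regression, the setting these local dimensionality reduction papers work in, can be sketched in a few lines: fit a weighted least squares line around each query point, with kernel weights that decay with distance. The Gaussian kernel, the bandwidth, and the test data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 2 * np.pi, 150))
y = np.sin(x) + 0.1 * rng.standard_normal(150)

def lwr_predict(xq, x, y, bandwidth=0.3):
    """Locally weighted linear regression prediction at a single query xq."""
    w = np.exp(-0.5 * ((x - xq) / bandwidth) ** 2)  # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - xq])  # local linear basis at xq
    # Solve the weighted normal equations (X^T W X) beta = X^T W y.
    A = X.T @ (w[:, None] * X)
    b = X.T @ (w * y)
    return np.linalg.solve(A, b)[0]  # intercept = fitted value at xq

pred = lwr_predict(np.pi / 2, x, y)
print(pred)  # close to sin(pi/2) = 1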

Convergence rates for single hidden layer feedforward networks

Local dimensionality reduction for locally weighted learning

  • S. Vijayakumar, S. Schaal
  • Computer Science
    Proceedings 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation CIRA'97. 'Towards New Computational Principles for Robotics and Automation'
  • 1997
A learning algorithm is derived that uses a dynamically growing local dimensionality reduction as a preprocessing step for a nonparametric learning technique, locally weighted regression, exploiting the fact that data distributions from physical movement systems are locally low-dimensional and dense.

Rate-optimal estimation for a general class of nonparametric regression models with unknown link functions

A nonparametric regression model that naturally generalizes neural network models is discussed; it is based on a finite number of one-dimensional transformations and can be estimated with a one-dimensional rate of convergence.

Investigating Smooth Multiple Regression by the Method of Average Derivatives

Abstract: Let (x_1, …, x_k, y) be a random vector where y denotes a response on the vector x of predictor variables. In this article we propose a technique [termed average derivative estimation (ADE)]
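A one-dimensional illustration of the average derivative idea: smooth y on x with a kernel regression estimate, differentiate the smoother numerically, and average the derivative over (interior) sample points. The Nadaraya–Watson smoother, the bandwidth, and the linear test model are illustrative assumptions, not the ADE estimator of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 300)
y = 2.0 * x + 0.1 * rng.standard_normal(300)  # true average derivative = 2

def nw(xq, x, y, h=0.1):
    """Nadaraya-Watson kernel regression estimate at a single point xq."""
    w = np.exp(-0.5 * ((x - xq) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

# Average a central finite-difference derivative of the smoother over
# interior sample points (the boundary region is skipped to avoid bias).
eps = 1e-3
interior = x[(x > 0.2) & (x < 0.8)]
slopes = [(nw(t + eps, x, y) - nw(t - eps, x, y)) / (2 * eps) for t in interior]
print(np.mean(slopes))  # close to 2
```

Averaging the derivative washes out pointwise estimation noise, which is why average derivatives can be estimated at better rates than the regression function itself.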

Approximation and estimation bounds for artificial neural networks

  • A. Barron
  • Computer Science
    Machine Learning
  • 1994
The analysis involves Fourier techniques for the approximation error, metric entropy considerations for the estimation error, and a calculation of the index of resolvability of minimum complexity estimation of the family of networks.

Local Dimensionality Reduction for Non-Parametric Regression

Locally weighted regression is a computationally efficient technique for non-linear regression. However, for high-dimensional data, this technique becomes numerically brittle and computationally too expensive.