Corpus ID: 236772216

Convergence rates of deep ReLU networks for multiclass classification

Thijs Bos and Johannes Schmidt-Hieber
For classification problems, trained deep neural networks return probabilities of class membership. In this work we study the convergence of these learned probabilities to the true conditional class probabilities. More specifically, we consider sparse deep ReLU network reconstructions minimizing the cross-entropy loss in the multiclass classification setup. Interesting phenomena occur when the class membership probabilities are close to zero. Convergence rates are derived that depend on the near-zero…
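A minimal sketch of the setup the abstract describes: a softmax output layer turns the scores of a ReLU network into estimated conditional class probabilities, and training minimizes the cross-entropy loss. The tiny network, weights, and input below are illustrative assumptions, not the paper's construction.

```python
import math

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(scores):
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class; this diverges as the
    # estimated probability of the true class approaches zero, which is
    # exactly the near-zero regime the abstract highlights.
    return -math.log(probs[label])

def forward(x, W1, b1, W2, b2):
    # One hidden ReLU layer followed by a softmax output layer.
    h = relu([sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(W1, b1)])
    return softmax([sum(w * v for w, v in zip(row, h)) + b
                    for row, b in zip(W2, b2)])

# Hypothetical fixed weights for a 2-input, 3-class toy example.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b2 = [0.0, 0.0, 0.0]

probs = forward([0.2, 0.7], W1, b1, W2, b2)  # estimated class probabilities
loss = cross_entropy(probs, label=1)
```

The probabilities sum to one by construction, so the network output can be read directly as an estimate of the conditional class probabilities studied in the paper.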

Related papers

Fast convergence rates of deep neural networks for classification
It is shown that the DNN classifier learned using the hinge loss achieves fast convergence rates in all three cases, provided that the architecture (i.e., the number of layers, number of nodes, and sparsity) is carefully selected.
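The hinge loss mentioned in this entry extends from binary to multiclass classification via, for example, the Crammer-Singer formulation; the sketch below is a generic illustration (scores and labels are made up), not the paper's specific estimator.

```python
def multiclass_hinge(scores, label):
    # Crammer-Singer multiclass hinge loss: penalize whenever the score
    # of any wrong class comes within margin 1 of the true class score.
    violations = [1.0 + s - scores[label]
                  for j, s in enumerate(scores) if j != label]
    return max(0.0, max(violations))

# True class well separated from the rest: zero loss.
loss_ok = multiclass_hinge([2.0, 0.5, -1.0], label=0)
# A wrong class outscores the true class: positive loss.
loss_bad = multiclass_hinge([0.5, 2.0, -1.0], label=0)
```

Unlike cross-entropy, the hinge loss targets the Bayes classification rule directly rather than the conditional class probabilities, which is why the two losses lead to different convergence analyses.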
Statistical theory for image classification using deep convolutional neural networks with cross-entropy loss
Convolutional neural networks learned by minimizing the cross-entropy loss are nowadays the standard for image classification. Until now, the statistical theory behind those networks has been lacking.
On deep learning as a remedy for the curse of dimensionality in nonparametric regression
Assuming that a smoothness condition and a suitable restriction on the structure of the regression function hold, it is shown that least squares estimates based on multilayer feedforward neural networks …
On the rate of convergence of fully connected very deep neural network regression estimates
This paper shows that similar results can also be obtained for least squares estimates based on simple fully connected neural networks with ReLU activation functions, building on new approximation results for deep neural networks.
A Moment Bound for Multi-hinge Classifiers
The success of support vector machines in binary classification relies on the fact that the hinge loss employed in risk minimization targets the Bayes rule. Recent research explores some extensions …
Error bounds for approximations with deep ReLU networks
It is proved that deep ReLU networks approximate smooth functions more efficiently than shallow networks, and adaptive depth-6 network architectures that are more efficient than the standard shallow architecture are described.
Fast learning rates for plug-in classifiers
It has recently been shown that, under the margin (or low-noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, that is, rates faster than n^{-1/2}.
Nonparametric regression using deep neural networks with ReLU activation function
The discussant contributions highlight the gaps in the theoretical understanding and outline many possible directions for future research in this area.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Optimal aggregation of classifiers in statistical learning
Classification can be considered as nonparametric estimation of sets, where the risk is defined by means of a specific distance between sets associated with misclassification error. It is shown that …