# Statistical theory for image classification using deep convolutional neural networks with cross-entropy loss

```
@article{Kohler2020StatisticalTF,
  title   = {Statistical theory for image classification using deep convolutional neural networks with cross-entropy loss},
  author  = {Michael Kohler and Sophie Langer},
  journal = {arXiv: Statistics Theory},
  year    = {2020}
}
```

Convolutional neural networks learned by minimizing the cross-entropy loss are nowadays the standard for image classification. Until now, the statistical theory behind these networks has been lacking. We analyze the rate of convergence of the misclassification risk of the estimates towards the optimal misclassification risk. Under suitable assumptions on the smoothness and structure of the a posteriori probability, it is shown that these estimates achieve a rate of convergence which is independent of…
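The quantity analyzed in the abstract, the excess misclassification risk of a cross-entropy-trained plug-in classifier over the optimal (Bayes) risk, can be illustrated numerically. The sketch below is not the paper's estimator; it uses a minimal linear model trained by gradient descent on the cross-entropy loss, on synthetic data where the a posteriori probability η(x) = P(Y = 1 | X = x) is known, so the Bayes risk E[min(η(X), 1 − η(X))] can be estimated directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Known a posteriori probability: eta(x) = sigmoid(2*x1 - x2)
def eta(X):
    return 1.0 / (1.0 + np.exp(-(2 * X[:, 0] - X[:, 1])))

# Training sample drawn from the model
n = 5000
X = rng.normal(size=(n, 2))
Y = (rng.random(n) < eta(X)).astype(float)

# Plug-in classifier: minimize the empirical cross-entropy loss
# (full-batch gradient descent on a linear logit model)
w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= lr * (X.T @ (p - Y)) / n
    b -= lr * np.mean(p - Y)

# Misclassification risk of the estimate vs. Bayes risk, on fresh data
Xt = rng.normal(size=(20000, 2))
et = eta(Xt)
Yt = (rng.random(20000) < et).astype(float)
pred = ((Xt @ w + b) > 0).astype(float)
risk = np.mean(pred != Yt)              # risk of the learned classifier
bayes = np.mean(np.minimum(et, 1 - et)) # optimal misclassification risk
print(f"excess risk: {risk - bayes:.4f}")
```

The rates studied in the paper bound how fast this excess risk shrinks with the sample size; here the model is well specified, so the gap is small already at moderate n.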


## 10 Citations

### Analysis of convolutional neural network image classifiers in a rotationally symmetric model

- Computer Science, Mathematics
- 2022

Under suitable structural and smoothness assumptions on the a posteriori probability, it is shown that least squares plug-in classifiers based on convolutional neural networks are able to circumvent the curse of dimensionality in binary image classification if a resolution-dependent error term is neglected.

### Understanding Square Loss in Training Overparametrized Neural Network Classifiers

- Computer Science, ArXiv
- 2021

This work systematically investigates how square loss performs for overparametrized neural networks in the neural tangent kernel (NTK) regime, demonstrating the effectiveness of square loss in both synthetic low-dimensional data and real image data.

### A statistical analysis of an image classification problem

- Computer Science
- 2022

A simple supervised classification problem for object detection on grayscale images is considered; it is shown that perfect classification is possible, and a rate for the misclassification error depending on the sample size and the number of pixels is derived.

### Learnability of convolutional neural networks for infinite dimensional input via mixed and anisotropic smoothness

- Computer Science, ICLR
- 2022

This paper investigates the approximation and estimation errors of the (dilated) convolutional neural networks when the input is infinite dimensional and shows that the dilated convolution is advantageous when the smoothness of the target function has a sparse structure.

### Minimax Optimal Deep Neural Network Classifiers Under Smooth Decision Boundary

- Computer Science
- 2022

It is shown that DNN classifiers can adapt to low-dimensional data structures and circumvent the "curse of dimensionality" in the sense that the minimax rate only depends on the effective dimension, potentially much smaller than the actual data dimension.

### Analysis of convolutional neural network image classifiers in a hierarchical max-pooling model with additional local pooling

- Computer Science, ArXiv
- 2021

Various convolutional neural network image classifiers are introduced and compared in view of their rate of convergence, and the finite sample size performance of the estimates is analyzed by applying them to simulated and real data.

### Approximation Properties of Deep ReLU CNNs

- Computer Science, Research in the Mathematical Sciences
- 2022

A universal approximation theorem for deep ReLU CNNs with classic structure is obtained by showing their connection with one-hidden-layer ReLU neural networks (NNs), and approximation properties are obtained for one version of neural networks with ResNet, pre-act ResNet, and MgNet architectures based on connections between these networks.

### Optimal Convergence Rates of Deep Neural Networks in a Classification Setting

- Computer Science
- 2022

This work establishes convergence rates, optimal up to a log factor, for a class of deep neural networks in a classification setting under a constraint sometimes referred to as the Tsybakov noise condition, and shows almost optimal rates under additional constraints that circumvent the curse of dimensionality.

### Convergence rates of deep ReLU networks for multiclass classification

- Computer Science, Electronic Journal of Statistics
- 2022

This work studies convergence of the learned probabilities to the true conditional class probabilities and considers sparse deep ReLU network reconstructions minimizing the cross-entropy loss in the multiclass classification setup.

### Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks

- Computer Science, ICML
- 2021

This work establishes theoretical guarantees of convolutional residual networks (ConvResNet) in terms of function approximation and statistical estimation for binary classification, and proves that if the network architecture is properly chosen, ConvResNets can approximate Besov functions on manifolds with arbitrary accuracy.

## References

Showing 1–10 of 39 references

### On the rate of convergence of image classifiers based on convolutional neural networks

- Computer Science, ArXiv
- 2020

This work proves that in image classification it is possible to circumvent the curse of dimensionality by convolutional neural networks.

### Generalization Bounds for Convolutional Neural Networks

- Computer Science, ArXiv
- 2019

This work proposes a tighter generalization bound for CNNs by exploiting the sparse and permutation structure of their weight matrices, and further studies spectral norms of three commonly used convolution operations: standard convolution, depthwise convolution, and pointwise convolution.

### Fast convergence rates of deep neural networks for classification

- Computer Science, Neural Networks
- 2021

### Estimation of a Function of Low Local Dimensionality by Deep Neural Networks

- Computer Science, IEEE Transactions on Information Theory
- 2022

It is shown that the least squares regression estimates using DNNs are able to achieve dimensionality reduction in case that the regression function has locally low dimensionality.

### Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review

- Computer Science, Neural Computation
- 2017

This review, which focuses on the application of CNNs to image classification tasks, covers their development, from their predecessors up to recent state-of-the-art deep learning systems.

### Deep Residual Learning for Image Recognition

- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

### ImageNet classification with deep convolutional neural networks

- Computer Science, Commun. ACM
- 2012

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

### Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks

- Computer Science, ICML
- 2019

It is shown that a ResNet-type CNN can attain the minimax optimal error rates in these classes in more plausible situations: it can be dense, and its width, channel size, and filter size are constant with respect to the sample size.

### On the rate of convergence of fully connected very deep neural network regression estimates

- Computer Science, The Annals of Statistics
- 2021

This paper shows that it is possible to get similar results also for least squares estimates based on simple fully connected neural networks with ReLU activation functions, based on new approximation results concerning deep neural networks.

### Deep Neural Networks Learn Non-Smooth Functions Effectively

- Computer Science, AISTATS
- 2019

It is shown that the estimators by DNNs are almost optimal to estimate the non-smooth functions, while some of the popular models do not attain the optimal rate.