
Statistical theory for image classification using deep convolutional neural networks with cross-entropy loss

Michael Kohler and Sophie Langer · arXiv: Statistics Theory
Convolutional neural networks learned by minimizing the cross-entropy loss are nowadays the standard for image classification. Until now, the statistical theory behind those networks has been lacking. We analyze the rate of convergence of the misclassification risk of the estimates towards the optimal misclassification risk. Under suitable assumptions on the smoothness and structure of the a posteriori probability, it is shown that these estimates achieve a rate of convergence which is independent of…
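The abstract centers on two quantities: the cross-entropy loss minimized during training, and the misclassification risk of the resulting plug-in classifier, which predicts the class with the largest estimated a posteriori probability. The following is a minimal illustrative sketch of these two quantities on toy data, not the paper's estimator; the logits and labels are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # average negative log-likelihood of the true class
    p = softmax(logits)
    n = len(labels)
    return -np.mean(np.log(p[np.arange(n), labels]))

def plug_in_classify(logits):
    # plug-in classifier: predict the class with the largest
    # estimated a posteriori probability
    return np.argmax(logits, axis=-1)

# toy logits for two samples and two classes (illustrative values)
logits = np.array([[2.0, 0.5], [0.1, 1.5]])
labels = np.array([0, 1])

loss = cross_entropy(logits, labels)
preds = plug_in_classify(logits)
```

The empirical misclassification risk is then simply `np.mean(preds != labels)`; the theory in the paper relates the risk of such plug-in classifiers to the optimal (Bayes) risk.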


Analysis of convolutional neural network image classifiers in a rotationally symmetric model

Under suitable structural and smoothness assumptions on the a posteriori probability, it is shown that least squares plug-in classifiers based on convolutional neural networks are able to circumvent the curse of dimensionality in binary image classification if a resolution-dependent error term is neglected.

Understanding Square Loss in Training Overparametrized Neural Network Classifiers

This work systematically investigates how square loss performs for overparametrized neural networks in the neural tangent kernel (NTK) regime, demonstrating the effectiveness of square loss in both synthetic low-dimensional data and real image data.
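For concreteness, the square loss in a classification setting is just the mean squared error between the network outputs and one-hot encoded labels. The sketch below is an illustrative toy computation under that assumption, not the paper's NTK analysis; the outputs and labels are made up for the example.

```python
import numpy as np

def square_loss(outputs, labels, num_classes):
    # mean squared error between network outputs and one-hot targets
    onehot = np.eye(num_classes)[labels]
    return np.mean((outputs - onehot) ** 2)

outputs = np.array([[0.9, 0.1], [0.2, 0.8]])  # toy network outputs
labels = np.array([0, 1])                     # true class indices

loss = square_loss(outputs, labels, num_classes=2)
```

Unlike cross-entropy, this loss does not require the outputs to form a probability distribution, which is part of what makes its behavior for overparametrized classifiers a separate question.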

A statistical analysis of an image classification problem

A simple supervised classification problem for object detection on grayscale images is studied, and it is shown that perfect classification is possible, together with a rate for the misclassification error depending on the sample size and the number of pixels.

Learnability of convolutional neural networks for infinite dimensional input via mixed and anisotropic smoothness

This paper investigates the approximation and estimation errors of (dilated) convolutional neural networks when the input is infinite-dimensional and shows that the dilated convolution is advantageous when the smoothness of the target function has a sparse structure.

Minimax Optimal Deep Neural Network Classifiers Under Smooth Decision Boundary

It is shown that DNN classifiers can adapt to low-dimensional data structures and circumvent the “curse of dimensionality” in the sense that the minimax rate only depends on the effective dimension, potentially much smaller than the actual data dimension.

Analysis of convolutional neural network image classifiers in a hierarchical max-pooling model with additional local pooling

Various convolutional neural network image classifiers are introduced and compared in view of their rate of convergence, and the finite sample size performance of the estimates is analyzed by applying them to simulated and real data.

Approximation Properties of Deep ReLU CNNs

A universal approximation theorem for deep ReLU CNNs with classic structure is obtained by showing their connection with one-hidden-layer ReLU neural networks (NNs), and approximation properties are obtained for networks with ResNet, pre-act ResNet, and MgNet architectures based on connections between these networks.

Optimal Convergence Rates of Deep Neural Networks in a Classification Setting

This work establishes optimal convergence rates, up to a log-factor, for a class of deep neural networks in a classification setting under a constraint sometimes referred to as the Tsybakov noise condition, and shows almost optimal rates under some additional constraints which circumvent the curse of dimensionality.

Convergence rates of deep ReLU networks for multiclass classification

This work studies convergence of the learned probabilities to the true conditional class probabilities and considers sparse deep ReLU network reconstructions minimizing the cross-entropy loss in the multiclass classification setup.

Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks

This work establishes theoretical guarantees of convolutional residual networks (ConvResNet) in terms of function approximation and statistical estimation for binary classification, and proves that if the network architecture is properly chosen, ConvResNets can approximate Besov functions on manifolds with arbitrary accuracy.

On the rate of convergence of image classifiers based on convolutional neural networks

This work proves that in image classification it is possible to circumvent the curse of dimensionality by convolutional neural networks.

Generalization Bounds for Convolutional Neural Networks

This work proposes a tighter generalization bound for CNNs by exploiting the sparse and permutation structure of their weight matrices, and further studies spectral norms of three commonly used convolution operations: standard convolution, depthwise convolution, and pointwise convolution.

Estimation of a Function of Low Local Dimensionality by Deep Neural Networks

It is shown that the least squares regression estimates using DNNs are able to achieve dimensionality reduction in case that the regression function has locally low dimensionality.

Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review

This review, which focuses on the application of CNNs to image classification tasks, covers their development, from their predecessors up to recent state-of-the-art deep learning systems.

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks

It is shown that a ResNet-type CNN can attain the minimax optimal error rates in these classes in more plausible situations: it can be dense, and its width, channel size, and filter size are constant with respect to the sample size.

On the rate of convergence of fully connected very deep neural network regression estimates

This paper shows that it is possible to get similar results also for least squares estimates based on simple fully connected neural networks with ReLU activation functions, based on new approximation results concerning deep neural networks.

Deep Neural Networks Learn Non-Smooth Functions Effectively

It is shown that DNN estimators are almost optimal for estimating non-smooth functions, while some of the popular models do not attain the optimal rate.