Model Complexity of Deep Learning: A Survey

@article{Hu2021ModelCO,
  title={Model Complexity of Deep Learning: A Survey},
  author={Xia Hu and Lingyang Chu and Jian Pei and Weiqing Liu and Jiang Bian},
  journal={Knowl. Inf. Syst.},
  year={2021},
  volume={63},
  pages={2585-2619}
}
Model complexity is a fundamental problem in deep learning. In this paper, we conduct a systematic overview of the latest studies on model complexity in deep learning. Model complexity in deep learning can be categorized into expressive capacity and effective model complexity. We review the existing studies in these two categories along four important factors: model framework, model size, optimization process, and data complexity. We also discuss the applications of deep learning model…

Can machine learning accelerate process understanding and decision‐relevant predictions of river water quality?

The global decline of water quality in rivers and streams has resulted in a pressing need to design new watershed management strategies. Water quality can be affected by multiple stressors including

Replacing Neural Networks by Optimal Analytical Predictors for the Detection of Phase Transitions

TLDR
This work derives analytical expressions for the optimal output of three widely used NN-based methods for detecting phase transitions and expects similar analyses to provide a deeper understanding of other classification tasks in condensed matter physics.
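
The underlying observation is that, for standard losses, the loss-minimizing classifier output is the Bayes posterior, which can be computed directly from the (estimated) data distributions instead of by training a network. Below is a toy numerical illustration of that idea, not the paper's actual derivation; the 1-D samples and histogram density estimates are placeholder assumptions.

```python
# Hypothetical toy illustration: for a balanced two-phase classification task
# with cross-entropy (or MSE) loss, the optimal classifier output at x is the
# Bayes posterior p1(x) / (p1(x) + p2(x)), computable directly from (estimated)
# data distributions rather than by training a neural network.
import numpy as np

rng = np.random.default_rng(0)
phase1 = rng.normal(-1.0, 0.7, size=50_000)     # samples from "phase 1"
phase2 = rng.normal(+1.0, 0.7, size=50_000)     # samples from "phase 2"

bins = np.linspace(-4, 4, 81)
p1, _ = np.histogram(phase1, bins=bins, density=True)
p2, _ = np.histogram(phase2, bins=bins, density=True)

with np.errstate(invalid="ignore"):
    optimal_output = p1 / (p1 + p2)             # optimal output per histogram bin

centers = 0.5 * (bins[:-1] + bins[1:])
print(np.round(optimal_output[np.searchsorted(centers, [-2.0, 0.0, 2.0])], 3))
```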

CourtNet for Infrared Small-Target Detection

Infrared small-target detection (ISTD) is an important computer vision task. ISTD aims at separating small targets from complex background clutter. The infrared radiation decays over distances,

Deep Learning for the Automatic Segmentation of Extracranial Venous Malformations of the Head and Neck from MR Images Using 3D U-Net

Background: It is difficult to characterize extracranial venous malformations (VMs) of the head and neck region from magnetic resonance imaging (MRI) manually and one at a time. We attempted to
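
For orientation, a 3D U-Net follows the familiar encoder-decoder-with-skip-connections pattern, with volumetric convolutions in place of 2D ones. Below is a minimal single-level sketch in PyTorch; the channel counts and depth are placeholders, not the configuration used in the study.

```python
# Hypothetical sketch: a single-level 3D U-Net (one downsampling stage) showing
# the encoder/decoder-with-skip-connection pattern using volumetric convolutions.
# Channel counts and depth are placeholders, not the paper's configuration.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(c_out, c_out, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet3D(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, base=16):
        super().__init__()
        self.enc = conv_block(in_ch, base)
        self.down = nn.MaxPool3d(2)
        self.bottleneck = conv_block(base, base * 2)
        self.up = nn.ConvTranspose3d(base * 2, base, kernel_size=2, stride=2)
        self.dec = conv_block(base * 2, base)             # skip + upsampled features
        self.head = nn.Conv3d(base, out_ch, kernel_size=1)

    def forward(self, x):
        e = self.enc(x)
        b = self.bottleneck(self.down(e))
        d = self.dec(torch.cat([e, self.up(b)], dim=1))   # skip connection
        return self.head(d)                               # per-voxel logits

x = torch.randn(1, 1, 32, 64, 64)     # (batch, channel, depth, height, width)
print(TinyUNet3D()(x).shape)          # torch.Size([1, 1, 32, 64, 64])
```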

WHU-OHS: A benchmark dataset for large-scale Hyperspectral Image classification

Modulation Classification Based on Eye Diagrams and Deep Learning

TLDR
This paper uses deep learning with eye diagrams to study and identify modulated signals in narrowband fading channels, e.g., Rayleigh and Rician fading, and shows that deep neural networks can classify modulated signals under the impact of fading channels using eye diagrams.
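
An eye diagram is simply the received waveform sliced into symbol-length windows and overlaid, so it can be rasterized into an image and fed to a convolutional classifier. A minimal sketch of building such an image from a noisy baseband waveform is shown below; all signal parameters are illustrative assumptions, and fading and the classifier itself are omitted.

```python
# Hypothetical sketch: rasterize an eye diagram from a noisy baseband waveform
# by overlaying symbol-length slices into a 2-D histogram image, which could
# then be fed to a CNN classifier.  Signal parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
sps, n_symbols = 32, 400                      # samples per symbol, number of symbols
symbols = rng.choice([-1.0, 1.0], size=n_symbols)          # BPSK-like symbols
waveform = np.repeat(symbols, sps).astype(float)
waveform += 0.2 * rng.standard_normal(waveform.size)       # additive noise only

# Overlay traces two symbol periods long (the classic "eye" spans ~2 symbols).
span = 2 * sps
traces = waveform[: (waveform.size // span) * span].reshape(-1, span)
t = np.tile(np.arange(span), traces.shape[0])
eye_image, _, _ = np.histogram2d(t, traces.ravel(),
                                 bins=(span, 64), range=[[0, span], [-2, 2]])
print("eye-diagram image shape:", eye_image.shape)   # (64, 64) -> CNN input
```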

The Right to be an Exception in Data-Driven Decision-Making

Data-driven assessments estimate a target—such as the likelihood an individual will recidivate or commit welfare fraud—by pattern matching against historical data. There are, however, limitations to

References

Showing 1-10 of 108 references

Deep double descent: where bigger models and more data hurt

TLDR
This work defines a new complexity measure, the effective model complexity, conjectures a generalized double descent with respect to this measure, and uses it to identify regimes where increasing the number of training samples actually hurts test performance.
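
Effective model complexity is defined there operationally: it is the largest number of training samples on which the training procedure still reaches approximately zero training error. A minimal sketch of estimating it empirically is given below; the dataset, model, and error threshold are placeholder assumptions, not the authors' setup.

```python
# Hypothetical sketch: estimate the "effective model complexity" (EMC) of a
# training procedure as the largest training-set size n at which the procedure
# still reaches ~zero training error.  Dataset, model, and threshold eps are
# placeholder assumptions.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)

def train_error(n):
    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
    model.fit(X[:n], y[:n])
    return 1.0 - model.score(X[:n], y[:n])   # error on the data it was fit on

eps = 0.01                                   # "approximately zero" training error
emc = 0
for n in (100, 200, 500, 1000, 2000, 4000):
    if train_error(n) <= eps:
        emc = n                              # procedure still interpolates n samples
    else:
        break
print("estimated EMC of this training procedure:", emc)
```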

On the Expressive Power of Deep Polynomial Neural Networks

TLDR
The dimension of the network's functional variety is proposed as a precise measure of the expressive power of polynomial neural networks; the paper gives an exact formula for high activation degrees, as well as upper and lower bounds on the layer widths needed for deep polynomial networks to fill the ambient functional space.
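
Numerically, the dimension of such a functional variety can be estimated as the generic rank of the Jacobian of the network's outputs (evaluated at many random inputs) with respect to its parameters, since the dimension of the image of a polynomial map equals the generic rank of its differential. A minimal sketch follows, with a tiny squared-activation network chosen purely for illustration.

```python
# Hypothetical sketch: estimate the dimension of the functional variety of a
# small polynomial network (activation x -> x**2) as the generic rank of the
# Jacobian of its outputs at many random inputs w.r.t. its parameters.
import torch

torch.manual_seed(0)
d_in, d_hidden, d_out, n_points = 3, 4, 1, 200
X = torch.randn(n_points, d_in)                      # generic evaluation points

def network_outputs(params):
    W1 = params[:d_hidden * d_in].reshape(d_hidden, d_in)
    W2 = params[d_hidden * d_in:].reshape(d_out, d_hidden)
    return ((X @ W1.T) ** 2) @ W2.T                  # degree-2 activation, no bias

theta = torch.randn(d_hidden * d_in + d_out * d_hidden)   # generic parameters
J = torch.autograd.functional.jacobian(network_outputs, theta)
J = J.reshape(n_points * d_out, -1)
print("estimated dim of functional variety:", torch.linalg.matrix_rank(J).item())
```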

Complexity of Linear Regions in Deep Networks

TLDR
The theory suggests that, even after training, the number of linear regions is far below the exponential upper bound, which matches the empirical observations; the authors conclude that the practical expressivity of neural networks is likely far below the theoretical maximum and that this gap can be quantified.
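
A common way to probe this empirically is to count distinct ReLU activation patterns along a one-dimensional segment of input space, since each distinct pattern corresponds to a different linear region crossed by the segment. A minimal sketch with a random, untrained network follows; all sizes are illustrative assumptions.

```python
# Hypothetical sketch: count the linear regions a random ReLU network induces
# along a line segment in input space by counting distinct activation patterns.
import numpy as np

rng = np.random.default_rng(0)
d_in, widths = 16, [32, 32]                    # illustrative architecture
Ws, bs, prev = [], [], d_in
for w in widths:
    Ws.append(rng.standard_normal((w, prev)) / np.sqrt(prev))
    bs.append(rng.standard_normal(w) * 0.1)
    prev = w

x0, x1 = rng.standard_normal(d_in), rng.standard_normal(d_in)
ts = np.linspace(0.0, 1.0, 10000)
X = x0[None, :] * (1 - ts)[:, None] + x1[None, :] * ts[:, None]

H, codes = X, []
for W, b in zip(Ws, bs):
    pre = H @ W.T + b
    codes.append(pre > 0)                      # ReLU on/off pattern per layer
    H = np.maximum(pre, 0.0)
codes = np.concatenate(codes, axis=1)          # (n_points, total_units) booleans
patterns = {tuple(row) for row in codes}
print("distinct activation patterns along the segment:", len(patterns))
```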

Bounding and Counting Linear Regions of Deep Neural Networks

TLDR
The results indicate that a deep rectifier network can have more linear regions than every shallow counterpart with the same number of neurons only if that number exceeds the dimension of the input.

Sensitivity and Generalization in Neural Networks: an Empirical Study

TLDR
It is found that trained neural networks are more robust to input perturbations in the vicinity of the training data manifold, as measured by the norm of the input-output Jacobian of the network, and that this Jacobian norm correlates well with generalization.
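
The sensitivity measure in question, the norm of the input-output Jacobian, is straightforward to compute with automatic differentiation. A minimal sketch with a placeholder, untrained model:

```python
# Hypothetical sketch: measure the sensitivity of a network at a point x as the
# Frobenius norm of its input-output Jacobian, via automatic differentiation.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 3))

def jacobian_frobenius_norm(model, x):
    J = torch.autograd.functional.jacobian(model, x)   # shape: (3, 10)
    return J.norm()                                     # Frobenius norm

x = torch.randn(10)
print("sensitivity at x:", jacobian_frobenius_norm(model, x).item())
```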

Foundations of Machine Learning

TLDR
This graduate-level textbook introduces fundamental concepts and methods in machine learning, provides the theoretical underpinnings of these algorithms, and illustrates key aspects of their application.

Fisher-Rao Metric, Geometry, and Complexity of Neural Networks

TLDR
An analytical characterization of the proposed Fisher-Rao norm is derived, showing that the new measure serves as an umbrella for several existing norm-based complexity measures, and norm-comparison inequalities are established.
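
As a rough numerical illustration only (not the paper's analytical characterization): under the empirical Fisher approximation F ≈ E[g gᵀ] with g = ∇_θ ℓ, the squared Fisher-Rao norm θᵀFθ reduces to E[⟨θ, g⟩²], which can be estimated with automatic differentiation. The model, data, and loss below are placeholders.

```python
# Hypothetical sketch: estimate the squared Fisher-Rao norm theta^T F theta
# under the *empirical* Fisher approximation F ~ E[grad grad^T], which reduces
# to E[ <theta, grad_theta loss>^2 ].  Model, data, and loss are placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
X = torch.randn(256, 10)
y = torch.randint(0, 2, (256,))

params = list(model.parameters())
total = 0.0
for xi, yi in zip(X, y):
    loss = loss_fn(model(xi.unsqueeze(0)), yi.unsqueeze(0))
    grads = torch.autograd.grad(loss, params)
    inner = sum((g * p).sum() for g, p in zip(grads, params))  # <theta, grad>
    total += inner.item() ** 2
print("empirical-Fisher estimate of ||theta||_fr^2:", total / len(X))
```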

Expressive power of recurrent neural networks

TLDR
An expressive power theorem (an exponential lower bound on the width of an equivalent shallow network) is proved for a class of recurrent neural networks, namely those corresponding to the Tensor Train (TT) decomposition; this means that even processing an image patch by patch with an RNN can be exponentially more efficient than a (shallow) convolutional network with one hidden layer.

The Expressive Power of Neural Networks: A View from the Width

TLDR
It is shown that there exist classes of wide networks which cannot be realized by any narrow network whose depth does not exceed a polynomial bound, and that narrow networks whose size exceeds the polynomial bound by a constant factor can approximate wide and shallow networks with high accuracy.

Constructive lower bounds on model complexity of shallow perceptron networks

V. Kůrková, Neural Computing and Applications, 2017
TLDR
Limitations of shallow (one-hidden-layer) perceptron networks are investigated with respect to computing multivariable functions on finite domains, and a subclass of these functions is described whose elements can be computed by two-hidden-layer perceptron networks with a number of units depending linearly on the logarithm of the size of the domain.
...