Corpus ID: 250913674

Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks

@inproceedings{Chen2020DistributionAA,
  title={Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks},
  author={Minshuo Chen and Wenjing Liao and Hongyuan Zha and Tuo Zhao},
  year={2020}
}
Generative Adversarial Networks (GANs) have achieved great success in unsupervised learning. Despite their remarkable empirical performance, there are limited theoretical studies on the statistical properties of GANs. This paper provides approximation and statistical guarantees of GANs for the estimation of data distributions that have densities in a Hölder space. Our main result shows that, if the generator and discriminator network architectures are properly chosen, GANs are consistent…
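
As a rough illustration of the estimation framework in the abstract (a sketch with notation chosen here, not taken verbatim from the paper), the GAN estimate can be viewed as minimizing an integral probability metric (IPM) between the empirical data distribution and the pushforward of an easy-to-sample latent distribution:

```latex
% Sketch of the GAN estimator as IPM minimization (notation chosen here, not the paper's).
% \mu: data distribution with a Hölder density, \rho: latent distribution (e.g. uniform),
% \mathcal{G}: generator network class, \mathcal{F}: discriminator network class.
\[
  d_{\mathcal{F}}(\mu, \nu)
    = \sup_{f \in \mathcal{F}}
      \Bigl| \mathbb{E}_{x \sim \mu} f(x) - \mathbb{E}_{x \sim \nu} f(x) \Bigr|,
  \qquad
  \widehat{g} \in \operatorname*{arg\,min}_{g \in \mathcal{G}}
      d_{\mathcal{F}}\bigl(\widehat{\mu}_n,\, g_{\#}\rho\bigr),
\]
% where \widehat{\mu}_n is the empirical distribution of n samples and g_{#}\rho is the
% pushforward of \rho under g. Consistency means d(\mu, \widehat{g}_{#}\rho) -> 0 as n grows,
% provided the generator and discriminator architectures are scaled appropriately with n.
```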

Citations

A Convenient Infinite Dimensional Framework for Generative Adversarial Learning

This work proposes an infinite-dimensional theoretical framework for generative adversarial learning and shows that the Rosenblatt transformation induces an optimal generator, which is realizable in the hypothesis space of $\alpha$-Hölder differentiable generators.
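
For context on the optimal-generator claim, the Rosenblatt construction maps uniform noise to a target distribution through successive conditional CDFs; a standard bivariate sketch (not notation from the cited paper):

```latex
% (Inverse) Rosenblatt transformation, sketched for dimension d = 2.
% F_1: CDF of X_1;  F_{2|1}(. | x_1): conditional CDF of X_2 given X_1 = x_1.
\[
  (U_1, U_2) \sim \mathrm{Unif}[0,1]^2,
  \qquad
  X_1 = F_1^{-1}(U_1),
  \qquad
  X_2 = F_{2|1}^{-1}(U_2 \mid X_1),
\]
% so (X_1, X_2) follows the target joint distribution; the map (U_1, U_2) -> (X_1, X_2) acts
% as an explicit generator, and the cited work shows it lies in a class of Hölder-smooth
% generators when the target is sufficiently regular.
```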

References

Showing 1-10 of 15 references

Nonparametric density estimation & convergence of GANs under Besov IPM losses

  • A. Uppal
  • Computer Science, Mathematics
  • 2019
It is shown how the results imply bounds on the statistical error of a GAN: for example, GANs can strictly outperform the best linear estimator, and linear distribution estimates often fail to converge at the optimal rate.
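
Here, "linear" distribution estimators are, roughly, those that depend linearly on the empirical measure, such as kernel density estimators; a sketch of the comparison (the precise estimator class and rates are in the reference):

```latex
% Kernel density estimation as the prototypical linear estimator (sketch).
\[
  \widehat{p}_{\mathrm{lin}}(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i),
\]
% whereas a GAN estimate is a nonlinear functional of the samples; the cited result bounds the
% GAN's error under Besov IPM losses and shows it can converge strictly faster than the best
% such linear estimator in certain regimes.
```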

Mode Regularized Generative Adversarial Networks

This work introduces several ways of regularizing the objective that can dramatically stabilize the training of GAN models, and shows that these regularizers help distribute probability mass fairly across the modes of the data-generating distribution during the early phases of training, thus providing a unified solution to the missing-modes problem.
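
One way to read "fair distribution of probability mass across modes" concretely: a minimal sketch (assuming an auxiliary encoder and simplified loss terms of my own choosing, not the paper's exact objective) of an encoder-based mode regularizer added to the generator loss:

```python
# Minimal sketch (not the paper's exact objective) of an encoder-based mode regularizer:
# an auxiliary encoder E maps real samples back to latent space, and the generator is
# penalized for failing to reconstruct them, which encourages coverage of all data modes.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
E = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

def mode_regularizer(x_real, lam1=0.2, lam2=0.2):
    """Reconstruction + discriminator terms on G(E(x)); lam1/lam2 are hypothetical weights."""
    x_rec = G(E(x_real))                          # reconstruct real samples through latent space
    recon = ((x_real - x_rec) ** 2).mean()        # geometric term: stay close to every real mode
    fool = -torch.log(D(x_rec) + 1e-8).mean()     # mode term: reconstructions should look real to D
    return lam1 * recon + lam2 * fool

x_real = torch.randn(32, data_dim)                # placeholder batch of "real" data
loss_g_extra = mode_regularizer(x_real)           # added to the usual generator loss
print(float(loss_g_extra))
```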

Improving GAN Training via Binarized Representation Entropy (BRE) Regularization

This work proposes a novel regularizer that guides the rectifier discriminator D to better allocate its model capacity, by encouraging the binary activation patterns on selected internal layers of D to have a high joint entropy.
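
A simplified proxy for "high joint entropy of binary activation patterns" (a sketch under my own simplifications, not the exact BRE term): push each unit's binarized activation to be balanced over a batch, and push different samples toward different activation patterns:

```python
# Simplified proxy (a sketch, not the exact BRE regularizer) for encouraging high joint
# entropy of binarized activations on an internal layer of the discriminator D:
# (i) each unit should fire roughly half the time over a batch, and (ii) different samples
# should produce different activation patterns.
import torch

def bre_proxy(h, eps=1e-8):
    """h: (batch, units) pre-activations taken from a chosen internal layer of D."""
    s = torch.tanh(h)                              # smooth surrogate for sign(h) in {-1, +1}
    me = (s.mean(dim=0) ** 2).mean()               # marginal term: per-unit batch mean near 0
    sim = s @ s.t() / s.shape[1]                   # pairwise pattern similarity in [-1, 1]
    off_diag = sim - torch.diag(torch.diag(sim))   # ignore each sample's self-similarity
    ac = (off_diag ** 2).sum() / (s.shape[0] * (s.shape[0] - 1) + eps)
    return me + ac                                 # added (with a small weight) to D's loss

h = torch.randn(32, 128)                           # placeholder internal activations
print(float(bre_proxy(h)))
```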

A Convergence Theory for Deep Learning via Over-Parameterization

This work proves that stochastic gradient descent can find global minima of the training objective of DNNs in $\textit{polynomial time}$, and implies an equivalence between over-parameterized neural networks and the neural tangent kernel (NTK) in the finite (and polynomial) width setting.
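
For reference, the neural tangent kernel invoked in this summary is the kernel induced by the network's gradient features at random initialization (standard definition, not notation taken from the reference):

```latex
% Neural tangent kernel (standard definition). f(x; w) is the network output with parameters w,
% and the expectation is over the random initialization w_0.
\[
  K_{\mathrm{NTK}}(x, x') =
  \mathbb{E}_{w_0}\Bigl[\bigl\langle \nabla_{w} f(x; w_0),\; \nabla_{w} f(x'; w_0) \bigr\rangle\Bigr],
\]
% In the heavily over-parameterized regime the parameters stay close to w_0 during training,
% so gradient descent on the network behaves like kernel regression with K_NTK.
```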

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

This paper analyzes training and generalization for a simple 2-layer ReLU network with random initialization, and provides the following improvements over recent works: a tighter characterization of training speed, an explanation for why training a neural network with random labels leads to slower training, and a data-dependent complexity measure.
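
The data-dependent complexity measure in this line of work is, up to constants and lower-order terms, built from the NTK Gram matrix on the training set (a sketch of the standard form; see the reference for the exact statement):

```latex
% Sketch of the data-dependent complexity measure (up to constants and lower-order terms).
% H^infty is the Gram matrix of the two-layer ReLU NTK on training inputs x_1, ..., x_n,
% and y = (y_1, ..., y_n) are the labels.
\[
  H^{\infty}_{ij} = \mathbb{E}_{w \sim \mathcal{N}(0, I)}
  \Bigl[ x_i^{\top} x_j \,\mathbf{1}\{ w^{\top} x_i \ge 0,\; w^{\top} x_j \ge 0 \} \Bigr],
  \qquad
  \text{generalization error} \;\lesssim\; \sqrt{\frac{\,y^{\top} (H^{\infty})^{-1} y\,}{n}} .
\]
% Intuition: true labels align with the top eigenvectors of H^infty, keeping y^T (H^infty)^{-1} y
% small and training fast, while random labels spread over all eigendirections, which both
% slows training and inflates the measure.
```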

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

It is proved that SGD learns a network with a small generalization error, even though the network has enough capacity to fit arbitrary labels, when the data comes from mixtures of well-separated distributions.
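
A toy sketch (an illustrative setup of my own, not the reference's exact construction) of the "mixtures of well-separated distributions" data model, with a wide two-layer network fit by a stochastic gradient method:

```python
# Toy data from a mixture of well-separated components, each component carrying a class
# label, plus a wide (over-parameterized) two-layer network trained with a stochastic
# gradient method. Illustrative only; not the reference's exact setting.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
d, n_per, sep = 20, 200, 10.0                          # dimension, samples per component, separation

centers = sep * rng.standard_normal((4, d))            # four well-separated component centers
X = np.concatenate([c + rng.standard_normal((n_per, d)) for c in centers])
y = np.repeat([0, 1, 0, 1], n_per)                     # each component belongs to one class

clf = MLPClassifier(hidden_layer_sizes=(2000,), max_iter=500, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))           # easy to fit in the well-separated regime
```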

Understanding deep learning requires rethinking generalization

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite-sample expressivity.
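
A small-scale sketch of the randomization test (a toy MLP version on a small dataset, not the paper's convolutional image setup): train the same model once on the true labels and once on uniformly shuffled labels, and compare training accuracy:

```python
# Toy version of the randomization test: fit the same model on true labels and on labels
# shuffled uniformly at random, then compare training accuracy.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
y_random = rng.permutation(y)                     # destroy any relation between inputs and labels

for name, labels in [("true labels", y), ("random labels", y_random)]:
    clf = MLPClassifier(hidden_layer_sizes=(512,), max_iter=2000, random_state=0)
    clf.fit(X, labels)
    print(name, "training accuracy:", clf.score(X, labels))
# A sufficiently wide depth-two network reaches (near-)perfect training accuracy in both cases,
# even though the random labels carry no signal to generalize from.
```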

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

It is proved that overparameterized neural networks can learn some notable concept classes, including two- and three-layer networks with fewer parameters and smooth activations, and that this learning can be done by SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples.

Interpreting Deep Visual Representations via Network Dissection

Network Dissection, a method that interprets networks by providing meaningful labels to their individual units, is described; it reveals that deep representations are more transparent and interpretable than they would be under a random, equivalently powerful basis.
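
The unit labeling in Network Dissection is driven by overlap between a unit's thresholded activation map and human-annotated concept masks; a minimal IoU-scoring sketch with placeholder inputs (the threshold quantile and mask data here are illustrative assumptions):

```python
# Simplified sketch of the IoU scoring used to label a unit with a visual concept:
# threshold the unit's (upsampled) activation map and compare it against binary concept masks.
import numpy as np

def iou(binary_activation, concept_mask):
    inter = np.logical_and(binary_activation, concept_mask).sum()
    union = np.logical_or(binary_activation, concept_mask).sum()
    return inter / union if union > 0 else 0.0

rng = np.random.default_rng(0)
activation = rng.random((112, 112))                       # hypothetical upsampled activation map
threshold = np.quantile(activation, 0.995)                # keep only the top ~0.5% of activations
binary_act = activation > threshold

concept_masks = {"cat": rng.random((112, 112)) > 0.9,     # placeholder concept segmentations
                 "grass": rng.random((112, 112)) > 0.5}

scores = {name: iou(binary_act, mask) for name, mask in concept_masks.items()}
best = max(scores, key=scores.get)
print(scores, "-> unit labeled:", best)                   # the unit gets the highest-IoU concept
```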

Visualizing and Understanding Convolutional Networks

A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large convolutional network models; used in a diagnostic role, it helps find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
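
As a much simpler stand-in for the deconvnet-based visualization described above (a sketch only; the reference additionally projects activations back to input pixel space), intermediate feature maps can be captured with a forward hook and inspected directly:

```python
# Capture an intermediate feature layer of a convolutional network with a forward hook and
# inspect which channels respond most strongly to a given input. This is a simpler stand-in,
# not the deconvnet technique of the reference.
import torch
import torchvision

model = torchvision.models.vgg16().eval()                 # randomly initialized, for the sketch only
captured = {}

def save_features(module, inputs, output):
    captured["feats"] = output.detach()

model.features[10].register_forward_hook(save_features)   # an intermediate conv layer
x = torch.randn(1, 3, 224, 224)                           # placeholder image batch
model(x)

feats = captured["feats"][0]                               # shape: (channels, H, W)
top_channels = feats.mean(dim=(1, 2)).topk(5).indices
print("most active channels:", top_channels.tolist())
```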