Corpus ID: 226289631

Towards a Better Global Loss Landscape of GANs

Ruoyu Sun, Tiantian Fang, Alexander G. Schwing
Understanding of GAN training is still very limited. One major challenge is its non-convex-non-concave min-max objective, which may lead to sub-optimal local minima. In this work, we perform a global landscape analysis of the empirical loss of GANs. We prove that a class of separable-GAN, including the original JS-GAN, has exponentially many bad basins which are perceived as mode-collapse. We also study the relativistic pairing GAN (RpGAN) loss which couples the generated samples and the true… 
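The contrast the abstract draws between a separable loss and the relativistic pairing loss can be illustrated with a minimal sketch. It assumes the logistic form of both objectives, written as quantities the discriminator maximizes over its raw outputs (logits). The function names and the plain-list inputs are hypothetical choices for illustration, not the authors' code.

```python
import math


def log_sigmoid(t):
    # Numerically stable log(1 / (1 + exp(-t))).
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))


def separable_js_loss(real_logits, fake_logits):
    # Separable form (assumed logistic / JS-GAN style): the real and
    # generated terms are averaged independently of each other.
    real_term = sum(log_sigmoid(r) for r in real_logits) / len(real_logits)
    fake_term = sum(log_sigmoid(-f) for f in fake_logits) / len(fake_logits)
    return real_term + fake_term


def rpgan_loss(real_logits, fake_logits):
    # Paired form (assumed logistic RpGAN style): each real logit is
    # compared with one generated logit, so the two sets are coupled.
    pairs = list(zip(real_logits, fake_logits))
    return sum(log_sigmoid(r - f) for r, f in pairs) / len(pairs)
```

In this sketch the paired loss depends only on the differences between real and generated logits, so adding the same constant to every logit leaves it unchanged, while the separable loss generally changes.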
DGL-GAN: Discriminator Guided Learning for GAN Compression
A novel yet simple Discriminator Guided Learning approach, dubbed DGL-GAN, for compressing vanilla GANs. It is motivated by the empirical observation that learning from the teacher discriminator can improve the performance of student GANs, and it achieves state-of-the-art results.
A Neural Tangent Kernel Perspective of GANs
A novel theoretical framework of analysis for Generative Adversarial Networks (GANs) is proposed, leveraging the theory of infinite-width neural networks for the discriminator via its Neural Tangent Kernel to characterize the trained discriminator for a wide range of losses and establish general differentiability properties of the network.
SyRa: Synthesized Rain Images for Deraining Algorithms
A new approach to synthesize realistic rainy scenes using GAN, which is a world-first attempt as far as the authors know, is presented and a synthesized rain image dataset consisting of 11K clean images and 55K rainy images was constructed.
Doubly Stochastic Generative Arrivals Modeling
We propose a new framework named DS-WGAN that integrates the doubly stochastic (DS) structure and the Wasserstein generative adversarial networks (WGAN) to model, estimate, and simulate a wide class
Unsupervised Image to Image Translation for Multiple Retinal Pathology Synthesis in Optical Coherence Tomography Scans
This work proposes an unsupervised multi-domain I2I network with pre-trained style encoder that translates retinal OCT images in one domain to multiple domains and outperforms state-of-the-art models like MUNIT and CycleGAN synthesizing diverse pathological scans.
WGAN with an Infinitely Wide Generator Has No Spurious Stationary Points
This work shows that GANs with a 2-layer infinite-width generator and a 2-layer finite-width discriminator trained with stochastic gradient ascent-descent have no spurious stationary points.
Generative Adversarial Network for Probabilistic Forecast of Random Dynamical System
A regularization strategy for a GAN based on consistency conditions for the sequential inference problems is proposed, and the maximum mean discrepancy (MMD) is used to enforce the consistency between conditional and marginal distributions of a stochastic process.
Reverse Engineering of Generative Models: Inferring Model Hyperparameters from Generated Images
This work proposes to perform reverse engineering of GMs to infer model hyperparameters from the images generated by these models, and proposes a framework with two components: a Fingerprint Estimation Network (FEN) and a Parsing Network (PN), which predicts network architecture and loss functions from the estimated fingerprints.
On the Benefit of Width for Neural Networks: Disappearance of Bad Basins
This work proves that from narrow to wide networks, there is a phase transition from having sub-optimal basins to no sub-optimal basins, and proves two results: on the positive side, for any continuous activation function, the loss surface of a class of wide networks has no sub-optimal basin.


Gradient descent GAN optimization is locally stable
This paper analyzes the "gradient descent" form of GAN optimization, i.e., the natural setting where the authors simultaneously take small gradient steps in both generator and discriminator parameters, and proposes an additional regularization term for gradient descent GAN updates that is able to guarantee local stability for both the WGAN and the traditional GAN.
Which Training Methods for GANs do actually Converge?
This paper describes a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent, and extends convergence results to more general GANs and proves local convergence for simplified gradient penalties even if the generator and data distribution lie on lower dimensional manifolds.
Understanding GANs: the LQG Setting
This paper proposes a natural way of specifying the loss function for GANs by drawing a connection with supervised learning and sheds light on the statistical performance of GANs through the analysis of a simple LQG setting: the generator is linear, the loss function is quadratic and the data is drawn from a Gaussian distribution.
Improved Training of Wasserstein GANs
This work proposes an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input, which performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning.
On Convergence and Stability of GANs
This work proposes studying GAN training dynamics as regret minimization, which is in contrast to the popular view that there is consistent minimization of a divergence between real and generated distributions, and shows that DRAGAN enables faster training, achieves improved stability with fewer mode collapses, and leads to generator networks with better modeling performance across a variety of architectures and objective functions.
Fisher GAN
Fisher GAN is introduced that fits within the Integral Probability Metrics (IPM) framework for training GANs and allows for stable and time efficient training that does not compromise the capacity of the critic, and does not need data independent constraints such as weight clipping.
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
This work proposes a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions and introduces the "Fréchet Inception Distance" (FID), which captures the similarity of generated images to real ones better than the Inception Score.
Dualing GANs
This paper explores ways to tackle the instability problem of GAN training by dualizing the discriminator, starting from linear discriminators and demonstrating how to extend this intuition to non-linear formulations.
On the Convergence and Robustness of Training GANs with Regularized Optimal Transport
This work shows that obtaining gradient information of the smoothed Wasserstein GAN formulation, which is based on regularized Optimal Transport (OT), is computationally effortless and hence one can apply first order optimization methods to minimize this objective.
Generative Modeling Using the Sliced Wasserstein Distance
This work considers an alternative formulation for generative modeling based on random projections which, in its simplest form, results in a single objective rather than a saddle-point formulation and finds its approach to be significantly more stable compared to even the improved Wasserstein GAN.