Semi-discrete optimization through semi-discrete optimal transport: a framework for neural architecture search

@article{Trillos2022SemidiscreteOT,
  title={Semi-discrete optimization through semi-discrete optimal transport: a framework for neural architecture search},
  author={Nicol{\'a}s Garc{\'i}a Trillos and Javier Morales},
  journal={J. Nonlinear Sci.},
  year={2022},
  volume={32},
  pages={27}
}
In this paper we introduce a theoretical framework for semi-discrete optimization using ideas from optimal transport. Our primary motivation is in the field of deep learning, and specifically in the task of neural architecture search. With this aim in mind, we discuss the geometric and theoretical motivation for new techniques for neural architecture search; in the companion work \cite{practical}, we show that algorithms inspired by our framework are competitive with contemporaneous methods…
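As an illustrative sketch (not taken from the paper): semi-discrete optimal transport moves a continuous source measure onto finitely many atoms, and the Kantorovich dual can be maximized by adjusting one weight per atom until each Laguerre cell carries its prescribed mass. A minimal Monte Carlo version of this dual ascent, with made-up toy data, might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: discrete measure with 3 atoms and prescribed weights nu_j.
y = np.array([[0.2, 0.2], [0.8, 0.3], [0.5, 0.8]])
nu = np.array([0.5, 0.3, 0.2])

# Source: uniform density on [0,1]^2, approximated by Monte Carlo samples.
x = rng.random((20000, 2))

psi = np.zeros(3)  # one dual weight per atom
eta = 0.3          # fixed ascent step (hand-tuned for this toy problem)
for _ in range(500):
    # Assign each sample to the atom minimizing |x - y_j|^2 - psi_j
    # (i.e., compute the Laguerre cell it belongs to).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1) - psi
    cell = cost.argmin(axis=1)
    # Empirical mass of each Laguerre cell.
    mass = np.bincount(cell, minlength=3) / len(x)
    # Gradient ascent on the concave Kantorovich dual:
    # the gradient component for atom j is nu_j - mass_j.
    psi += eta * (nu - mass)

# At convergence each Laguerre cell carries (approximately) the mass nu_j.
```

The weights `y`, `nu`, and the step size `eta` are hypothetical choices for illustration; the paper's actual framework builds far more structure (a gradient-flow geometry over architectures) on top of this basic semi-discrete picture.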
Traditional and accelerated gradient descent for neural architecture search
TLDR
Two algorithms for neural architecture search (NASGD and NASAGD) are introduced, which can analyze forty times as many architectures as the hill climbing methods while using the same computational resources and time and achieving comparable levels of accuracy.
Neural Architecture Search via Bregman Iterations
TLDR
It is demonstrated that using the proposed novel strategy for Neural Architecture Search, one can unveil, for instance, residual autoencoders for denoising, deblurring, and classification tasks.

References

Showing 1–10 of 43 references
Scaling Limits of Discrete Optimal Transport
TLDR
It is shown that the corresponding lower bound for the discrete transport metric may fail in general, even on certain one-dimensional and symmetric two-dimensional meshes, and it is proved that the asymptotic lower bound holds under an isotropy assumption on the mesh.
Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler
TLDR
A new version of such a methodology for solving inverse problems without the use of derivatives or adjoints of the forward model is proposed, and numerical evidence of the practicality of the method is presented.
Homogenisation of one-dimensional discrete optimal transport
Vector-Valued Optimal Mass Transport
TLDR
This work defines Wasserstein-type metrics on vector-valued distributions supported on continuous spaces as well as graphs and introduces the problem of transporting vector- valued distributions.
Traditional and accelerated gradient descent for neural architecture search
TLDR
Two algorithms for neural architecture search (NASGD and NASAGD) are introduced, which can analyze forty times as many architectures as the hill climbing methods while using the same computational resources and time and achieving comparable levels of accuracy.
Nonlocal-interaction equation on graphs: gradient flow structure and continuum limit
TLDR
It is shown that the solutions of the NL$^2$IE on graphs converge as the empirical measures of the set of vertices converge weakly, which establishes a valuable discrete-to-continuum convergence result.
Algorithms for Hyper-Parameter Optimization
TLDR
This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.
A Fast Learning Algorithm for Deep Belief Nets
TLDR
A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Hyper-Parameter Optimization: A Review of Algorithms and Applications
TLDR
A review of the most essential topics on HPO, including the key hyper-parameters related to model training and structure, and a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.
Fokker–Planck Equations for a Free Energy Functional or Markov Process on a Graph
The classical Fokker–Planck equation is a linear parabolic equation which describes the time evolution of the probability distribution of a stochastic process defined on a Euclidean space.
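For context (standard background, not quoted from that reference): with a smooth potential $V$ on $\mathbb{R}^d$, the classical Fokker–Planck equation reads
\[
\partial_t \rho = \nabla \cdot (\rho \nabla V) + \Delta \rho,
\]
and it is the gradient flow of the free energy $\mathcal{F}(\rho) = \int V \rho \, dx + \int \rho \log \rho \, dx$ with respect to the Wasserstein metric; the cited paper constructs an analogous gradient-flow structure for Markov processes on graphs.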
...