Generalized Optimization: A First Step Towards Category Theoretic Learning Theory
@inproceedings{Shiebler2021GeneralizedOA,
  title     = {Generalized Optimization: A First Step Towards Category Theoretic Learning Theory},
  author    = {Dan Shiebler},
  booktitle = {ICO},
  year      = {2021}
}
The Cartesian reverse derivative is a categorical generalization of reverse-mode automatic differentiation. We use this operator to generalize several optimization algorithms, including a straightforward generalization of gradient descent and a novel generalization of Newton's method. We then explore which properties of these algorithms are preserved in this generalized setting. First, we show that the transformation invariances of these algorithms are preserved: while generalized Newton's…
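To ground the abstract in the familiar special case, the following is a minimal sketch (not the paper's code) of the two classical algorithms being generalized, written with JAX's reverse-mode automatic differentiation; the objective f, the learning rate, the step counts, and the use of jax.hessian in place of the paper's categorical construction are all illustrative assumptions.

# Illustrative sketch only: plain gradient descent and a Newton-style update
# driven by reverse-mode automatic differentiation, as a concrete instance of
# the algorithms the paper generalizes. The objective f and hyperparameters
# are made up for this example.
import jax
import jax.numpy as jnp

def f(x):
    # Example smooth objective; any differentiable f: R^n -> R would do here.
    return jnp.sum((x - 1.0) ** 2) + 0.1 * jnp.sum(x ** 4)

grad_f = jax.grad(f)      # reverse-mode derivative of f
hess_f = jax.hessian(f)   # Hessian of f, built from the derivative operators

def gradient_descent(x, lr=0.1, steps=100):
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

def newton(x, steps=10):
    for _ in range(steps):
        # Newton step: x <- x - H(x)^{-1} grad f(x)
        x = x - jnp.linalg.solve(hess_f(x), grad_f(x))
    return x

x0 = jnp.array([3.0, -2.0])
print(gradient_descent(x0))
print(newton(x0))

In this concrete setting, the transformation invariance referred to above is the classical fact that Newton's method is unaffected by invertible affine reparameterization of its input: the Newton step computed for x ↦ f(Ax) corresponds, via A, to the Newton step for f at Ax, whereas gradient descent does not enjoy this property.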