# Lookahead Optimizer: k steps forward, 1 step back

@inproceedings{Zhang2019LookaheadOK,
title={Lookahead Optimizer: k steps forward, 1 step back},
author={Michael Ruogu Zhang and James Lucas and Geoffrey E. Hinton and Jimmy Ba},
booktitle={NeurIPS},
year={2019}
}
The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. [...] Intuitively, the algorithm chooses a search direction by *looking ahead* at the sequence of "fast weights" generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate Lookahead can significantly improve the performance…
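
The "k steps forward, 1 step back" idea in the title can be illustrated with a small sketch. The excerpt above does not spell out the update rule, so this assumes the commonly described form: an inner optimizer advances a copy of the weights (the "fast weights") for `k` steps, then the "slow weights" move a fraction `alpha` of the way toward the result. Plain gradient descent stands in for the inner optimizer here. The function names, the toy quadratic objective, and the default hyperparameters are illustrative assumptions, not values taken from the paper.

```python
def lookahead_sgd(grad, w0, lr=0.1, k=5, alpha=0.5, outer_steps=50):
    """Sketch of Lookahead wrapped around plain gradient descent.

    grad:  function mapping a list of weights to a list of gradients
    w0:    initial weights
    k:     number of inner ("fast") steps per outer step
    alpha: interpolation factor for the slow-weight update
    All defaults are illustrative assumptions.
    """
    slow = list(w0)
    for _ in range(outer_steps):
        # k steps forward: run the inner optimizer from the slow weights.
        fast = list(slow)
        for _ in range(k):
            g = grad(fast)
            fast = [w - lr * gi for w, gi in zip(fast, g)]
        # 1 step back: move the slow weights part of the way toward
        # the final fast weights.
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow


def quad_grad(w):
    """Gradient of the toy objective (w0 - 3)^2 + (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


result = lookahead_sgd(quad_grad, [0.0, 0.0])
```

On this toy objective the slow weights approach the minimizer at `(3, -1)`. With `alpha=1.0` the interpolation step simply adopts the fast weights, so the sketch reduces to plain gradient descent.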
229 Citations

