Corpus ID: 197935378

Lookahead Optimizer: k steps forward, 1 step back

@inproceedings{Zhang2019LookaheadOK,
  title={Lookahead Optimizer: k steps forward, 1 step back},
  author={Michael Ruogu Zhang and James Lucas and Geoffrey E. Hinton and Jimmy Ba},
  booktitle={NeurIPS},
  year={2019}
}
  • Michael Ruogu Zhang, James Lucas, Geoffrey E. Hinton, and Jimmy Ba
  • Published in NeurIPS 2019
  • Computer Science, Mathematics
  • The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. [...] Key method: intuitively, the algorithm chooses a search direction by "looking ahead" at the sequence of fast weights generated by another optimizer. We show that Lookahead improves learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate that Lookahead can significantly improve the performance…
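The abstract's "k steps forward, 1 step back" update can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: plain SGD stands in for the inner ("fast") optimizer, the quadratic objective is a toy example, and the function name and parameter defaults (`k=5`, `alpha=0.5`, matching values used in the paper) are chosen here for illustration.

```python
import numpy as np

def lookahead_sgd(grad, phi0, k=5, alpha=0.5, inner_lr=0.1, outer_steps=100):
    """Lookahead sketch with plain SGD as the inner (fast) optimizer.

    phi   : slow weights
    theta : fast weights
    Every outer step runs k fast SGD updates, then moves the slow
    weights a fraction alpha toward the final fast weights.
    """
    phi = np.asarray(phi0, dtype=float)
    for _ in range(outer_steps):
        theta = phi.copy()                 # sync fast weights to slow weights
        for _ in range(k):                 # k steps forward
            theta -= inner_lr * grad(theta)
        phi += alpha * (theta - phi)       # 1 step back (interpolate)
    return phi

# Toy usage: minimize f(x) = ||x||^2 / 2, whose gradient is x itself.
w = lookahead_sgd(lambda x: x, phi0=[3.0, -2.0])
```

Because the slow weights only interpolate toward the fast weights, a noisy or divergent inner trajectory is damped, which is the stability/variance benefit the abstract claims.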


    Citations

    Publications citing this paper.
    Showing 7 of 30 citations.

    • An improved training scheme for deep neural network ultrasound beamforming (cites methods; 8 excerpts; highly influenced)
    • An Lipreading Model with DenseNet and E3D-LSTM (cites methods; 11 excerpts; highly influenced)
    • Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation (cites methods; 2 excerpts; highly influenced)
    • ENAS U-Net: Evolutionary Neural Architecture Search for Retinal Vessel Segmentation (cites methods; 3 excerpts; highly influenced)
    • From English To Foreign Languages: Transferring Pre-trained Language Models (cites methods; 2 excerpts; highly influenced)
    • Iterate Averaging Helps: An Alternative Perspective in Deep Learning (cites background & results; 7 excerpts; highly influenced)
    • Applying Cyclical Learning Rate to Neural Machine Translation (cites background; 1 excerpt)

    References

    Publications referenced by this paper.
    Showing 5 of 47 references.

    • ImageNet: A large-scale hierarchical image database (7 excerpts; highly influential)
    • Deep Residual Learning for Image Recognition (8 excerpts; highly influential)
    • Adam: A Method for Stochastic Optimization (14 excerpts)
    • Attention is All you Need (4 excerpts; highly influential)
    • Wide Residual Networks (2 excerpts; highly influential)