Corpus ID: 59523651

Memory-Efficient Adaptive Optimization for Large-Scale Learning

@article{Anil2019MemoryEfficientAO,
  title={Memory-Efficient Adaptive Optimization for Large-Scale Learning},
  author={Rohan Anil and Vineet Gupta and Tomer Koren and Yoram Singer},
  journal={ArXiv},
  year={2019},
  volume={abs/1901.11150}
}
  • Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer
  • Published in ArXiv 2019
  • Computer Science, Mathematics
  • Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics of each parameter, thus doubling the memory footprint of the optimizer. In behemoth-size applications, this memory overhead restricts the size of the model being used as well as the number of examples in a mini-batch. We describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the… (abstract truncated; an illustrative sketch of the sublinear-memory idea follows below)

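The abstract's central claim is that the optimizer's second-moment state can be made sublinear in the number of parameters. As a rough, hedged illustration (not taken from the paper text shown above), the sketch below keeps one AdaGrad-style accumulator per row and one per column of an m x n parameter matrix and combines them per entry with a minimum, so optimizer state is O(m + n) rather than O(mn). The function name sm3_like_matrix_step and the exact min/max update rule are assumptions made for illustration, not the paper's stated algorithm.

import numpy as np

def sm3_like_matrix_step(w, g, row_acc, col_acc, lr=0.1, eps=1e-8):
    """One adaptive step for an m x n parameter matrix using only
    m + n second-moment accumulators instead of m * n.

    Illustrative sketch of a row/column (cover-based) accumulator in the
    spirit of the abstract; the precise update rule is an assumption.
    """
    # Per-entry second-moment estimate: the tighter (smaller) of the covering
    # row and column accumulators, plus the fresh squared gradient.
    nu = np.minimum(row_acc[:, None], col_acc[None, :]) + g ** 2

    # Parameter update scaled by the inverse square root of the estimate.
    w = w - lr * g / np.sqrt(nu + eps)

    # Each accumulator keeps the max over the entries it covers, so it
    # upper-bounds the cumulative squared gradients of those entries.
    row_acc = nu.max(axis=1)
    col_acc = nu.max(axis=0)
    return w, row_acc, col_acc

# Usage: optimizer state is two vectors of sizes m and n, sublinear in m * n.
m, n = 4, 6
w = np.zeros((m, n))
row_acc = np.zeros(m)
col_acc = np.zeros(n)
for _ in range(3):
    g = np.random.randn(m, n)  # stand-in for a stochastic gradient
    w, row_acc, col_acc = sm3_like_matrix_step(w, g, row_acc, col_acc)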

    References

    Publications referenced by this paper (partial list; 23 references in total):

    • Noam Shazeer and Mitchell Stern. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. ICML 2018.
    • Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. ICLR 2015.
    • Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. ICCV 2015.
    • Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. On the Convergence of Adam and Beyond. ICLR 2018.