Corpus ID: 52912061

Large batch size training of neural networks with adversarial training and second-order information

@article{Yao2018LargeBS,
  title={Large batch size training of neural networks with adversarial training and second-order information},
  author={Z. Yao and Amir Gholami and K. Keutzer and Michael W. Mahoney},
  journal={ArXiv},
  year={2018},
  volume={abs/1810.01021}
}
The most straightforward way to accelerate Stochastic Gradient Descent (SGD) is to distribute each randomly selected batch of inputs over multiple processors. Keeping those processors fully utilized requires growing the batch size commensurately. However, large-batch training often leads to poorer generalization. A recently proposed solution to this problem is to use adaptive batch sizes in SGD, where one starts with a small number of processes and scales the…
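To make the adaptive-batch-size idea from the abstract concrete, here is a minimal sketch (not the authors' exact algorithm) in which training proceeds in stages, each stage increases the batch size, and the learning rate is rescaled linearly with the batch size. The dataset, model, and schedule values are placeholder assumptions chosen only for illustration.

```python
# Sketch of adaptive batch-size SGD: each stage doubles the batch size
# and rescales the learning rate linearly with it (a common large-batch
# heuristic). Model, data, and schedule are placeholder assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic classification data as a stand-in for a real dataset.
X = torch.randn(4096, 32)
y = torch.randint(0, 10, (4096,))
dataset = TensorDataset(X, y)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()

base_batch_size, base_lr = 64, 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

for stage, batch_size in enumerate([64, 128, 256, 512]):
    # Scale the learning rate with the batch size for this stage.
    lr = base_lr * batch_size / base_batch_size
    for group in optimizer.param_groups:
        group["lr"] = lr

    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    print(f"stage {stage}: batch_size={batch_size}, lr={lr:.3f}, "
          f"last loss={loss.item():.3f}")
```

The paper itself goes further, using adversarial training and second-order (Hessian) information to guide when and how the batch size is increased; the fixed doubling schedule above is only a simplified stand-in for that decision rule.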
Citations

Concurrent Adversarial Learning for Large-Batch Training
Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient
Inefficiency of K-FAC for Large Batch Size Training
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Parameter Re-Initialization through Cyclical Batch Size Schedules
Accelerate Mini-batch Machine Learning Training With Dynamic Batch Size Fitting
ImageNet/ResNet-50 Training in 224 Seconds
Multi-Agent Semi-Siamese Training for Long-tail and Shallow Face Learning
HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision
