Training distributed deep recurrent neural networks with mixed precision on GPU clusters

@inproceedings{Svyatkovskiy2017TrainingDD,
  title={Training distributed deep recurrent neural networks with mixed precision on GPU clusters},
  author={Alexey Svyatkovskiy and Julian Kates-Harbeck and William Tang},
  booktitle={MLHPC@SC},
  year={2017}
}
In this paper, we evaluate training of deep recurrent neural networks with half-precision floats. We implement a distributed, data-parallel, synchronous training algorithm that integrates TensorFlow with CUDA-aware MPI, enabling execution across multiple GPU nodes and exploiting high-speed interconnects. We introduce a learning rate schedule that facilitates neural network convergence at up to O(100) workers. Strong scaling tests performed on clusters of NVIDIA Pascal P100 GPUs show linear…
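
The abstract describes synchronous data-parallel training built on TensorFlow plus CUDA-aware MPI. The sketch below is a rough illustration of that pattern, not the authors' implementation: each MPI rank computes gradients on its local shard, and gradients are averaged across ranks with an allreduce before every worker applies the same update. It uses modern eager-mode TensorFlow and mpi4py (the paper predates both conveniences); the model, shapes, and hyperparameters are placeholders.

```python
# Minimal sketch of synchronous data-parallel SGD over MPI (illustrative,
# not the paper's code). Assumes mpi4py built against a CUDA-aware MPI.
import tensorflow as tf
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

# Half-precision compute with float32 master weights; TF2's mixed-precision
# policy is a modern stand-in for the paper's fp16 training.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Placeholder recurrent model; the final layer is kept in float32, per the
# usual mixed-precision guidance for numerically sensitive outputs.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(None, 8)),
    tf.keras.layers.Dense(1, dtype="float32"),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

def synchronous_step(x, y):
    # Each rank computes gradients on its local mini-batch shard.
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    # Sum each gradient across all ranks, then divide by the world size:
    # every replica applies the identical averaged update (synchronous SGD).
    averaged = []
    for g in grads:
        buf = g.numpy().copy()
        comm.Allreduce(MPI.IN_PLACE, buf, op=MPI.SUM)
        averaged.append(buf / size)
    optimizer.apply_gradients(zip(averaged, model.trainable_variables))
    return loss
```

Launched as `mpirun -np <workers> python train.py`, each rank would bind to one GPU and iterate `synchronous_step` over its shard of the data; a CUDA-aware MPI lets the allreduce move GPU buffers directly over the interconnect.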
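
The paper's exact learning rate schedule is not reproduced in this excerpt. As a hypothetical stand-in, the sketch below shows one common recipe in the same spirit for large worker counts: scale the base learning rate with the number of workers (since the effective batch grows with data parallelism) and ramp up to the scaled rate over a few warmup epochs to avoid early divergence. All constants are illustrative.

```python
def scaled_lr(base_lr: float, num_workers: int, epoch: int,
              warmup_epochs: int = 5) -> float:
    """Linear-scaling-with-warmup schedule (illustrative, not the paper's)."""
    target = base_lr * num_workers  # scale with worker count
    if epoch < warmup_epochs:
        # Ramp linearly from base_lr up to the scaled target during warmup.
        frac = epoch / float(warmup_epochs)
        return base_lr + frac * (target - base_lr)
    return target
```

At O(100) workers the scaled rate is large, so the warmup phase matters: it keeps early updates small while the weights are far from a good region, then hands off to the full rate once training has stabilized.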