Corpus ID: 15196840

Improving the speed of neural networks on CPUs

@inproceedings{Vanhoucke2011ImprovingTS,
  title={Improving the speed of neural networks on {CPUs}},
  author={V. Vanhoucke and A. Senior and Mark Z. Mao},
  year={2011}
}
Recent advances in deep learning have made the use of large, deep neural networks with tens of millions of parameters suitable for a number of applications that require real-time processing. [...] We emphasize data layout, batching of the computation, and the use of SSE2 instructions, and in particular leverage SSSE3 and SSE4 fixed-point instructions, which provide a 3× improvement over an optimized floating-point baseline. We use speech recognition as an example task, and show that a real-time hybrid…
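The fixed-point idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python rather than SSSE3/SSE4 intrinsics, and the function names (`quantize`, `fixed_point_dot`, `float_dot`), the choice of signed 8-bit weights with unsigned 8-bit activations, and the per-vector linear scaling are all assumptions made for illustration. It only shows how a dot product can be computed with integer multiplies and an integer accumulator, then rescaled to a float.

```python
def quantize(values, qmin, qmax):
    """Linearly scale floats so the largest magnitude maps to qmax,
    then round and clamp to [qmin, qmax]. Returns (ints, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = qmax / max_abs if max_abs > 0 else 1.0
    ints = [min(qmax, max(qmin, round(v * scale))) for v in values]
    return ints, scale


def fixed_point_dot(weights, activations):
    """Dot product using integer arithmetic only in the inner loop.

    Assumption for this sketch: weights are stored as signed 8-bit values
    and activations (non-negative, e.g. sigmoid outputs) as unsigned 8-bit.
    """
    w_q, w_scale = quantize(weights, -127, 127)
    a_q, a_scale = quantize(activations, 0, 255)
    # Integer multiply-accumulate; a SIMD version would process many
    # 8-bit pairs per instruction and accumulate in wider integers.
    acc = sum(w * a for w, a in zip(w_q, a_q))
    # Undo both scale factors once, after the accumulation.
    return acc / (w_scale * a_scale)


def float_dot(weights, activations):
    """Floating-point reference for comparison."""
    return sum(w * a for w, a in zip(weights, activations))
```

Comparing `fixed_point_dot(w, a)` with `float_dot(w, a)` on a small vector gives a feel for the quantization error; any speed benefit comes from the vectorized integer instructions, which this sketch does not use.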
603 Citations
Comparing deep learning performance on BigData by using CPUs and GPUs
  • 5
Virtualizing Deep Neural Networks for Memory-Efficient Neural Network Design
  • 27
vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design
  • 192
A Survey on Methods and Theories of Quantized Neural Networks
  • 69
A survey of neural network accelerators
  • 17
On the quantization of recurrent neural networks
  • 1
8-Bit Approximations for Parallelism in Deep Learning
  • 80
Transfer Learning with Binary Neural Networks
  • 1
DaDianNao: A Neural Network Supercomputer
  • 78

References

Neural Network Implementation Using CUDA and OpenMP
  • 141
GPU implementation of neural networks
  • 321
Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
  • 233
Faster matrix-vector multiplication on GeForce 8800GTX
  • N. Fujimoto
  • Computer Science
  • 2008 IEEE International Symposium on Parallel and Distributed Processing
  • 2008
  • 69
CUDAMat: a CUDA-based matrix class for Python
  • 82
Use of Gaussian selection in large vocabulary continuous speech recognition using HMMs
  • K. Knill, M. Gales, S. Young
  • Computer Science
  • Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96
  • 1996
  • 68
The bucket box intersection (BBI) algorithm for fast approximative evaluation of diagonal mixture Gaussians
  • J. Fritsch, I. Rogina
  • Mathematics, Computer Science
  • 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings
  • 1996
  • 48