Low-Precision Arithmetic for Fast Gaussian Processes

@article{Maddox2022LowPrecisionAF,
  title={Low-Precision Arithmetic for Fast Gaussian Processes},
  author={Wesley J. Maddox and Andres Potapczynski and Andrew Gordon Wilson},
  journal={ArXiv},
  year={2022},
  volume={abs/2207.06856}
}
Low-precision arithmetic has had a transformative effect on the training of neural networks, reducing computation, memory and energy requirements. However, despite its promise, low-precision arithmetic has received little attention for Gaussian process (GP) training, largely because GPs require sophisticated linear algebra routines that are unstable in low-precision. We study the different failure modes that can occur when training GPs in half precision. To circumvent these failure modes, we…
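
As an illustration of the kind of linear algebra involved, here is a minimal sketch (assumed details, not the paper's implementation) that solves the GP training system (K + sigma^2 I) alpha = y with conjugate gradients, running the kernel matrix-vector product in float16 while accumulating inner products in float32. The RBF kernel, lengthscale, and data sizes are hypothetical; round-off in the half-precision matvec is one place where instabilities of the sort mentioned above can appear.

import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    # Squared-exponential kernel matrix, built in float32 and stored in float16.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * d2 / lengthscale**2).astype(np.float16)

def cg_mixed(K16, y, sigma2=1e-1, iters=50, tol=1e-4):
    # Solve (K + sigma2 I) x = y: matvecs in float16, reductions in float32.
    x = np.zeros_like(y, dtype=np.float32)
    r = y.astype(np.float32)
    p = r.copy()
    rs = float(r @ r)
    for _ in range(iters):
        Kp = (K16 @ p.astype(np.float16)).astype(np.float32) + sigma2 * p
        alpha = rs / float(p @ Kp)
        x += alpha * p
        r -= alpha * Kp
        rs_new = float(r @ r)
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

X = np.random.randn(512, 4).astype(np.float32)
y = np.random.randn(512).astype(np.float32)
alpha_vec = cg_mixed(rbf_kernel(X), y)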

References

Investigating half precision arithmetic to accelerate dense linear system solvers

This work shows for the first time how the use of FP16 arithmetic can significantly accelerate, as well as make more energy efficient, FP32- or FP64-precision Ax = b solvers.
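
The sketch below illustrates the general recipe of mixed-precision iterative refinement, as an assumption about the approach described rather than its implementation: solve in a lower precision, then correct the solution with residuals computed in a higher precision. Float32 stands in for the low precision here, since FP16 factorizations require specialized hardware kernels.

import numpy as np

def iterative_refinement(A, b, refinements=5):
    A32, b32 = A.astype(np.float32), b.astype(np.float32)
    x = np.linalg.solve(A32, b32).astype(np.float64)      # low-precision solve
    for _ in range(refinements):
        r = b - A @ x                                      # residual in float64
        d = np.linalg.solve(A32, r.astype(np.float32))     # low-precision correction
        x = x + d.astype(np.float64)                       # (a real solver would reuse the factorization)
    return x

n = 200
A = np.random.randn(n, n) + n * np.eye(n)   # well-conditioned test matrix
b = np.random.randn(n)
x = iterative_refinement(A, b)
print(np.linalg.norm(A @ x - b))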

Deep Learning with Limited Numerical Precision

The results show that deep networks can be trained using only a 16-bit wide fixed-point number representation when stochastic rounding is used, and incur little to no degradation in classification accuracy.
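
A minimal sketch of stochastic rounding to a fixed-point grid, the rounding mode referred to above; the grid width and fixed-point format are illustrative assumptions, not the paper's exact configuration.

import numpy as np

def stochastic_round(x, frac_bits=8, rng=np.random.default_rng(0)):
    # Round x to a fixed-point grid with `frac_bits` fractional bits.
    # Each value rounds up with probability equal to its fractional remainder,
    # so the rounding is unbiased in expectation: E[round(x)] = x.
    scale = 2.0 ** frac_bits
    scaled = x * scale
    floor = np.floor(scaled)
    prob_up = scaled - floor
    return (floor + (rng.random(x.shape) < prob_up)) / scale

x = np.random.randn(5).astype(np.float32)
print(x, stochastic_round(x))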

Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers

This work shows that other high-performance computing (HPC) applications can also harness the power of half-precision floating-point arithmetic, and that using half-precision Tensor Cores (FP16-TC) for the arithmetic can provide up to a 4× speedup.

Exact Gaussian Processes on a Million Data Points

A scalable approach for exact GPs is developed that leverages multi-GPU parallelization and methods like linear conjugate gradients, accessing the kernel matrix only through matrix multiplication, and is generally applicable, without constraints to grid data or specific kernel classes.
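
The sketch below illustrates the matrix-free idea in this summary, not the paper's code: conjugate gradients touches the kernel matrix only through matrix-vector products, which can be computed in row blocks so the full n × n matrix is never stored. The RBF kernel, block size, and data are hypothetical.

import numpy as np

def kernel_matvec(X, v, lengthscale=1.0, block=1024):
    # Compute K v for an RBF kernel without materializing the n x n matrix K.
    out = np.empty_like(v)
    sqX = np.sum(X**2, axis=1)
    for start in range(0, X.shape[0], block):
        rows = slice(start, start + block)
        d2 = sqX[rows, None] + sqX[None, :] - 2.0 * X[rows] @ X.T
        out[rows] = np.exp(-0.5 * d2 / lengthscale**2) @ v
    return out

def cg(matvec, b, iters=100, tol=1e-6):
    # Plain conjugate gradients that only calls `matvec`, never indexing a matrix.
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

X = np.random.randn(4096, 3)
y = np.random.randn(4096)
noise = 1e-1
alpha_vec = cg(lambda v: kernel_matvec(X, v) + noise * v, y)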

SWALP : Stochastic Weight Averaging in Low-Precision Training

It is shown that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low-precision SGD in strongly convex settings.
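
As a toy illustration of the scheme summarized above (assumed details, not the authors' code), the sketch below runs gradient descent on a quadratic objective with the weights stochastically rounded to a low-precision grid after every step, while a separate full-precision running average of the iterates is maintained.

import numpy as np

def stochastic_quantize(w, frac_bits=6, rng=np.random.default_rng(1)):
    # Stochastic rounding of the weights to a fixed-point grid.
    scale = 2.0 ** frac_bits
    scaled = w * scale
    floor = np.floor(scaled)
    return (floor + (rng.random(w.shape) < scaled - floor)) / scale

rng = np.random.default_rng(0)
A = rng.standard_normal((256, 10))
b = A @ rng.standard_normal(10)              # quadratic objective 0.5 * ||A w - b||^2 / n

w = np.zeros(10)
w_avg, n_avg = np.zeros(10), 0
lr = 1e-2
for step in range(2000):
    grad = A.T @ (A @ w - b) / len(b)
    w = stochastic_quantize(w - lr * grad)   # weights live on the low-precision grid
    if step >= 1000:                         # average iterates after a burn-in period
        w_avg = (n_avg * w_avg + w) / (n_avg + 1)
        n_avg += 1

print(np.linalg.norm(A @ w - b), np.linalg.norm(A @ w_avg - b))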

Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation

This work proposes using a low-precision quantization of random Fourier features (LP-RFFs) to build a high-rank approximation under a memory budget, and shows quantization has a negligible effect on generalization performance in important settings.
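
A hedged sketch of the idea: random Fourier features approximate an RBF kernel by an inner product of cosine features, and the feature matrix is quantized to a few bits to fit a memory budget. The uniform quantizer below is chosen for illustration and is not necessarily the scheme used in LP-RFFs.

import numpy as np

def rff_features(X, num_features=512, lengthscale=1.0, rng=np.random.default_rng(0)):
    # z(x) = sqrt(2/D) * cos(W x + b) approximates k(x, x') = exp(-||x - x'||^2 / (2 l^2)).
    d = X.shape[1]
    W = rng.standard_normal((d, num_features)) / lengthscale
    b = rng.uniform(0, 2 * np.pi, num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def quantize_uniform(Z, bits=4):
    # Uniformly quantize the features to 2^bits levels over their observed range.
    lo, hi = Z.min(), Z.max()
    levels = 2 ** bits - 1
    q = np.round((Z - lo) / (hi - lo) * levels)
    return q / levels * (hi - lo) + lo

X = np.random.randn(1000, 5)
Z = rff_features(X)
Zq = quantize_uniform(Z)
# Compare kernel approximations: Z @ Z.T vs. Zq @ Zq.T vs. the exact RBF kernel.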

Fast geometric learning with symbolic matrices

This paper presents an extension for standard machine learning frameworks that provides comprehensive support for symbolic matrices on CPUs and GPUs, and performs an extensive evaluation on a broad class of problems: Gaussian modelling, K-nearest neighbors search, geometric deep learning, non-Euclidean embeddings and optimal transport theory.

Dimension-Free Bounds for Low-Precision Training

New bounds for low-precision training algorithms that do not contain the dimension $d$ are derived, which lets us better understand what affects the convergence of these algorithms as parameters scale.

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

A quantization scheme is proposed that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware.
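
The sketch below shows affine (scale plus zero-point) quantization to uint8, the basic ingredient of such integer-only schemes; the per-tensor min/max calibration is an illustrative assumption, not the paper's exact procedure.

import numpy as np

def quantize_affine(x, num_bits=8):
    # Map a float tensor to unsigned integers via a scale and zero point.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(np.clip(np.round(qmin - x.min() / scale), qmin, qmax))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize_affine(q, scale, zero_point):
    return scale * (q.astype(np.int32) - zero_point)

x = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize_affine(x)
print(np.max(np.abs(x - dequantize_affine(q, s, z))))   # quantization error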

Revisiting BFloat16 Training

Two simple existing techniques, stochastic rounding and Kahan summation, are identified, and it is shown empirically that these two techniques can enable up to a 7% absolute validation accuracy gain in pure 16-bit training.
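
As a small illustration of one of the two techniques named above, the sketch below implements Kahan (compensated) summation with a float16 accumulator, which recovers low-order bits that a naive 16-bit running sum drops; the demo values are arbitrary.

import numpy as np

def kahan_sum(values, dtype=np.float16):
    total = dtype(0.0)
    comp = dtype(0.0)                   # running compensation for lost low-order bits
    for v in values:
        y = dtype(v) - comp
        t = dtype(total + y)            # low-order bits of y are lost in this addition...
        comp = dtype((t - total) - y)   # ...and recovered into the compensation term
        total = t
    return total

vals = np.full(10000, 0.01, dtype=np.float32)

naive = np.float16(0.0)                 # naive float16 accumulation for comparison
for v in vals:
    naive = np.float16(naive + np.float16(v))

print(float(kahan_sum(vals)), float(naive), float(np.sum(vals)))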
...