What every computer scientist should know about floating-point arithmetic

  • D. Goldberg
  • Published 1 March 1991
  • Computer Science
  • ACM Comput. Surv.
Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a…
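To make the overflow point concrete, here is a small illustrative Python snippet (not from the paper): a result too large for double precision rounds to infinity.

```python
import math

# IEEE 754 double overflow: a result too large to represent becomes +inf.
huge = 1.7e308            # close to the double-precision maximum (~1.8e308)
overflowed = huge * 10.0  # exceeds the representable range

assert math.isinf(overflowed)  # the overflow "exception" yields infinity
```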

A Family of Variable-Precision Interval Arithmetic Processors

Hardware designs, arithmetic algorithms, and software support for a family of variable-precision, interval arithmetic processors that give the programmer the ability to detect and, if desired, to correct implicit errors in finite precision numerical computations.
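The core idea of interval arithmetic, computing with intervals guaranteed to enclose the true value, can be sketched in a few lines (an illustrative Python sketch with exact rational endpoints, not the paper's variable-precision hardware):

```python
from fractions import Fraction

# Each value is an interval (lo, hi) guaranteed to contain the true result.
# Exact rational endpoints sidestep rounding; real implementations round
# lo downward and hi upward instead.
def iv_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def iv_mul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))

x = (Fraction(1, 3), Fraction(1, 3))   # exactly 1/3
y = (Fraction(-1, 2), Fraction(2))     # an uncertain input
lo, hi = iv_mul(x, y)
assert lo == Fraction(-1, 6) and hi == Fraction(2, 3)
```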

Low-Cost Microarchitectural Support for Improved Floating-Point Accuracy

The residual register dramatically simplifies the code, providing both lower latency and better instruction-level parallelism.
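The residual of a floating-point addition, the quantity such a register would capture in hardware, can be recovered in software with the classic two-sum sequence (a sketch for illustration, not the paper's microarchitecture):

```python
def two_sum(a, b):
    # Knuth's two-sum: s = fl(a + b), and r is the exact residual,
    # so that a + b == s + r holds exactly in round-to-nearest.
    s = a + b
    bb = s - a
    r = (a - (s - bb)) + (b - bb)
    return s, r

s, r = two_sum(1.0, 1e-20)
assert s == 1.0 and r == 1e-20   # the lost low-order part is recovered
```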

Practical Floating-Point Tests with Integer Code

This work proposes a software floating-point emulation extension for the symbolic execution of binary programs and evaluates it on five distinct open-source soft floating-point code bases.

The Effects of Numerical Precision In Scientific Applications

This work examines, via software emulation, the effects that different arithmetic formats and numerical precisions have in a wide variety of scientific applications, and reveals that, at the same bitwidth, posit arithmetic provides up to two orders of magnitude less error than the floating-point format.

On the Design and Implementation of SmartFloat and AffineFloat

This work presents a library solution for rigorous arithmetic computation that tracks a (double) floating point value, but also a guaranteed upper bound on the error between this value and the ideal value that would be computed in the real-value semantics.
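A drastically simplified sketch of the idea, a value paired with a running error bound (hypothetical Python, far simpler than the actual SmartFloat library):

```python
import sys

class ErrFloat:
    # A double plus a conservative bound on accumulated rounding error.
    # Toy version of the idea only; the real library tracks errors with
    # affine forms and supports the full set of operations.
    U = sys.float_info.epsilon / 2   # unit roundoff of IEEE doubles

    def __init__(self, v, err=0.0):
        self.v, self.err = v, err

    def __add__(self, other):
        s = self.v + other.v
        # propagate both operands' bounds, plus this addition's own error
        return ErrFloat(s, self.err + other.err + abs(s) * self.U)

z = ErrFloat(0.25) + ErrFloat(0.25)   # 0.25 and 0.5 are exact doubles
assert z.v == 0.5 and z.err > 0.0     # bound is conservative, never zero
```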

Self-similar module for FP/LNS arithmetic in high-performance FPGA systems

A conjunctive notation, also known as DIGILOG, is used to introduce a flexible means of creating configurable arithmetic of arbitrary order from a single module type, allowing the Mitrion hardware compiler to match the hardware more closely to the demands of the specific algorithm.

A floating-point library for integer processors

A C library providing software support for single-precision floating-point (FP) arithmetic on processors without FP hardware units, such as the VLIW or DSP processor cores used in embedded applications, is presented.
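At the heart of any such soft-float library is bit-level manipulation of the FP format using integer operations; an illustrative Python sketch of unpacking a single-precision float (not the library's actual code):

```python
import struct

def decode_float32(x):
    # Extract (sign, biased exponent, fraction) from the IEEE 754
    # single-precision bit pattern using only integer operations.
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF     # 8-bit biased exponent
    frac = bits & 0x7FFFFF        # 23-bit fraction
    return sign, exp, frac

assert decode_float32(1.0) == (0, 127, 0)    # 1.0 is 0x3F800000
assert decode_float32(-2.0) == (1, 128, 0)   # sign bit set, exponent 128
```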

A Tool for Unbiased Comparison between Logarithmic and Floating-point Arithmetic

Two concurrent libraries of parameterized arithmetic operators, targeting recent field-programmable gate arrays, are presented, and are unbiased in the sense that they strive to reflect the state-of-the-art for both number systems.
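The appeal of the logarithmic number system is that multiplication and division reduce to fixed-point addition and subtraction of logarithms; a toy Python sketch (illustrative, not the libraries' operators):

```python
import math

# Toy LNS: a positive real x is stored as log2(x).
def to_lns(x):
    return math.log2(x)

def lns_mul(la, lb):
    return la + lb        # multiplication is just addition of logs

def from_lns(l):
    return 2.0 ** l

product = from_lns(lns_mul(to_lns(3.0), to_lns(5.0)))
assert abs(product - 15.0) < 1e-9
# Addition is the hard part: it needs log2(1 + 2**(lb - la)), typically
# evaluated from tables, which is why LNS adders dominate the cost.
```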

Multiplications of floating point expansions

  • M. Daumas
  • Computer Science
    Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)
  • 1999
Three multiplication algorithms, faster and more integrated than the stepwise algorithm proposed earlier, are presented, which have been tested on an application that computes the determinant of a matrix.
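The basic building block of expansion multiplication is an exact product: fl(a*b) plus its exact error term. A Python sketch of the Dekker-style two-product primitive (illustrative; the paper's algorithms build on such primitives):

```python
from fractions import Fraction

def split(a):
    # Dekker's split: a == hi + lo, with each half fitting in 26 bits.
    c = 134217729.0 * a      # the magic constant 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_product(a, b):
    # Returns (p, e) with p = fl(a * b) and a * b == p + e exactly
    # (barring overflow/underflow).
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e

p, e = two_product(0.1, 0.3)
# verify exactness with rationals: every double is an exact rational
assert Fraction(p) + Fraction(e) == Fraction(0.1) * Fraction(0.3)
```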

A parallel/vectorized double-precision exponential core to accelerate computational science applications

This paper presents a direct implementation of an IEEE 754 double-precision e^x FPGA core, modified specifically to support exponentiation, which is pipelined and parallel so as to efficiently handle large vectors of parameters.

Compiler support for floating‐point computation

  • C. Farnum
  • Computer Science
    Softw. Pract. Exp.
  • 1988
Predictability is a basic requirement for compilers of floating-point code: it must be possible to determine the exact floating-point operations that will be executed for a particular source-level…

Floating-point computation

(a) Write a function in a programming language of your choice that takes a (32-bit IEEE format) float and returns a float with the property that: given zero, infinity or a positive normalised…
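The exercise statement is cut off above; one plausible reading is "return the next larger representable float". A hedged Python sketch under that assumption, exploiting the fact that nonnegative IEEE bit patterns order like unsigned integers:

```python
import struct

def next_up32(x):
    # For zero, +infinity, or a positive normalised 32-bit float, return
    # the next larger representable float32 (one guess at the exercise;
    # the original statement is truncated above).
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    if bits == 0x7F800000:       # +infinity has no successor
        return x
    return struct.unpack('<f', struct.pack('<I', bits + 1))[0]

assert next_up32(0.0) > 0.0                # yields the smallest subnormal
assert next_up32(1.0) == 1.0 + 2.0**-23    # exactly one ulp above 1.0
assert next_up32(float('inf')) == float('inf')
```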

Floating-point standards—theory and practice

Arithmetic for vector processors

The first part of this paper discusses circuits that allow fast and correct computation of sums and scalar products, making use of a matrix-shaped arrangement of adders and pipeline technology.
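A well-known software counterpart to such accurate-summation hardware is compensated (Kahan) summation, sketched here for illustration (not the circuits described in the paper):

```python
def kahan_sum(xs):
    # Compensated summation: c carries the running rounding error so the
    # total stays within a few ulps of the exact sum, independent of n.
    s, c = 0.0, 0.0
    for x in xs:
        y = x - c
        t = s + y
        c = (t - s) - y   # the rounding error of the step s + y
        s = t
    return s

# Naive left-to-right summation of 1000 copies of 0.1 drifts noticeably;
# the compensated sum stays within a tiny bound of 100.
assert abs(kahan_sum([0.1] * 1000) - 100.0) < 1e-12
```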

The arithmetic of the digital computer: A new approach

Computer arithmetic is extended so that the arithmetic operations in the linear spaces and their interval correspondents which are most commonly used in computation can be performed with maximum accuracy on digital computers.
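The "maximum accuracy" idea, computing compound operations such as inner products exactly and rounding only once at the end, can be emulated in software with exact rationals (a sketch, not the long-accumulator designs of the paper):

```python
from fractions import Fraction

def exact_dot(xs, ys):
    # Every double is an exact rational, so the dot product can be
    # accumulated without error and rounded exactly once at the end.
    acc = sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)),
              Fraction(0))
    return float(acc)

xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
naive = sum(x * y for x, y in zip(xs, ys))
assert naive == 0.0              # the lone 1.0 is absorbed, then cancelled
assert exact_dot(xs, ys) == 1.0  # maximum accuracy: a single final rounding
```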

A floating-point technique for extending the available precision

A technique is described for expressing multi-length floating-point arithmetic in terms of single-length floating-point arithmetic, i.e. the arithmetic for an available (say: single or double…
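The flavor of the technique: a double-length value is a pair (hi, lo) of working-precision floats, and the arithmetic keeps the pair renormalized. A minimal Python sketch of double-length addition (illustrative; the paper gives the full algorithms with error analyses):

```python
def two_sum(a, b):
    # s = fl(a + b); e is the exact error, so a + b == s + e.
    s = a + b
    bb = s - a
    return s, (a - (s - bb)) + (b - bb)

def dd_add(x, y):
    # Add two double-length (hi, lo) values and renormalize the result.
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)

a = (1.0, 2.0**-60)   # represents 1 + 2^-60, beyond one double's precision
hi, lo = dd_add(a, a)
assert hi == 2.0 and lo == 2.0**-59   # exactly 2 + 2^-59
```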

Contributions to a proposed standard for binary floating-point arithmetic (computer arithmetic)

This thesis consists of a set of "footnotes" to the proposed standard for binary floating-point arithmetic, which include an analysis of gradual underflow, the most controversial feature of the standard.
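Gradual underflow, for reference, means results below the normal range degrade through subnormals instead of flushing to zero; a quick Python illustration:

```python
import sys

smallest_normal = sys.float_info.min      # 2**-1022 for IEEE doubles
subnormal = smallest_normal / 4.0         # still representable: a subnormal
assert 0.0 < subnormal < smallest_normal

# A key consequence of gradual underflow: x - y == 0 implies x == y,
# because the difference of nearby tiny values stays representable.
x, y = 1.5 * smallest_normal, smallest_normal
assert x != y and x - y != 0.0
```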

The C Programming Language

This ebook is the first authorized digital version of Kernighan and Ritchie's 1988 classic, The C Programming Language (2nd Ed.), and is a "must-have" reference for every serious programmer's digital library.

Finite Precision Rational Arithmetic: Slash Number Systems

Multitiered precision hierarchies of both the fixed-Slash and floating-slash type are described and analyzed with regards to their support of both exact rational and approximate real computation.
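The flavor of finite-precision rational arithmetic can be tasted with Python's Fraction, which can round an exact rational to the best approximation with a bounded denominator (an analogy only, not the slash encodings themselves):

```python
from fractions import Fraction
import math

# Exact rational arithmetic...
x = Fraction(1, 3) + Fraction(1, 7)
assert x == Fraction(10, 21)

# ...then "round" to a rational whose denominator fits a fixed budget,
# the essence of fixed-slash representations.
approx = Fraction(math.pi).limit_denominator(1000)
assert approx == Fraction(355, 113)   # best rational with denominator <= 1000
```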

Anomalies in the IBM ACRITH package

  • W. Kahan, E. LeBlanc
  • Computer Science
    1985 IEEE 7th Symposium on Computer Arithmetic (ARITH)
  • 1985
It is concluded that techniques different from those used by ACRITH might have been about as accurate, and yet more economical, robust and perspicuous.