Fast calculation of inverse square root with the use of magic constant - analytical approach

@article{Moroz2018FastCO,
  title={Fast calculation of inverse square root with the use of magic constant - analytical approach},
  author={Leonid V. Moroz and Cezary J. Walczyk and Andriy Hrynchyshyn and Vijay Holimath and Jan L. Cieslinski},
  journal={ArXiv},
  year={2018},
  volume={abs/1603.04483}
}
We present a mathematical analysis of transformations used in fast calculation of the inverse square root for single-precision floating-point numbers. Optimal values of the so-called magic constants are derived in a systematic way, minimizing either relative or absolute errors. We show that the value of the magic constant can depend on the number of Newton–Raphson iterations. We present results for one and two iterations.
Improving the accuracy of the fast inverse square root algorithm
Improved algorithms for fast calculation of the inverse square root for single-precision floating-point numbers are presented, modifying the Newton–Raphson method and demanding that the maximal error is as small as possible.
A Modification of the Fast Inverse Square Root Algorithm
We present a new algorithm for the approximate evaluation of the inverse square root for single-precision floating-point numbers. This is a modification of the famous fast inverse square root code.
Improving the Accuracy of the Fast Inverse Square Root by Modifying Newton–Raphson Corrections
Improved algorithms for fast calculation of the inverse square root function for single-precision and double-precision floating-point numbers are presented; they are much more accurate than the original fast inverse square root algorithm and have similarly very low computational costs.
Fast Calculation of Cube and Inverse Cube Roots Using a Magic Constant and Its Implementation on Microcontrollers
We develop a bit manipulation technique for single-precision floating-point numbers which leads to new algorithms for fast computation of the cube root and inverse cube root. It uses the modified…
Simple Effective Fast Inverse Square Root Algorithm with Two Magic Constants
The purpose of this paper is to introduce a modification of the Fast Inverse Square Root (FISR) approximation algorithm with reduced relative errors, which includes two magic constants in order to avoid one floating-point multiplication.
Modified Fast Inverse Square Root and Square Root Approximation Algorithms: The Method of Switching Magic Constants
Algorithms are given in C/C++ for single- and double-precision numbers in the IEEE 754 format for both square root and reciprocal square root functions, based on the switching of magic constants in the initial approximation, depending on the input interval of the normalized floating-point numbers.
Elementary Functions and Approximate Computing
  • J. Muller
  • Computer Science
  • Proceedings of the IEEE
  • 2020
This article reviews some of the classical methods used for quickly obtaining low-precision approximations to the elementary functions and examines what can be done for obtaining very fast estimates of a function, at the cost of a (controlled) loss in terms of accuracy.
A Modification of the Fast Inverse Square Root Algorithm
We present an improved algorithm for fast calculation of the inverse square root for single-precision floating-point numbers. The algorithm is much more accurate than the famous fast inverse square…
Hardware Implementation of Single Iterated Multiplicative Inverse Square Root
The inverse square root plays an important role in Cholesky decomposition, which is devoted to hardware-efficient compressed sensing. However, the performance is usually limited by the trade-off…
Implementation of Parametric Haar-like Transformations on FPGA
This master's thesis studies how hardware architectures for parametric Haar-like transformations could be efficiently implemented as part of a larger FPGA-based system.

References

Showing 1–10 of 37 references
Floating-point division and square root implementation using a Taylor-series expansion algorithm with reduced look-up tables
The implementation results of the proposed fused unit, based on standard-cell methodology in IBM 90 nm technology, show that incorporating the square root function into an existing multiply/divide unit requires only a modest 20% area increase, while the same low latency can be achieved for divide and square root operations.
Hardware-based algorithm for Sine and Cosine computations using fixed point processor
  • My X. Nguyen, A. Dinh-Duc
  • Mathematics
  • 2014 11th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)
  • 2014
This paper presents an algorithm for computing sine and cosine functions. The algorithm is developed for a fixed-point processor and requires only a 32×32 fixed-point multiplier and a fixed-point adder. In…
Simplified floating-point division and square root
Software operations are proposed which attain near-exact precision with twice the performance of exact algorithms and resolve overflow-related errors with inexpensive exponent-manipulation special instructions.
Hardware architecture design and mapping of ‘Fast Inverse Square Root’ algorithm
  • Saad Zafar, Raviteja Adapa
  • Computer Science
  • 2014 International Conference on Advances in Electrical Engineering (ICAEE)
  • 2014
This paper presents a hardware implementation of the Fast Inverse Square Root algorithm on an FPGA board by designing the complete architecture and successfully mapping it on Xilinx Spartan 3E after thorough functional verification.
Software Implementation of Floating-Point Arithmetic
The previous chapter has presented the basic paradigms used for implementing floating-point arithmetic in hardware. However, some processors may not have such dedicated hardware, mainly for cost…
Optimized low-power elementary function approximation for Chebyshev series approximations
  • M. Sadeghian, J. Stine
  • Mathematics, Computer Science
  • 2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR)
  • 2012
It is demonstrated that optimized coefficient values achieve high precision with smaller lookup table sizes, compared to standard coefficients for interpolators.
Floating-point tricks
The author discusses IEEE floating-point representation, which stores numbers in what amounts to scientific notation. He considers the sign bit, the logarithm function, function approximations, errors…
Division and Square Root: Digit-Recurrence Algorithms and Implementations
1. General Comments. 2. Division by Digit Recurrence. 3. Theory of Digit-Recurrence Division. 4. Division with Scaling and Prediction. 5. Higher Radix Division. 6. On-the-Fly Conversion and Rounding. …
2.44-GFLOPS 300-MHz floating-point vector-processing unit for high-performance 3D graphics computing
A vector unit for high-performance three-dimensional graphics computing has been developed. We implement four floating-point multiply-accumulate units, which execute multiply-add operations with one…
A floating-point advanced cordic processor
Its advanced functionality is achieved without a significant increase in hardware compared to an ordinary CORDIC processor, making it an ideal processing element in high-speed multiprocessor applications, e.g. real-time Digital Signal Processing (DSP) and computer graphics.