A transprecision floating-point platform for ultra-low power computing

@inproceedings{Tagliavini2018ATF,
  title={A transprecision floating-point platform for ultra-low power computing},
  author={Giuseppe Tagliavini and Stefan Mach and Davide Rossi and Andrea Marongiu and Luca Benini},
  booktitle={2018 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)},
  year={2018},
  pages={1051--1056}
}
In modern low-power embedded platforms, the execution of floating-point (FP) operations emerges as a major contributor to the energy consumption of compute-intensive applications with large dynamic range. [...] First, we introduce a software library that enables exploration of FP types by tuning both the precision and the dynamic range of program variables.
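The variable-format exploration the abstract describes can be illustrated with a short sketch (hypothetical illustration, not the paper's actual library): a value is kept in double precision but rounded after every operation to a custom format defined by its exponent and mantissa widths.

```python
import math

def round_to_format(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to a custom FP format with the given exponent and mantissa
    widths (round-to-nearest; values below the subnormal range go to zero).
    Illustrative sketch only, not the library from the paper."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    bias = (1 << (exp_bits - 1)) - 1       # IEEE-style exponent bias
    e = math.floor(math.log2(abs(x)))
    e = max(e, -bias + 1)                  # clamp into the subnormal range
    scale = 2.0 ** (man_bits - e)          # move mantissa bits left of the point
    y = round(x * scale) / scale
    if e > bias:                           # exponent overflow -> infinity
        return math.copysign(math.inf, x)
    return y

# binary16-like format: 5 exponent bits, 10 mantissa bits
print(round_to_format(math.pi, 5, 10))   # 3.140625, pi at half precision
```

Tuning a variable's type then amounts to picking `exp_bits` (dynamic range) and `man_bits` (precision) per variable and checking the application-level error.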
A Transprecision Floating-Point Architecture for Energy-Efficient Embedded Computing
TLDR
An FP arithmetic unit capable of performing basic operations on smallFloat formats as well as conversions is presented, enabling hardware-supported power savings for applications making use of transprecision.
FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing
TLDR
FPnew is presented, a highly configurable open-source transprecision floating-point unit (TP-FPU), capable of supporting a wide range of standard and custom FP formats, and integrated into a 64-bit RISC-V core, supporting five FP formats on scalars or 2, 4, or 8-way SIMD vectors.
Synthesis Time Reconfigurable Floating Point Unit for Transprecision Computing
TLDR
The design and the implementation of a fully combinatorial floating point unit (FPU) that can be reconfigured at implementation time in order to use an arbitrary number of bits for the mantissa and exponent fields is presented, exploring the trade-off between precision, dynamic range and physical resources.
Anytime instructions for programmable accuracy floating-point arithmetic
TLDR
A novel concept called anytime instructions, which explicitly specify the number of result bits that are calculated at full precision, is presented and applied to floating-point division by presenting an anytime division functional unit that is implemented in a VLIW processor.
TRANSPIRE: An energy-efficient TRANSprecision floating-point Programmable archItectuRE
TLDR
An ultra-low-power tunable-precision CGRA architectural template, called TRANSprecision floating-point Programmable archItectuRE (TRANSPIRE), and its associated compilation flow supporting both integer and FP operations are proposed.
Towards a Transprecision Polymorphic Floating-Point Unit for Mixed-Precision Computing
  • Alisson Carvalho, R. Azevedo
  • 2019 31st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
  • 2019
TLDR
This work presents a new floating-point unit design, able to automatically decide when an instruction should be executed using less precision, without recompilation or direct user intervention; it may increase instruction-level parallelism and reduce the need for type-casting operations.
FixM: Code generation of fixed point mathematical functions
TLDR
A new mathematical function library is developed that is parameterizable at compile time depending on the data type and works natively in the fixed-point numeric representation; through modification of a compiler pass, the parameterized implementations of these trigonometric functions are inserted into the program seamlessly during the precision tuning process.
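Fixed-point arithmetic of the kind this library targets can be sketched as follows (illustrative code, not FixM itself): values are stored as integers scaled by 2^F for a chosen number of fractional bits F, and a multiply needs one extra shift to renormalize.

```python
F = 16  # fractional bits: a Q-format with 16 bits after the binary point

def to_fix(x: float) -> int:
    """Encode a real value as a fixed-point integer with F fractional bits."""
    return round(x * (1 << F))

def from_fix(a: int) -> float:
    """Decode a fixed-point integer back to a float."""
    return a / (1 << F)

def fix_mul(a: int, b: int) -> int:
    """Fixed-point multiply: the raw product carries 2*F fractional bits,
    so shift right by F to renormalize (truncating toward -inf)."""
    return (a * b) >> F

a, b = to_fix(1.5), to_fix(2.25)
print(from_fix(fix_mul(a, b)))   # 3.375
```

Addition and subtraction need no rescaling when both operands use the same F, which is why precision tuning for fixed point largely reduces to choosing F per variable.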
A transprecision floating-point cluster for efficient near-sensor data analytics
TLDR
This paper proposes a multi-core computing cluster that leverages the fine-grained tunable principles of transprecision computing to provide support to near-sensor applications at a minimum power budget, based on the open-source RISC-V architecture.
Integration and experimentation with a transprecision FPU
Mobile and high-performance computing have placed increasing and diverging constraints on floating-point hardware. At the same time, image processing and machine learning are becoming more and more [...]
FlexFloat: A Software Library for Transprecision Computing
TLDR
FlexFloat is introduced, an open-source software library expressly designed to aid the development of transprecision applications; it makes it possible to emulate the behavior of standard IEEE FP types as well as custom extensions for reduced-precision computation.

References

Showing 1–10 of 18 references.
Reducing power by optimizing the necessary precision/range of floating-point arithmetic
TLDR
This paper explores ways of reducing FP power consumption by minimizing the bitwidth representation of FP data, showing that up to a 66% reduction in multiplier energy per operation can be achieved in the FP unit by this bitwidth reduction technique without sacrificing any program accuracy.
Auto-tuning for floating-point precision with Discrete Stochastic Arithmetic
TLDR
This paper presents PROMISE, a tool that makes it possible to optimize the numerical types in a program by taking into account the requested accuracy on the computed results, and has been successfully tested on programs implementing several numerical algorithms.
An Extended Shared Logarithmic Unit for Nonlinear Function Kernel Acceleration in a 65-nm CMOS Multicore Cluster
TLDR
A series of compact LNUs is developed, which provide significantly more functionality (such as transcendental functions) than other state-of-the-art designs; measurement results demonstrate that the shared-LNU design can be up to 4.1× more energy-efficient in common nonlinear processing kernels, compared with a similar-area design with four private FPUs.
A 1.45GHz 52-to-162GFLOPS/W variable-precision floating-point fused multiply-add unit with certainty tracking in 32nm CMOS
TLDR
Simultaneous floating-point certainty tracking, preshifted addends, a combined rounding and negation incrementer, efficient reuse of mantissa datapath for multiple parallel lower precision calculations, robust ultra-low voltage circuits, and fine-grained clock gating enable nominal energy efficiency of 52GFLOPS/W.
Rigorous floating-point mixed-precision tuning
TLDR
This work presents a rigorous approach to precision allocation based on formal analysis via Symbolic Taylor Expansions, and error analysis based on interval functions, implemented in an automated tool called FPTuner that generates and solves a quadratically constrained quadratic program to obtain a precision-annotated version of the given expression.
MPFR: A multiple-precision binary floating-point library with correct rounding
This article presents a multiple-precision binary floating-point library, written in the ISO C language and based on the GNU MP library. Its particularity is to extend to arbitrary precision ideas [...]
Towards general purpose computations on low-end mobile GPUs
TLDR
This paper shows how these obstacles can be overcome in order to achieve general-purpose programmability of mobile GPUs, and implements it on a real embedded platform based on Broadcom's VideoCore IV GPU, obtaining a speedup of 7.2× over the CPU.
Efficient floating point precision tuning for approximate computing
TLDR
An automatic tool-chain that efficiently computes the precision of floating-point variables down to the bit level of the mantissa is presented; it is successfully used to transform floating-point signal processing programs into their arbitrary-precision fixed-point equivalents.
Precimonious: Tuning assistant for floating-point precision
TLDR
Precimonious is a dynamic program analysis tool to assist developers in tuning the precision of floating-point programs; it recommends a type instantiation that uses lower precision while producing an accurate enough answer without causing exceptions.
Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices
TLDR
This paper describes the design of an open-source RISC-V processor core specifically designed for NT operation in tightly coupled multicore clusters, and introduces instruction extensions and microarchitectural optimizations to increase the computational density and to minimize the pressure toward the shared-memory hierarchy.