Speculative completion for the design of high-performance asynchronous dynamic adders

  • Steven M. Nowick, Kenneth Y. Yun, Ayoob E. Dooply, Peter A. Beerel
  • Published 7 April 1997
  • Engineering
  • Proceedings Third International Symposium on Advanced Research in Asynchronous Circuits and Systems
This paper presents an in-depth case study in high-performance asynchronous adder design. A recent method, called "speculative completion", is used. This method uses single-rail bundled datapaths but also allows early completion. Five new dynamic designs are presented for Brent-Kung and Carry-Bypass adders. Furthermore, two new architectures are introduced, which target (i) small number addition, and (ii) hybrid operation. Initial SPICE simulation and statistical analysis show performance… 
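The idea of early completion in the abstract above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the paper's circuit: it assumes that the slow path is needed only when the operands contain a long run of carry-propagate bits, and the names `speculative_add`, `run_threshold`, `fast_delay`, and `slow_delay` as well as their default values are hypothetical.

```python
def propagate_bits(a: int, b: int, width: int) -> list[int]:
    """Per-bit carry-propagate signals p_i = a_i XOR b_i, LSB first."""
    x = (a ^ b) & ((1 << width) - 1)
    return [(x >> i) & 1 for i in range(width)]


def longest_propagate_run(a: int, b: int, width: int) -> int:
    """Length of the longest run of consecutive propagate bits."""
    best = run = 0
    for p in propagate_bits(a, b, width):
        run = run + 1 if p else 0
        best = max(best, run)
    return best


def speculative_add(a: int, b: int, width: int = 32,
                    run_threshold: int = 8,      # hypothetical abort threshold
                    fast_delay: float = 1.0,     # hypothetical early-completion delay
                    slow_delay: float = 2.0):    # hypothetical worst-case delay
    """Return (sum mod 2**width, modeled completion delay).

    Assumption: a long propagate run means a carry may ripple far, so the
    early completion signal is aborted and the late one is used instead.
    """
    mask = (1 << width) - 1
    total = (a + b) & mask
    abort = longest_propagate_run(a, b, width) >= run_threshold
    return total, (slow_delay if abort else fast_delay)


if __name__ == "__main__":
    print(speculative_add(1, 2))        # short propagate run: early completion
    print(speculative_add(0xFFFF, 1))   # long propagate run: late completion
```

In this model the sum is always correct; only the reported completion time varies with the operands, which is the property a variable-latency adder exploits.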
A new method for designing completion detection for asynchronous adders based on the property of a carry-merge tree for parallel-prefix adders where a generate bit at one level will have the same value as that in the previous level if there is no carry into the sequence of bits.
A General Design Methodology for Synchronous Early-Completion-Prediction Adders in Nano-CMOS DSP Architectures
This paper illustrates a general systematic methodology to design ECPA units, targeting nanoscale CMOS technologies, which is not available in the current literature yet and includes automatic definition of critical test patterns for postlayout verification.
Static window addition: A new paradigm for the design of variable latency adders
Speculative adders have attracted strong interest for achieving sublogarithmic delays by exploiting the tradeoffs between correctness and performance. Speculative adders also find use in the design
Design of synchronous and asynchronous variable-latency pipelined multipliers
  • M. Olivieri
  • Computer Science
    IEEE Trans. Very Large Scale Integr. Syst.
  • 2001
The architecture combines a second-order Booth algorithm with a split carry save array pipelined organization, incorporating multiple row skipping and completion-predicting carry-select dual adder, suitable for implementation as a self-timed multiplier core or as a fully synchronous multicycle multiplier core.
Implementation and performance analysis of variable latency adders
This paper presents the implementation and analysis of a method for the design of high-performance asynchronous adders, called “speculative completion”, on six different 32-bit adders and indicates that speculative completion yields significant performance improvements.
Variable delay ripple carry adder with carry chain interrupt detection
It is shown that high throughput can be achieved with area- and routing-efficient ripple-carry adders at only marginal overhead; these adders share a low AT-product with Brent-Kung adders but provide designers with totally different area/delay tradeoffs.
Adding Faster with Application Specific Early Termination
A methodology for improving the speed of high-speed adders that is able to adapt dynamically to application-specific and adder-specific behavior, resulting in a higher detection rate of fast additions and, consequently, a faster average-case speed for addition.
Architectural optimization for low-power nonpipelined asynchronous systems
The optimization is targeted to nonpipelined computation, and two new sequencing controllers are introduced, which significantly increase the throughput of the entire system and can be traded for substantial system-wide power savings through application of voltage scaling.
High performance reliable variable latency carry select addition
An analytical model for the error rate of SCSA is developed to facilitate both design exploration and convergence and shows that on average, variable latency addition using SCSA-based speculative adders is 10% faster than the DesignWare adder with up to 43% area reduction.
An Operand-Optimized Asynchronous IEEE 754 Double-Precision Floating-Point Adder
  • B. Sheikh, R. Manohar
  • Computer Science
    2010 IEEE Symposium on Asynchronous Circuits and Systems
  • 2010
We present the design and implementation of an asynchronous high-performance IEEE 754 compliant double precision floating-point adder (FPA). We provide a detailed breakdown of the power consumption


Asynchronous datapaths and the design of an asynchronous adder
A general method for designing delay-insensitive datapath circuits with emphasis on the formal derivation of a circuit from its specification is presented and a CMOS implementation of the adder is given.
A CMOS VLSI Implementation of an Asynchronous ALU
  • J. Garside
  • Computer Science
    Asynchronous Design Methodologies
  • 1993
An asynchronous implementation of an ARM ALU that exploits the fact that most operations can be completed quickly to discard the hardware typically used in an ALU to speed up a few worst-case operations; these operations are rare and so are allowed to take longer.
The counterflow pipeline processor architecture
The CFPP architecture and a proposal for an asynchronous implementation are presented. The architecture seeks geometric regularity in processor chip layout, purely local control to avoid performance limitations of complex global pipeline stall signals, and simplicity that might lead to provably correct processor designs.
A 500 MHz, 32 bit, 0.4 μm CMOS RISC processor
New circuit-integrating techniques include a stacked power-line structure, which serves as a noise shield and also provides low bounce, a low voltage-swing interface circuit with on-chip adjustable termination resistors, and a clock synchronization circuit which provides small-skew clock among LSI chips.
The NSR processor
  • E. Brunvand
  • Computer Science
    [1993] Proceedings of the Twenty-sixth Hawaii International Conference on System Sciences
  • 1993
The NSR processor is a general-purpose computer structured as a collection of self-timed blocks. These blocks operate concurrently and cooperate by communicating with other blocks using self-timed
A micropipelined ARM
The feasibility of designing a full-functionality commercial RISC architecture in asynchronous logic in a micropipelined style is demonstrated; the design does not outperform its clocked counterpart, but its performance is within a factor of two in all areas.
A Regular Layout for Parallel Adders
It is shown that addition of n-bit binary numbers can be performed on a chip with a regular layout in time proportional to log n and with area proportional to n.
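The logarithmic-time claim above can be made concrete with a software model of a Brent-Kung-style parallel-prefix carry network. This is a minimal sketch, assuming a power-of-two width and no carry-in; the function name `brent_kung_add` and the in-place list representation are illustrative choices, not taken from the referenced paper. The up-sweep and down-sweep together use about 2 log2 n levels, and the merges within one level are independent of each other.

```python
def brent_kung_add(a: int, b: int, width: int = 8) -> int:
    """Add two width-bit numbers modulo 2**width (width must be a power of two)."""
    assert width > 0 and width & (width - 1) == 0, "width must be a power of two"

    # Per-bit generate and propagate signals, LSB first.
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    half_sum = p[:]  # sum bits before the incoming carries are applied

    def merge(hi: int, lo: int) -> None:
        # Combine the (g, p) pair of a block with the adjacent lower block.
        g[hi] = g[hi] | (p[hi] & g[lo])
        p[hi] = p[hi] & p[lo]

    # Up-sweep: build block prefixes of size 2, 4, 8, ...
    d = 1
    while d < width:
        for i in range(2 * d - 1, width, 2 * d):
            merge(i, i - d)
        d *= 2

    # Down-sweep: fill in the remaining prefixes from the completed ones.
    d = width // 2
    while d >= 1:
        for i in range(3 * d - 1, width, 2 * d):
            merge(i, i - d)
        d //= 2

    # After both sweeps, g[i] is the carry out of bit i (carry-in is 0).
    result = 0
    for i in range(width):
        carry_in = g[i - 1] if i > 0 else 0
        result |= (half_sum[i] ^ carry_in) << i
    return result


if __name__ == "__main__":
    print(brent_kung_add(100, 55, 8))
```

A quick exhaustive check against `(a + b) & 0xFF` for all 8-bit operand pairs is a reasonable way to gain confidence in a sketch like this.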
The pipeline processor is a common paradigm for very high speed computing machinery that can be found in graphics processors, in signal processing devices, in integrated circuit components for doing arithmetic, and in the instruction interpretation units and arithmetic operations of general purpose computing machinery.
A low-power asynchronous data-path for a FIR filter bank
  • L. S. Nielsen, J. Sparsø
  • Computer Science
    Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems
  • 1996
The paper addresses the design of a dedicated processor structure that implements an audio FIR filter bank which is part of an industrial application and includes a tagging scheme that divides the data-path into slices, and an asynchronous ripple carry adder that avoids a completion tree.
Asynchronous circuits for low power: a DCC error corrector
The authors describe a complete low-power digital compact cassette error corrector. Using Tangram, a high-level programming language, they designed two asynchronous circuits that correct errors on