Speculative completion for the design of high-performance asynchronous dynamic adders
@article{Nowick1997SpeculativeCF, title={Speculative completion for the design of high-performance asynchronous dynamic adders}, author={Steven M. Nowick and Kenneth Y. Yun and Ayoob E. Dooply and Peter A. Beerel}, journal={Proceedings Third International Symposium on Advanced Research in Asynchronous Circuits and Systems}, year={1997}, pages={210-223} }
This paper presents an in-depth case study in high-performance asynchronous adder design. A recent method, called "speculative completion", is used. This method uses single-rail bundled datapaths but also allows early completion. Five new dynamic designs are presented for Brent-Kung and Carry-Bypass adders. Furthermore, two new architectures are introduced, which target (i) small number addition, and (ii) hybrid operation. Initial SPICE simulation and statistical analysis show performance…
108 Citations
Design of a Low Latency Asynchronous Adder Using Early Completion Detection
- Computer Science
- 2014
A new method for designing completion detection for asynchronous adders, based on a property of the carry-merge tree in parallel-prefix adders: a generate bit at one level has the same value as at the previous level if there is no carry into the sequence of bits.
A General Design Methodology for Synchronous Early-Completion-Prediction Adders in Nano-CMOS DSP Architectures
- Computer Science
- VLSI Design
- 2013
This paper illustrates a general, systematic methodology for designing ECPA units targeting nanoscale CMOS technologies. Such a methodology is not yet available in the literature, and it includes automatic definition of critical test patterns for post-layout verification.
Static window addition: A new paradigm for the design of variable latency adders
- Computer Science
- 2011 IEEE 29th International Conference on Computer Design (ICCD)
- 2011
Speculative adders have attracted strong interest for achieving sublogarithmic delays by exploiting the tradeoffs between correctness and performance. Speculative adders also find use in the design…
Design of synchronous and asynchronous variable-latency pipelined multipliers
- Computer Science
- IEEE Trans. Very Large Scale Integr. Syst.
- 2001
The architecture combines a second-order Booth algorithm with a split carry save array pipelined organization, incorporating multiple row skipping and completion-predicting carry-select dual adder, suitable for implementation as a self-timed multiplier core or as a fully synchronous multicycle multiplier core.
Implementation and performance analysis of variable latency adders
- Computer Science
- 2013 IEEE International SOC Conference
- 2013
This paper presents the implementation and analysis of “speculative completion”, a method for designing high-performance asynchronous adders, on six different 32-bit adders; the results indicate that speculative completion yields significant performance improvements.
Variable delay ripple carry adder with carry chain interrupt detection
- Computer Science
- Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS '03.
- 2003
It is shown that high throughput can be achieved with area- and routing-efficient ripple-carry adders at only marginal overhead. These adders share a low AT-product with Brent-Kung adders but offer designers totally different area/delay tradeoffs.
Adding Faster with Application Specific Early Termination
- Computer Science
- 2005
A methodology for improving the speed of high-speed adders that is able to adapt dynamically to application-specific and adder-specific behavior, resulting in a higher detection rate of fast additions and, consequently, a faster average-case speed for addition.
Architectural optimization for low-power nonpipelined asynchronous systems
- Computer Science
- IEEE Trans. Very Large Scale Integr. Syst.
- 1998
The optimization is targeted to nonpipelined computation, and two new sequencing controllers are introduced, which significantly increase the throughput of the entire system and can be traded for substantial system-wide power savings through application of voltage scaling.
High performance reliable variable latency carry select addition
- Computer Science
- 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE)
- 2012
An analytical model for the error rate of SCSA is developed to facilitate both design exploration and convergence. On average, variable latency addition using SCSA-based speculative adders is shown to be 10% faster than the DesignWare adder, with up to 43% area reduction.
An Operand-Optimized Asynchronous IEEE 754 Double-Precision Floating-Point Adder
- Computer Science
- 2010 IEEE Symposium on Asynchronous Circuits and Systems
- 2010
We present the design and implementation of an asynchronous high-performance IEEE 754 compliant double precision floating-point adder (FPA). We provide a detailed breakdown of the power consumption…
References
Asynchronous datapaths and the design of an asynchronous adder
- Computer Science, Mathematics
- Formal Methods Syst. Des.
- 1992
A general method for designing delay-insensitive datapath circuits with emphasis on the formal derivation of a circuit from its specification is presented and a CMOS implementation of the adder is given.
A CMOS VLSI Implementation of an Asynchronous ALU
- Computer Science
- Asynchronous Design Methodologies
- 1993
An asynchronous implementation of an ARM ALU that exploits the fact that most operations can be completed quickly to discard the hardware typically used in an ALU to speed up a few worst-case operations; these operations are rare and so are allowed to take longer.
The counterflow pipeline processor architecture
- Computer Science
- IEEE Design & Test of Computers
- 1994
The CFPP architecture and a proposal for an asynchronous implementation are presented. The architecture seeks geometric regularity in processor chip layout, purely local control to avoid the performance limitations of complex global pipeline stall signals, and simplicity that might lead to provably correct processor designs.
A 500 MHz, 32 bit, 0.4 μm CMOS RISC processor
- Physics, Engineering
- 1994
New circuit-integrating techniques include a stacked power-line structure, which serves as a noise shield and also provides low bounce, a low voltage-swing interface circuit with on-chip adjustable termination resistors, and a clock synchronization circuit which provides small-skew clock among LSI chips.
The NSR processor
- Computer Science
- [1993] Proceedings of the Twenty-sixth Hawaii International Conference on System Sciences
- 1993
The NSR processor is a general-purpose computer structured as a collection of self-timed blocks. These blocks operate concurrently and cooperate by communicating with other blocks using self-timed…
A micropipelined ARM
- Computer Science
- VLSI
- 1993
The feasibility of designing a full-functionality commercial RISC architecture in asynchronous logic, in a micropipelined style, is demonstrated. The design does not outperform its clocked counterpart, but its performance is within a factor of two in all areas.
A Regular Layout for Parallel Adders
- Computer Science
- IEEE Transactions on Computers
- 1982
It is shown that addition of n-bit binary numbers can be performed on a chip with a regular layout in time proportional to log n and with area proportional to n.
Micropipelines
- Computer Science
- CACM
- 1989
The pipeline processor is a common paradigm for very high speed computing machinery that can be found in graphics processors, in signal processing devices, in integrated circuit components for doing arithmetic, and in the instruction interpretation units and arithmetic operations of general purpose computing machinery.
A low-power asynchronous data-path for a FIR filter bank
- Computer Science
- Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems
- 1996
The paper addresses the design of a dedicated processor structure that implements an audio FIR filter bank which is part of an industrial application and includes a tagging scheme that divides the data-path into slices, and an asynchronous ripple carry adder that avoids a completion tree.
Asynchronous circuits for low power: a DCC error corrector
- Computer Science, Physics
- IEEE Design & Test of Computers
- 1994
The authors describe a complete low-power digital compact cassette error corrector. Using Tangram, a high-level programming language, they designed two asynchronous circuits that correct errors on…