A staged carry-save-adder array for Montgomery modular multiplication

Jhing-Fa Wang, Po-Chuan Lin, Ping Kun Chiu
Proceedings. IEEE Asia-Pacific Conference on ASIC
In this paper, an efficient VLSI architecture to compute the n-bit Montgomery modular multiplication is proposed. By using the staged carry-save adder (CSA) array, the number of computation cycles for addition is reduced by about 3n/8. In addition, a switch unit is applied to save 2Q-2 registers relative to the traditional Q-bit CSA. Compared with the original method, the total number of clock cycles can be reduced by 68% in the case of n=1024 and Q=512 bits.
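
For readers unfamiliar with the underlying arithmetic, the following is a minimal software sketch of a generic radix-2 Montgomery multiplication whose running total is kept in carry-save form (a sum word and a carry word). It does not model the staged CSA array or the switch unit described in the abstract; the function names `csa` and `montgomery_csa` are hypothetical and chosen only for illustration.

```python
def csa(a, b, c):
    """3:2 carry-save adder on integers.

    Returns (s, cy) such that s + cy == a + b + c, with no carry
    propagation across bit positions.
    """
    s = a ^ b ^ c                            # bitwise sum without carries
    cy = ((a & b) | (a & c) | (b & c)) << 1  # carries, moved up one position
    return s, cy


def montgomery_csa(x, y, m, n):
    """Return x * y * 2^(-n) mod m (illustrative sketch, not the paper's design).

    Assumes m is odd, m < 2^n, and 0 <= x, y < m.
    """
    s, c = 0, 0
    for i in range(n):
        xi = (x >> i) & 1
        # Accumulate the partial product x_i * y in carry-save form.
        s, c = csa(s, c, y if xi else 0)
        # c is even after csa(), so the LSB of s + c equals the LSB of s.
        q = s & 1
        # Add q * m so that the total becomes even.
        s, c = csa(s, c, m if q else 0)
        # Both words are now even, so halving each one is exact.
        s >>= 1
        c >>= 1
    # A single carry-propagate addition resolves the carry-save pair.
    t = s + c
    # The total is below 2m, so one conditional subtraction suffices.
    return t - m if t >= m else t
```

The only carry-propagate addition happens once, after the loop; every loop iteration uses carry-save steps, which is the property that makes CSA-based designs attractive for long operands such as n=1024.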

Improving Cryptographic Architectures by Adopting Efficient Adders in their Modular Multiplication Hardware VLSI
This work studies and compares different modular multiplication algorithms, with emphasis on the underlying binary adders (carry-save, carry-lookahead and carry-skip), to find a fast modular multiplier with fair area requirements and reduced power consumption.
Fast Montgomery modular multiplication by pipelined CSA architecture
This paper uses carry save adder (CSA) architecture and shows that this architecture has greater performance for FPGA design than other architectures, appropriate for RSA processors based on FPGAs.
Modified radix-2 Montgomery modular multiplication to make it faster and simpler
  • K. Manochehri, S. Pourmozafari
  • Computer Science
    International Conference on Information Technology: Coding and Computing (ITCC'05) - Volume II
  • 2005
A new algorithm based on Montgomery modular multiplication is presented that is simpler and faster than the radix-2 algorithm; it is implemented in both ASIC and FPGA technology.
An Efficient Reconfigurable Montgomery Multiplier Architecture for GF(n)
This appears to be the only reconfigurable architecture for Montgomery multiplication over Galois prime field GF(n) that employs carry-save addition and high order of flexibility, which allows easy reconfigurability for any operand length and low hardware complexity.
High-Performance VLSI Architecture for SCS Based Montgomery Modular Multiplication
A modified SCS-based Montgomery modular multiplication (SCS-MM2) with a Reversible Carry Save Adder (RCSA) using Peres gates is proposed to increase performance, and its simulation and synthesis results are presented.
Serial-serial finite field multiplication
In the new formulation, all inputs and outputs of the finite field multiplier are communicated in a serial manner, hence the name serial-serial; some implementation enhancements that can be applied to the resulting structures are also presented.
Hardware and Software Implementations of RSA Encryption Using Fast Montgomery Modular Multiplication (Prof. Kris Gaj, George Mason University)
This key-based algorithm relies heavily on integer multiplication to perform the data encryption or decryption, with the speed of the multiplication algorithm contributing heavily to the throughput performance of the RSA encryption algorithm.


A systolic linear array for modular multiplication
A novel systolic, linear-array modular multiplier is presented which ideally performs the algorithm of P.L. Montgomery (1985) and is suitable for the VLSI implementation of modular exponentiation which is a kernel operation used in many public-key cryptosystems such as RSA.
Montgomery modular exponentiation on reconfigurable hardware
  • Thomas Blum
  • Computer Science, Mathematics
    Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)
  • 1999
This contribution proposes arithmetic architectures which are optimized for modern field programmable gate arrays (FPGAs) and shows that it is possible to implement modular exponentiation at secure bit lengths on a single commercially available FPGA.
Modular multiplication without trial division
A method is presented for multiplying two integers modulo N while avoiding division by N, using a representation of residue classes that speeds modular multiplication without affecting the modular addition and subtraction algorithms.
Implementation of RSA cryptoprocessor based on Montgomery algorithm
A hardware implementation of a 1024-bit RSA cryptoprocessor suitable for smart IC cards is presented, and it is shown that the processor can encrypt a 1024-bit message in less than 0.65 seconds.
Hardware implementation
  • W. Donath
  • Computer Science
    AFIPS '68 (Fall, part II)
  • 1968
The area of hardware implementation shall be (somewhat arbitrarily) defined to include placement, wire routing, terminal assignment, and the interface to hardware fabrication devices. At this stage,
Computer arithmetic algorithms
The principles of the algorithms available for performing arithmetic operations in digital computers, described independently of specific implementation technology and within the same framework, are explained.
A method for obtaining digital signatures and public-key cryptosystems
An encryption method is presented with the novel property that publicly revealing an encryption key does not thereby reveal the corresponding decryption key. This has two important
The art of computer programming. Vol.2: Seminumerical algorithms
The third edition of The Art of Computer Programming, Volume 2: Seminumerical Algorithms.
Computer Arithmetic
A method for obtaining digital signatures and public-key cryptosystems, Communications of the ACM
  • 1978