Coarsely integrated operand scanning (CIOS) architecture for high-speed Montgomery modular multiplication

@article{ONeill2004CoarselyIO,
  title={Coarsely integrated operand scanning (CIOS) architecture for high-speed Montgomery modular multiplication},
  author={M{\'a}ire O'Neill and Ciaran McIvor and John V. McCanny},
  journal={Proceedings. 2004 IEEE International Conference on Field-Programmable Technology (IEEE Cat. No.04EX921)},
  year={2004},
  pages={185--191}
}
  • M. O'Neill, C. McIvor, J. McCanny
  • Published 6 December 2004
  • Computer Science, Mathematics
  • Proceedings. 2004 IEEE International Conference on Field-Programmable Technology (IEEE Cat. No.04EX921)
This work presents a generic coarsely integrated operand scanning (CIOS) architecture for high-speed Montgomery modular multiplication. The architecture supports varying operand sizes. Implemented on a Virtex-II Pro XC2VP50 FPGA, it achieves throughputs of 210 Mbps, 289 Mbps and 334 Mbps for 128-bit, 256-bit and 512-bit operands, respectively. Throughputs of up to 400 Mbps are achieved if the final subtraction in the Montgomery algorithm is excluded. To the… 
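To make the underlying algorithm concrete, here is a minimal software model of the word-level CIOS schedule in the form described by Koç, Acar and Kaliski ("Analyzing and comparing Montgomery multiplication algorithms", listed under References). It is a sketch under stated assumptions, not the paper's hardware design: the helper names, the default word size w = 32 and the `final_subtraction` switch are illustrative, and Python 3.8+ is assumed for `pow(n, -1, m)`. In that terminology, each outer iteration runs a multiplication loop followed by a separate reduction loop, rather than merging both into one inner loop, which is what "coarsely integrated" refers to.

```python
def to_words(x, w, s):
    """Split x into s little-endian words of w bits each."""
    mask = (1 << w) - 1
    return [(x >> (w * k)) & mask for k in range(s)]


def from_words(words, w):
    """Reassemble little-endian w-bit words into an integer."""
    return sum(v << (w * k) for k, v in enumerate(words))


def cios_montmul(a, b, n, w=32, final_subtraction=True):
    """Return a*b*R^-1 mod n, R = 2^(w*s), following the CIOS word schedule.

    Hypothetical helper for illustration; assumes n odd and 0 <= a, b < n.
    """
    assert n % 2 == 1 and 0 <= a < n and 0 <= b < n
    s = (n.bit_length() + w - 1) // w          # number of w-bit words
    mask = (1 << w) - 1
    n0_inv = (-pow(n, -1, 1 << w)) & mask      # n0' = -n^-1 mod 2^w
    A, B, N = to_words(a, w, s), to_words(b, w, s), to_words(n, w, s)
    t = [0] * (s + 2)                          # (s + 2)-word accumulator
    for i in range(s):
        # Multiplication loop: t += A * B[i]
        carry = 0
        for j in range(s):
            acc = t[j] + A[j] * B[i] + carry
            t[j], carry = acc & mask, acc >> w
        acc = t[s] + carry
        t[s], t[s + 1] = acc & mask, acc >> w
        # Reduction loop: add m*N so the low word becomes zero, writing
        # each result word one position lower (a one-word right shift)
        m = (t[0] * n0_inv) & mask
        carry = (t[0] + m * N[0]) >> w         # low word is discarded (it is 0)
        for j in range(1, s):
            acc = t[j] + m * N[j] + carry
            t[j - 1], carry = acc & mask, acc >> w
        acc = t[s] + carry
        t[s - 1], carry = acc & mask, acc >> w
        t[s] = t[s + 1] + carry
    result = from_words(t[: s + 1], w)         # result < 2n at this point
    if final_subtraction and result >= n:
        result -= n
    return result
```

With `final_subtraction=False` the closing comparison and subtraction are skipped, so the returned value is only guaranteed to lie below 2n, although it is still congruent to a*b*R^-1 mod n.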


Generation of Finely-Pipelined GF(P) Multipliers for Flexible Curve Based Cryptography on FPGAs
TLDR
An open-source tool for generating VHDL code with configurable parameters: operand width, number of logical multipliers per physical multiplier, speed or area optimization, optional use of BRAMs, and target FPGA.
Dual-Residue Montgomery Multiplication
TLDR
A new approach based on a dual-residue system is introduced for computing Montgomery multiplication with a new transformation constant; it can partially replace the Montgomery multipliers used today without any changes to the top-level architecture.
Implementation and Evaluation of Modular Multiplication Based on Coarsely Integrated Operand Scanning
TLDR
This paper designs a radix-4 modular multiplication circuit using the proposed architecture, which allows a compact implementation on a Spartan-6 XC6SLX45T FPGA and runs 30 times faster than software processing.
aCIOSm4: An Asynchronous CIOS Algorithm
TLDR
A novel asynchronous CIOS algorithm and its asynchronous architecture are proposed; they accelerate RSA encryption and decryption without increasing area while halving the number of operations.
Implementing modular arithmetic using OpenCL
TLDR
This project is a fast implementation of public-key algorithms that runs in parallel on any device capable of executing OpenCL programs.
RNS in Cryptography
  • P. Mohan
  • Computer Science, Mathematics
  • 2016
TLDR
This chapter considers applications of residue number systems (RNS) in elliptic curve cryptography processors and in implementations of pairing protocols, covering both RNS-based and non-RNS-based implementations.
A 1.96 mm² low-latency multi-mode crypto-coprocessor for PKC-based IoT security protocols
TLDR
This paper presents the implementation of a multi-mode crypto-coprocessor, which can support three different public-key cryptography (PKC) engines used in post-quantum and identity-based cryptosystems, and incorporates three design features, including a highly parallel arithmetic unit for cryptographic kernel operations.
Using Modular Extension to Provably Protect ECC Against Fault Attacks
TLDR
This paper studies the modular extension protection scheme in previously existing and newly contributed variants of the countermeasure on elliptic curve scalar multiplication (ECSM) algorithms, and formally proves the correctness and security of modular extension.
Formal software methods for cryptosystems implementation security
TLDR
This thesis aims to show that formal methods can be used to prove not only the principle of the countermeasures according to a model, but also their implementations, as it is where the physical vulnerabilities are exploited.
Side-Channel Attacks and Countermeasures for Identity-Based Cryptographic Algorithm SM9
TLDR
It is proved that a template attack on an 8-bit microcontroller unit can reveal the secret key from a single execution on the device, and countermeasures against the three kinds of attacks considered are given.

References

Showing 1-10 of 23 references
FPGA Montgomery multiplier architectures - a comparison
TLDR
Novel FPGA architectures for the SOS, CIOS and FIOS Montgomery multiplication algorithms are presented, and it is shown that one can tailor the multiplier architectures to be area efficient, time efficient or a mixture of both, by choosing a particular word size.
Montgomery modular multiplication architecture for public key cryptosystems
TLDR
A novel hardware architecture for the coarsely integrated hybrid scanning (CIHS) algorithm, which performs Montgomery modular multiplication, is presented; to the authors' knowledge, its results are the first performance figures reported in the literature for a hardware CIHS architecture.
A Scalable Architecture for Modular Multiplication Based on Montgomery's Algorithm
TLDR
A word-based version of Montgomery multiplication (MM) is presented and used to explain the main concepts of the hardware design, which leaves enough freedom to select the word size and the degree of parallelism according to the available area and/or desired performance.
Montgomery's Multiplication Technique: How to Make It Smaller and Faster
TLDR
It is concluded that a linear, pipelined implementation of the modular multiplication algorithm may be part of the best policy for thwarting differential power attacks against RSA.
Hardware Implementation of Montgomery's Modular Multiplication Algorithm
TLDR
Hardware is described for implementing the fast modular multiplication algorithm developed by P.L. Montgomery (1985), showing that this algorithm is up to twice as fast as the best currently available and is more suitable for alternative architectures.
Montgomery modular exponentiation on reconfigurable hardware
  • Thomas Blum
  • Computer Science, Mathematics
  • Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)
  • 1999
TLDR
This contribution proposes arithmetic architectures which are optimized for modern field programmable gate arrays (FPGAs) and shows that it is possible to implement modular exponentiation at secure bit lengths on a single commercially available FPGA.
A Scalable GF(p) Elliptic Curve Processor Architecture for Programmable Hardware
This work proposes a new elliptic curve processor architecture for the computation of point multiplication for curves defined over fields GF(p). This is a scalable architecture in terms of area and
RSA Acceleration with Field Programmable Gate Arrays
An efficient implementation of modular exponentiation, i.e., the main building block in the RSA cryptographic scheme, is achieved by first designing a bit-level systolic array such that the whole
Analyzing and comparing Montgomery multiplication algorithms
TLDR
The operations involved in computing the Montgomery product are studied, and several high-speed, space-efficient algorithms for computing MonPro(a, b) are described together with their time and space requirements.
A Scalable Dual-Field Elliptic Curve Cryptographic Processor
We propose an elliptic curve (EC) cryptographic processor architecture that can support Galois fields GF(p) and GF(2^n) for arbitrary prime numbers and irreducible polynomials by introducing a
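Several of the references above, such as the Montgomery product MonPro(a, b) and Montgomery modular exponentiation for RSA, rest on the same idea: map operands into Montgomery form x*R mod n, chain Montgomery products, and map back at the end. The following is a minimal sketch under stated assumptions: integer-level reduction with R = 2^k where k is the bit length of the odd modulus n, a left-to-right square-and-multiply schedule (one common choice, not necessarily the one used in the cited works), and illustrative helper names `monpro` and `mont_exp` that do not come from those papers. Python 3.8+ is assumed for `pow(n, -1, m)`.

```python
def monpro(x, y, n, k, n_prime):
    """Montgomery product x*y*R^-1 mod n for R = 2^k, assuming x, y < n and n odd."""
    t = x * y
    m = (t * n_prime) & ((1 << k) - 1)   # m = t * n' mod R
    u = (t + m * n) >> k                # t + m*n is exactly divisible by R
    return u - n if u >= n else u       # u < 2n, so one subtraction suffices


def mont_exp(base, e, n):
    """base^e mod n using only Montgomery products (left-to-right binary method)."""
    k = n.bit_length()
    R = 1 << k                          # R = 2^k > n and gcd(R, n) = 1 for odd n
    n_prime = (-pow(n, -1, R)) % R      # n * n' = -1 (mod R)
    x_bar = R % n                       # Montgomery form of 1
    b_bar = (base % n) * R % n          # Montgomery form of base
    for bit in bin(e)[2:]:
        x_bar = monpro(x_bar, x_bar, n, k, n_prime)      # square
        if bit == "1":
            x_bar = monpro(x_bar, b_bar, n, k, n_prime)  # multiply
    return monpro(x_bar, 1, n, k, n_prime)               # leave Montgomery form
```

For example, `mont_exp(65, 17, 3233)` should agree with Python's built-in `pow(65, 17, 3233)`; only the two conversions at the start and the final `monpro(..., 1, ...)` touch a full modular reduction outside the Montgomery product itself.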
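For comparison with CIOS, the separated operand scanning (SOS) method evaluated in "FPGA Montgomery multiplier architectures - a comparison" keeps multiplication and reduction in two separate phases: it first forms the full 2s-word product a*b and then reduces it one word at a time. The sketch below is a software model under stated assumptions (a hypothetical helper `sos_montmul`, default word size w = 32, Python 3.8+ for `pow(n, -1, m)`), not the hardware architecture of any cited paper.

```python
def sos_montmul(a, b, n, w=32):
    """Return a*b*R^-1 mod n, R = 2^(w*s), using the SOS word schedule.

    Hypothetical helper for illustration; assumes n odd and 0 <= a, b < n.
    """
    assert n % 2 == 1 and 0 <= a < n and 0 <= b < n
    s = (n.bit_length() + w - 1) // w           # number of w-bit words
    mask = (1 << w) - 1
    n0_inv = (-pow(n, -1, 1 << w)) & mask       # n0' = -n^-1 mod 2^w
    A = [(a >> (w * k)) & mask for k in range(s)]
    B = [(b >> (w * k)) & mask for k in range(s)]
    N = [(n >> (w * k)) & mask for k in range(s)]
    # Phase 1: full 2s-word product t = a*b (schoolbook multiplication)
    t = [0] * (2 * s + 1)
    for i in range(s):
        carry = 0
        for j in range(s):
            acc = t[i + j] + A[j] * B[i] + carry
            t[i + j], carry = acc & mask, acc >> w
        t[i + s] = carry                        # this word is still zero here
    # Phase 2: word-by-word reduction, zeroing one low word per iteration
    for i in range(s):
        m = (t[i] * n0_inv) & mask
        carry = 0
        for j in range(s):
            acc = t[i + j] + m * N[j] + carry
            t[i + j], carry = acc & mask, acc >> w
        k = i + s
        while carry:                            # propagate carry into upper words
            acc = t[k] + carry
            t[k], carry = acc & mask, acc >> w
            k += 1
    # The low s words are now zero; the upper words hold a value below 2n
    u = sum(v << (w * k) for k, v in enumerate(t[s:]))
    return u - n if u >= n else u
```

Because the whole product is stored before reduction starts, this formulation needs roughly twice as many temporary words (2s + 1 here) as an interleaved CIOS loop, which only keeps about s + 2 words.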