# Montgomery Modular Multiplication Algorithm on Multi-Core Systems

@article{Fan2007MontgomeryMM, title={Montgomery Modular Multiplication Algorithm on Multi-Core Systems}, author={Junfeng Fan and Kazuo Sakiyama and Ingrid M. R. Verbauwhede}, journal={2007 IEEE Workshop on Signal Processing Systems}, year={2007}, pages={261-266} }

In this paper, we investigate efficient software implementations of the Montgomery modular multiplication algorithm on a multi-core system. A HW/SW co-design technique is used to find an efficient system architecture and instruction scheduling method. We first implement Montgomery modular multiplication on a multi-core system with general-purpose cores. We then speed it up by adopting the Multiply-Accumulate (MAC) operation in each core. As a result, the performance can be improved by…
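For orientation, here is a minimal single-threaded sketch of word-level Montgomery multiplication built around a multiply-accumulate step. It is not the paper's multi-core schedule or architecture. It follows the CIOS form surveyed in "Analyzing and comparing Montgomery multiplication algorithms" (listed in the references below). The 32-bit word size and the names `mac`, `to_words`, `from_words`, and `mont_mul` are assumptions made for illustration.

```python
W = 32                      # assumed word size in bits
MASK = (1 << W) - 1


def mac(a, b, c, carry):
    """Multiply-accumulate: return (high, low) words of a*b + c + carry."""
    t = a * b + c + carry   # fits in 2*W bits when all inputs are < 2^W
    return t >> W, t & MASK


def to_words(x, s):
    """Split x into s little-endian W-bit words."""
    return [(x >> (W * i)) & MASK for i in range(s)]


def from_words(ws):
    """Recombine little-endian W-bit words into an integer."""
    return sum(w << (W * i) for i, w in enumerate(ws))


def mont_mul(a, b, n, s):
    """Return a * b * R^-1 mod n with R = 2^(W*s); n odd, a < n, b < n < R."""
    n0_inv = (-pow(n, -1, 1 << W)) & MASK    # -n^-1 mod 2^W (Python 3.8+)
    A, B, N = to_words(a, s), to_words(b, s), to_words(n, s)
    t = [0] * (s + 2)
    for i in range(s):
        # Multiplication step: t += A * B[i]
        carry = 0
        for j in range(s):
            carry, t[j] = mac(A[j], B[i], t[j], carry)
        total = t[s] + carry
        t[s], t[s + 1] = total & MASK, total >> W
        # Reduction step: pick m so that t + m*N is divisible by 2^W,
        # then add m*N and shift right by one word.
        m = (t[0] * n0_inv) & MASK
        carry, _ = mac(m, N[0], t[0], 0)     # low word becomes zero
        for j in range(1, s):
            carry, t[j - 1] = mac(m, N[j], t[j], carry)
        total = t[s] + carry
        t[s - 1] = total & MASK
        t[s] = t[s + 1] + (total >> W)
    result = from_words(t[: s + 1])
    # Final conditional subtraction brings the result into [0, n).
    return result - n if result >= n else result
```

To multiply ordinary residues, operands are first converted to Montgomery form (`x * R mod n`), multiplied with `mont_mul`, and converted back by calling `mont_mul(result, 1, n, s)`. In this sketch the inner `mac` loops are the part a parallel implementation would need to partition across cores.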

## 39 Citations

Parallelization of Radix-2 Montgomery Multiplication on Multicore Platform

- Computer Science; IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- 2013

This brief presents an improved task partitioning of the Montgomery multiplication algorithm for the multicore platform with area-efficient processors to verify the efficiency of parallelization.

An Efficient Implementation of Montgomery Multiplication on Multicore Platform With Optimized Algorithm, Task Partitioning, and Network Architecture

- Computer Science; IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- 2014

A block-level parallel algorithm for MM with quotient pipelining is presented and optimally mapped onto a network-on-chip-based multicore platform equipped with a broadcasting mechanism, to maximize the speedup ratio for a given intercore communication latency.

Hardware Implementation of Improved Montgomery Modular Multiplication Algorithm

- Computer Science, Mathematics; 2009 WRI International Conference on Communications and Mobile Computing
- 2009

A hardware implementation of a modular multiplication coprocessor for both RSA and ECC cryptosystems using a self-improvement Montgomery modular multiplication algorithm, which completes a modular multiplication in fewer clock cycles than other designs under equivalent conditions.

Highly-Parallel Montgomery Multiplication for Multi-Core General-Purpose Microprocessors

- Computer Science, Mathematics; ISCIS
- 2012

This work proposes a new parallel Montgomery multiplication algorithm which exhibits up to 39% better performance than the best known serial Montgomery multiplication variant for bit-lengths of 2048 or larger, and is the first work to show with actual implementation results that Montgomery multiplication can be practically and scalably parallelized on general-purpose multi-core processors.

pSHS: A scalable parallel software implementation of Montgomery multiplication for multicore systems

- Computer Science; 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)
- 2010

Parallel programming techniques have become one of the great challenges in the transition from single-core to multicore architectures. In this paper, we investigate the parallelization of the…

A Parallel Implementation of Montgomery Multiplication on Multicore Systems: Algorithm, Analysis, and Prototype

- Computer Science; IEEE Transactions on Computers
- 2011

This work presents a parallel software implementation of Montgomery multiplication for multicore systems, pSHS, and shows that it is high-performance, scalable over different numbers of cores, and stable when the communication latency changes.

The Researcher and Implement of High-Speed Modular Multiplication Algorithm Basing on Parallel Pipelining

- Computer Science; 2009 Asia-Pacific Conference on Information Processing
- 2009

This paper presents an improved method which realizes parallel operation within each cell arithmetic unit and between cell arithmetic units to improve the speed of Montgomery modular multiplication…

Efficient Translation of Algorithmic Kernels on Large-Scale Multi-cores

- Computer Science; 2009 International Conference on Computational Science and Engineering
- 2009

The design of a novel embedded processor architecture (called a μ-core) that uses a reconfigurable ALU as the basis of custom 2-dimensional array architectures, which can be used to accelerate algorithms in areas such as cryptography and image processing.

Survey on Hardware Implementation of Montgomery Modular exponentiation

- Computer Science, Mathematics
- 2018

Three modified Montgomery algorithms are discussed and their outputs compared with each other: an iterative architecture, a Montgomery multiplier for faster cryptography, and Vedic multipliers used in the Montgomery algorithm for multiplication.

Novel algorithms and hardware architectures for Montgomery Multiplication over GF(p)

- Computer Science; IACR Cryptol. ePrint Arch.
- 2015

A novel digit-digit based MM algorithm is derived and two hardware architectures that compute it are described. They make use of available dedicated multiplier and memory blocks, drastically reducing the use of the FPGA's standard logic while keeping acceptable performance compared with other implementation approaches.

## References

Efficient pipelining for modular multiplication architectures in prime fields

- Computer Science, Mathematics; GLSVLSI '07
- 2007

A pipelined architecture for a Montgomery modular multiplier, suitable for use in public-key coprocessors, is presented and compared with state-of-the-art Montgomery multipliers on the basis of performance results for 1024-bit RSA.

A Scalable Architecture for Modular Multiplication Based on Montgomery's Algorithm

- Computer Science; IEEE Trans. Computers
- 2003

A word-based version of MM is presented and used to explain the main concepts in the hardware design. It gives enough freedom to select the word size and the degree of parallelism according to the available area and/or desired performance.

Montgomery in Practice: How to Do It More Efficiently in Hardware

- Computer Science; CT-RSA
- 2002

This work presents modular exponentiation based on Montgomery's method without any modular reduction, achieving the best possible bound according to C. Walter.

Parallelized Very High Radix Scalable Montgomery Multipliers

- Computer Science, Mathematics; Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, 2005.
- 2005

A parallelized very high radix scalable Montgomery multiplier designed for non-redundant FPGA implementations that can perform 1024-bit modular exponentiation in 5.0 ms and 256-bit modular exponentiation in 0.20 ms, improving on the fastest scalable design yet reported.

A fast dual-field modular arithmetic logic unit and its hardware implementation

- Computer Science, Mathematics; 2006 IEEE International Symposium on Circuits and Systems
- 2006

A fast modular arithmetic logic unit (MALU) that is scalable in the digit size (d) and the field size (k) and is well suited to modular multiplication and addition/subtraction, the computational kernels of elliptic curve and hyperelliptic curve cryptography.

Architectural Enhancements to Support Digital Signal Processing and Public-Key Cryptography

- Computer Science; WISES
- 2004

The analysis shows that the MIPS32 architecture can be easily extended for efficient cryptography processing and offers some advantages compared to the ARMv5TE architecture.

Hardware Implementation of Montgomery's Modular Multiplication Algorithm

- Computer Science, Mathematics; IEEE Trans. Computers
- 1993

Hardware is described for implementing the fast modular multiplication algorithm developed by P.L. Montgomery (1985), showing that this algorithm is up to twice as fast as the best currently available and is more suitable for alternative architectures.

Montgomery modular exponentiation on reconfigurable hardware

- Computer Science, Mathematics; Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)
- 1999

This contribution proposes arithmetic architectures which are optimized for modern field programmable gate arrays (FPGAs) and shows that it is possible to implement modular exponentiation at secure bit lengths on a single commercially available FPGA.

Analyzing and comparing Montgomery multiplication algorithms

- Computer Science, Mathematics; IEEE Micro
- 1996

The operations involved in computing the Montgomery product are studied, several high-speed, space-efficient algorithms for computing MonPro(a, b) are presented, and their time and space requirements are described.

Modular exponentiation using parallel multipliers

- Computer Science, Mathematics; Proceedings. 2003 IEEE International Conference on Field-Programmable Technology (FPT) (IEEE Cat. No.03EX798)
- 2003

A field programmable gate array (FPGA) semi-systolic implementation of a modular exponentiation unit, suitable for use in implementing the RSA public key cryptosystem, is presented. The design is…