# Parallelization of Radix-2 Montgomery Multiplication on Multicore Platform

@article{Han2013ParallelizationOR, title={Parallelization of Radix-2 Montgomery Multiplication on Multicore Platform}, author={Jun Han and Shuai Wang and Wei Huang and Zhiyi Yu and Xiaoyang Zeng}, journal={IEEE Transactions on Very Large Scale Integration (VLSI) Systems}, year={2013}, volume={21}, pages={2325-2330} }

Montgomery multiplication is the kernel operation in public key ciphers. Aiming at parallel implementation of Montgomery multiplication, this brief presents an improved task partitioning of the Montgomery multiplication algorithm for the multicore platform with area-efficient processors. Several multicore platforms are designed to verify the efficiency of parallelization. The fastest platform takes 3460 cycles to finish a 1024-b Montgomery multiplication, which is six times faster than a single…
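For orientation, the radix-2 (bit-serial) Montgomery multiplication named in the title can be sketched as below. This is a minimal single-threaded illustration of the textbook algorithm, not the task partitioning proposed in the paper. The function names `mont_mul_radix2` and `mod_mul_via_montgomery` and the parameter `k` (the operand bit length, so that R = 2^k) are assumptions introduced here.

```python
def mont_mul_radix2(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2^(-k) mod n, processing one bit of `a` per iteration.

    Assumes n is odd and 0 <= a, b < n < 2^k (hypothetical interface).
    """
    if n % 2 == 0 or not (0 <= a < n and 0 <= b < n) or n >= (1 << k):
        raise ValueError("need odd n with 0 <= a, b < n < 2^k")
    s = 0
    for i in range(k):
        s += ((a >> i) & 1) * b   # add b when bit i of a is set
        if s & 1:                 # make s even by adding the odd modulus
            s += n
        s >>= 1                   # exact division by 2
    # s stays below 2n throughout, so one conditional subtraction suffices
    return s - n if s >= n else s


def mod_mul_via_montgomery(x: int, y: int, n: int, k: int) -> int:
    """Compute x * y mod n by converting to and from the Montgomery domain."""
    r2 = pow(2, 2 * k, n)                    # R^2 mod n, with R = 2^k
    x_m = mont_mul_radix2(x, r2, n, k)       # x * R mod n
    y_m = mont_mul_radix2(y, r2, n, k)       # y * R mod n
    z_m = mont_mul_radix2(x_m, y_m, n, k)    # x * y * R mod n
    return mont_mul_radix2(z_m, 1, n, k)     # strip the factor R
```

For example, `mod_mul_via_montgomery(5, 7, 97, 7)` returns 35. The loop contains only additions and one-bit shifts, with no division by the modulus, which is why the operation suits small, area-efficient cores.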

## 26 Citations

An Efficient Implementation of Montgomery Multiplication on Multicore Platform With Optimized Algorithm, Task Partitioning, and Network Architecture

- Computer Science
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- 2014

A block-level parallel algorithm for MM with quotient pipelining is proposed and optimally mapped onto a network-on-chip-based multicore platform equipped with a broadcasting mechanism, maximizing the speedup ratio for a given intercore communication latency.

Parallelism exploitation of montgomery multiplication in RNS on NoC-based platform

- Computer Science
- 2014 12th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT)
- 2014

An efficient parallelization scheme is proposed to overcome the influence of communication latency and is shown to be more resistant to communication latency than the state-of-the-art MM algorithm.

A Systolic Hardware Architecture of Montgomery Modular Multiplication for Public Key Cryptosystems

- 2017

Montgomery modular multiplication is mostly used in the field of public-key cryptosystems. This work presents how to relax the data dependency in conventional word-based algorithms to increase the…

Efficient VLSI Architecture for Montgomery Modular Multiplier

- 2017

Montgomery modular multiplication is used in cryptographic algorithms and digital signal processing applications. The main objective is to reduce the delay and area of the Montgomery multipliers while…

A Heterogeneous Multicore Crypto-Processor With Flexible Long-Word-Length Computation

- Computer Science
- IEEE Transactions on Circuits and Systems I: Regular Papers
- 2015

The proposed multicore processor provides flexible and efficient computation for various forms of RSA and ECC algorithms, fulfilling low-latency or high-throughput requirements of different application scenarios, by using a heterogeneous multicore architecture.

VLSI Implementation of High Performance Montgomery Modular Multiplication for Crypto Graphical Application

- 2017

This paper proposes a simple and efficient Montgomery multiplication algorithm such that the low-cost and high-performance Montgomery modular multiplier can be implemented accordingly. Full-adder or…

VLSI ARCHITECTURE FOR MONTGOMERY MODULAR MULTIPLICATION ALGORITHM BY USING PASTA ADDER

- 2017

In data transmission applications, the widely used public-key cryptosystem is a simple and efficient Montgomery multiplication algorithm such that the low-cost and high-performance. In which includes…

Efficient Area and Delay Profile Architecture of Asynchronous Parallel Self Timed Adder Based Montgomery Multiplication

- 2018

With the ongoing digital revolution and advances in high performance computing, powerful desktop computer systems are available to almost everybody at low cost. While there has always been a demand…

Enhanced Vlsi Architecture For Montgomery Modular Multiplication In Digital Filters

- 2016

The multiplier receives and outputs the data with binary representation and uses only one-level Carry Save Adder (CSA) to avoid the carry propagation at each addition operation. A famous approach to…

Low Power Montgomery Modular Multiplication Using Carry Save Adder

- Computer Science
- 2016

A mechanism that can detect and skip the unnecessary carry-save addition operations in the one-level CCSA architecture while maintaining the short critical path delay is developed and high throughput can be obtained.

## References

Montgomery Modular Multiplication Algorithm on Multi-Core Systems

- Computer Science
- 2007 IEEE Workshop on Signal Processing Systems
- 2007

This paper first implements the Montgomery modular multiplication on a multi-core system with general-purpose cores, and then speeds it up by adopting the Multiply-Accumulate (MAC) operation in each core.

A Parallel Implementation of Montgomery Multiplication on Multicore Systems: Algorithm, Analysis, and Prototype

- Computer Science
- IEEE Transactions on Computers
- 2011

This work presents a parallel software implementation of the Montgomery multiplication for multicore systems, pSHS, and reveals that it is high-performance, scalable over different numbers of cores, and stable when the communication latency changes.

Analyzing and comparing Montgomery multiplication algorithms

- Computer Science
- IEEE Micro
- 1996

The operations involved in computing the Montgomery product are studied, and several high-speed, space-efficient algorithms for computing MonPro(a, b) are described, along with their time and space requirements.

A Scalable Architecture for Modular Multiplication Based on Montgomery's Algorithm

- Computer Science
- IEEE Trans. Computers
- 2003

A word-based version of MM is presented and used to explain the main concepts in the hardware design, which gives enough freedom to select the word size and the degree of parallelism according to the available area and/or desired performance.

A low-complexity heterogeneous multi-core platform for security soc

- Computer Science
- 2010 IEEE Asian Solid-State Circuits Conference
- 2010

Comparison results show that this heterogeneous multi-core SoC platform, designed to deal with intensive cryptography algorithms in different security protocols, has a low-complexity hardware cost but more flexibility.

Modular multiplication without trial division

- Mathematics
- 1985

Let N > 1. We present a method for multiplying two integers (called N-residues) modulo N while avoiding division by N. N-residues are represented in a nonstandard way, so this method is useful only…

Challenges of programming multi-core microprocessors

- Computer Science
- 2008

It is claimed that many of the programming abstractions for parallel programs have been honed for the development of closed-world software like operating system kernels and are not suitable for modular application development.

Test power reduction with multiple capture orders

- Engineering, Computer Science
- 13th Asian Test Symposium
- 2004

A multiple-capture-orders method is developed to guarantee the full scan fault coverage and a test architecture based on a ring control structure is adopted which makes the test control very simple and requires very low area overhead.

Fast and accurate protocol specific bus modeling using TLM 2.0

- Computer Science
- 2009 Design, Automation & Test in Europe Conference & Exhibition
- 2009

A new methodology is introduced that enables the creation of fast and cycle accurate protocol specific bus-based communication models, based on the new TLM 2.0 standard from the Open SystemC Initiative (OSCI).

Combining Behavioural Real-time Software Modelling with the OSCI TLM-2.0 Communication Standard

- Computer Science
- 2010 10th IEEE International Conference on Computer and Information Technology
- 2010

A software Processing Element (PE) model is implemented which effectively integrates mixed timing RTOS-centric software models, abstract processor hardware functions, and OSCI TLM-2.0 communication interfaces.