# Montgomery Multiplication on the Cell

@inproceedings{Bos2009MontgomeryMO, title={Montgomery Multiplication on the Cell}, author={Joppe W. Bos and Marcelo E. Kaihara}, booktitle={PPAM}, year={2009} }

A technique to speed up Montgomery multiplication targeted at the Synergistic Processor Elements (SPE) of the Cell Broadband Engine is proposed. The technique consists of splitting a number into four consecutive parts. These parts are placed, one per element, in the four element positions of a vector, which represent the columns of a 4-SIMD organization. This representation enables arithmetic to be performed in a 4-SIMD fashion. An implementation of the Montgomery multiplication using this technique…
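The layout described in the abstract can be sketched in Python as follows. This is an illustrative reading, not the authors' code. The 16-bit limb size, the helper names `split_4simd`, `join_4simd`, and `add_lanewise`, and the choice to keep lane entries un-normalized are all assumptions made for the example. Part j of the number goes into lane j, and vector i holds limb i of every part, so one element-wise vector operation touches four columns at once.

```python
LIMB_BITS = 16  # hypothetical limb size, chosen only for illustration
LANES = 4       # four vector element positions (4-SIMD)


def split_4simd(x: int, limbs_per_part: int) -> list[list[int]]:
    """Split x into four consecutive parts; part j occupies lane j.

    Returns limbs_per_part vectors; vector i holds limb i of each part.
    """
    mask = (1 << LIMB_BITS) - 1
    total = LANES * limbs_per_part
    if x < 0 or x.bit_length() > total * LIMB_BITS:
        raise ValueError("x does not fit in the requested number of limbs")
    limbs = [(x >> (LIMB_BITS * k)) & mask for k in range(total)]
    # Part j consists of the consecutive limbs j*m .. j*m + m - 1.
    return [[limbs[j * limbs_per_part + i] for j in range(LANES)]
            for i in range(limbs_per_part)]


def join_4simd(vectors: list[list[int]]) -> int:
    """Recombine vectors into an integer.

    Each entry is weighted by its position, so entries wider than
    LIMB_BITS (a redundant, carry-deferred form) are handled correctly.
    """
    m = len(vectors)
    x = 0
    for i, vec in enumerate(vectors):
        for j, limb in enumerate(vec):
            x += limb << (LIMB_BITS * (j * m + i))
    return x


def add_lanewise(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """One element-wise (4-SIMD) add per vector; carries are not propagated."""
    return [[p + q for p, q in zip(va, vb)] for va, vb in zip(a, b)]
```

For example, `split_4simd(0x00010002000300040005000600070008, 2)` yields `[[8, 6, 4, 2], [7, 5, 3, 1]]`: each lane holds one 32-bit part. Because `join_4simd` does not assume entries fit in 16 bits, lane-wise sums can stay unnormalized and have their carries resolved later, which keeps the four lanes independent.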

## 11 Citations

Montgomery Multiplication Using Vector Instructions

- Computer Science · Selected Areas in Cryptography
- 2013

A parallel approach to computing interleaved Montgomery multiplication is presented that is particularly suited to 2-way single instruction, multiple data (SIMD) platforms, such as the vector instruction set extensions found on most modern computer architectures.
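For reference, the sequential interleaved (word-by-word) Montgomery multiplication that such parallel approaches start from can be sketched as below. This is the textbook single-lane form, not the 2-way SIMD algorithm of the cited paper. The function name and the 32-bit default word size are assumptions for illustration, and the code requires an odd modulus n with a, b < n < R.

```python
def interleaved_montmul(a: int, b: int, n: int, num_words: int,
                        word_bits: int = 32) -> int:
    """Return a * b * R^-1 mod n, where R = 2**(word_bits * num_words).

    Hypothetical helper; assumes n is odd and a, b < n < R.
    """
    W = 1 << word_bits
    mask = W - 1
    n0_inv = (-pow(n, -1, W)) % W  # -n^-1 mod W (Python 3.8+)
    t = 0
    for i in range(num_words):
        ai = (a >> (word_bits * i)) & mask   # next word of a
        t += ai * b                          # multiply step
        q = ((t & mask) * n0_inv) & mask     # makes t + q*n divisible by W
        t = (t + q * n) >> word_bits         # reduce step: exact shift
    # Invariant t < 2n holds throughout, so one subtraction suffices.
    return t - n if t >= n else t
```

Interleaving the multiply and reduce steps keeps the intermediate value bounded by about one word more than n, rather than forming the full double-length product first.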

Montgomery multiplication using CUDA

- Computer Science, Mathematics · ACM Southeast Regional Conference
- 2014

This paper implements a highly optimized systolic Montgomery multiplication algorithm using NVIDIA's general-purpose parallel programming model, CUDA (Compute Unified Device Architecture), for NVIDIA GPUs, and shows that this version is faster than previously implemented multiprecision Montgomery multiplication algorithms while also providing an intuitive data representation.

Montgomery Modular Multiplication on ARM-NEON Revisited

- Computer Science, Mathematics · ICISC
- 2014

The Cascade Operand Scanning (COS) method is introduced to speed up multi-precision multiplication on SIMD architectures, and it is shown that two COS computations can be “coarsely” integrated into an efficient vectorized variant of Montgomery modular multiplication, which the paper calls the CICOS method.

Montgomery Arithmetic from a Software Perspective

- Computer Science · IACR Cryptol. ePrint Arch.
- 2017

This chapter describes Peter L. Montgomery’s modular multiplication method and the various improvements to reduce the latency for software implementations on devices which have access to many…

Faster ECC over \(\mathbb{F}_{2^{521}-1}\) (feat. NEON)

- Computer Science, Mathematics · ICISC
- 2015

High-speed parallel multiplication and squaring algorithms for the Mersenne prime \(2^{521}-1\) are presented in order to provide asymptotically faster integer multiplication and fast reduction algorithms.
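The Mersenne prime \(2^{521}-1\) admits fast reduction because \(2^{521} \equiv 1 \pmod{2^{521}-1}\), so the bits above position 521 can simply be folded back onto the low bits. The minimal Python sketch below shows only this identity. It does not reproduce the paper's NEON multiplication and squaring algorithms, and `reduce_p521` is a hypothetical helper name.

```python
P521 = (1 << 521) - 1  # the Mersenne prime 2^521 - 1


def reduce_p521(x: int) -> int:
    """Reduce a non-negative integer modulo 2^521 - 1 using shifts and adds."""
    # Since 2^521 = 1 (mod P521), x = hi * 2^521 + lo is congruent to hi + lo.
    while x >> 521:
        x = (x & P521) + (x >> 521)
    # Now 0 <= x <= P521; the value P521 itself represents 0.
    return 0 if x == P521 else x
```

For the product of two reduced operands, the loop runs at most twice, and no division by the modulus is ever performed.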

Efficient arithmetic on ARM-NEON and its application for high-speed RSA implementation

- Computer Science, Mathematics · Secur. Commun. Networks
- 2015

A novel Double Operand Scanning (DOS) method is introduced to speed up multi-precision squaring with non-redundant representations on SIMD architectures; it is compatible with separated Montgomery algorithms and highly efficient for the RSA cryptosystem.

Pollard Rho on the PlayStation 3

- Computer Science, Mathematics
- 2009

This paper describes a high-performance PlayStation 3 implementation of the Pollard rho discrete logarithm algorithm on elliptic curves over prime fields and most of the implementation strategies apply to other large moduli as well.

PhiRSA: Exploiting the Computing Power of Vector Instructions on Intel Xeon Phi for RSA

- Computer Science · SAC
- 2016

A vector-oriented Montgomery multiplication design based on the vector carry propagation chain (VCPC) method is presented to fully exploit the computing power of vector instructions on Intel Xeon Phi. It achieves high throughput comparable to that of GPUs with far fewer parallel tasks, and low latency comparable to that of CPUs.

Investigating large integer arithmetic on Intel Xeon Phi SIMD extensions

- Computer Science, Mathematics · 2014 9th IEEE International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS)
- 2014

Preliminary results indicate that the Knights Corner SIMD speedup of large integer multiplication is limited by the absence of specific instructions that typically appear in common SIMD architectures, but emulation on Knights Landing shows that large integers can indeed benefit from the presence of 512-bit vectors for commonly used 1024- and 2048-bit operands, compared to publicly available large arithmetic libraries.

On the Cryptanalysis of Public-Key Cryptography

- Computer Science, Mathematics
- 2012

The elliptic curve method (ECM) for integer factorization is the asymptotically fastest method to find relatively small factors of large integers and the performance of ECM gives information about secure parameter choices of some cryptographic protocols.

## References

Showing 10 of 18 references.

Multi-Stream Hashing on the PlayStation 3

- Computer Science
- 2008

This work presents high-performance multi-stream versions of cryptographic hash functions from the MD/SHA family, which can be useful for cryptanalysis as well as for using the SPEs as cryptographic accelerators.

Accelerating SSL using the Vector processors in IBM's Cell Broadband Engine for Sony's Playstation 3

- Computer Science · IACR Cryptol. ePrint Arch.
- 2007

This paper explores the implementation and performance gains when using the vector processing capabilities for SSL and shows that big improvements are still possible with the hardware designed primarily for other purposes.

Pollard Rho on the PlayStation 3

- Computer Science, Mathematics
- 2009

This paper describes a high-performance PlayStation 3 implementation of the Pollard rho discrete logarithm algorithm on elliptic curves over prime fields and most of the implementation strategies apply to other large moduli as well.

Modular multiplication without trial division

- Mathematics, Computer Science
- 1985

A method is presented for multiplying two integers modulo N while avoiding division by N; it uses a representation of residue classes that speeds up modular multiplication without affecting the modular addition and subtraction algorithms.
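Montgomery's method can be illustrated with a small Python sketch of the reduction step (often called REDC). The names `redc`, `montgomery_mul`, and `n_prime` and the sample parameters are illustrative assumptions. The sketch assumes an odd modulus n, \(R = 2^{k} > n\), and an input \(T < nR\); the only divisions are by R, which are bit shifts.

```python
def redc(T: int, n: int, r_bits: int, n_prime: int) -> int:
    """Return T * R^-1 mod n, where R = 2**r_bits, without dividing by n.

    Assumes n odd, n < R, 0 <= T < n*R, and n_prime = -n^-1 mod R.
    """
    mask = (1 << r_bits) - 1
    m = ((T & mask) * n_prime) & mask   # m = T * (-n^-1) mod R
    t = (T + m * n) >> r_bits           # exact: T + m*n is divisible by R
    return t - n if t >= n else t       # t < 2n, so one subtraction suffices


def montgomery_mul(a: int, b: int, n: int, r_bits: int) -> int:
    """Compute a*b mod n by way of Montgomery form (illustration only)."""
    R = 1 << r_bits
    n_prime = (-pow(n, -1, R)) % R      # requires n odd (Python 3.8+)
    a_m = a * R % n                     # map operands into Montgomery form
    b_m = b * R % n
    ab_m = redc(a_m * b_m, n, r_bits, n_prime)   # = a*b*R mod n
    return redc(ab_m, n, r_bits, n_prime)        # leave Montgomery form
```

In practice, operands stay in Montgomery form across many multiplications (as in exponentiation), so the conversions into and out of that form are paid only once.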

Fast Elliptic-Curve Cryptography on the Cell Broadband Engine

- Computer Science, Mathematics · AFRICACRYPT
- 2009

This paper is the first to investigate the power of the Cell Broadband Engine for state-of-the-art public-key cryptography. We present a high-speed implementation of elliptic-curve Diffie-Hellman…

Power efficient processor architecture and the cell processor

- Computer Science · 11th International Symposium on High-Performance Computer Architecture
- 2005

The paper discusses some of the challenges microprocessor designers face and motivates performance per transistor as a reasonable first-order metric for design efficiency; alternative architectural choices and some of their limitations are also discussed.

Fast Implementations of AES on Various Platforms

- Computer Science · IACR Cryptol. ePrint Arch.
- 2009

This paper presents new software speed records for encryption and decryption using the block cipher AES-128 on different architectures, and this is the first AES implementation for the GPU that implements both encryption and decryption.

Montgomery exponentiation needs no final subtractions

- Mathematics, Computer Science
- 1999

Montgomery's modular multiplication algorithm is commonly used in implementations of the RSA cryptosystem. It has been observed that there is no need for extra cleaning up at the end of an…
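The observation in the title can be checked with a short Python sketch. The sketch assumes \(4n < R\). If both inputs of a Montgomery product are below \(2n\), then \(ab < 4n^2 < nR\), so \((ab + mn)/R < 2n\) and the output is again below \(2n\). Every multiplication in the exponentiation can therefore skip the conditional subtraction. The function names are hypothetical, and the code is not taken from the cited paper.

```python
def montmul_nosub(a: int, b: int, n: int, r_bits: int, n_prime: int) -> int:
    """Montgomery product a*b*R^-1 (mod n) with NO final subtraction.

    Assumes 4n < R = 2**r_bits and a, b < 2n; the result is then < 2n.
    """
    mask = (1 << r_bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask
    return (t + m * n) >> r_bits


def montexp_nosub(base: int, e: int, n: int, r_bits: int) -> int:
    """Compute base**e mod n by square-and-multiply without final subtractions."""
    R = 1 << r_bits
    assert n % 2 == 1 and 4 * n < R
    n_prime = (-pow(n, -1, R)) % R
    x = (base % n) * R % n   # base in Montgomery form
    acc = R % n              # 1 in Montgomery form
    for bit in bin(e)[2:]:   # left-to-right binary exponentiation
        acc = montmul_nosub(acc, acc, n, r_bits, n_prime)
        if bit == "1":
            acc = montmul_nosub(acc, x, n, r_bits, n_prime)
        assert acc < 2 * n   # the bound that makes subtractions unnecessary
    # Leaving Montgomery form gives out <= n; out == n only if the result is 0 mod n.
    out = montmul_nosub(acc, 1, n, r_bits, n_prime)
    return out % n
```

The single `% n` at the end only matters in the degenerate case where the result is congruent to 0; all intermediate values stay in the redundant range \([0, 2n)\).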

Short Chosen-Prefix Collisions for MD5 and the Creation of a Rogue CA Certificate

- Computer Science · CRYPTO
- 2009

A more flexible family of differential paths and a new variable birthdaying search space are described, leading to just three pairs of near-collision blocks to generate the collision, enabling construction of RSA moduli that are sufficiently short to be accepted by current CAs.

Advances in Cryptology — CRYPTO ’96

- Computer Science, Mathematics · Lecture Notes in Computer Science
- 2001

This work presents new, simple, and practical constructions of message authentication schemes based on a cryptographic hash function, and proves that NMAC and HMAC are secure as long as the underlying hash function has some reasonable cryptographic strengths.