# High-Performance Modular Multiplication on the Cell Processor

@inproceedings{Bos2010HighPerformanceMM, title={High-Performance Modular Multiplication on the Cell Processor}, author={Joppe W. Bos}, booktitle={WAIFI}, year={2010} }

This paper presents software implementation speed records for modular multiplication arithmetic on the synergistic processing elements of the Cell broadband engine (Cell) architecture. The focus is on moduli which are of special interest in elliptic curve cryptography, that is, moduli of bit-lengths ranging from 192- to 521-bit. Finite field arithmetic using primes which allow particularly fast reduction is compared to Montgomery multiplication. The special primes considered are the five…
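The comparison in the abstract, between reduction modulo a specially shaped prime and Montgomery multiplication, can be illustrated with a minimal Python sketch. This is not the paper's SPE implementation. The modulus used here (the NIST prime 2^192 - 2^64 - 1) is an assumed example of a prime with fast reduction, and the function names `reduce_p192`, `mont_mul`, `to_mont` and `from_mont` are hypothetical.

```python
# Illustrative sketch only: arbitrary-precision Python integers, not SIMD code.
# P192 is an assumed example of a "special" prime: 2^192 - 2^64 - 1.
P192 = 2**192 - 2**64 - 1

R_BITS = 192
R = 1 << R_BITS
# -P192^{-1} mod R, the constant Montgomery reduction needs (Python 3.8+).
N_PRIME = (-pow(P192, -1, R)) % R


def reduce_p192(x):
    """Reduce 0 <= x < P192**2 using the identity 2^192 = 2^64 + 1 (mod P192)."""
    mask = R - 1
    while x >> R_BITS:
        hi, lo = x >> R_BITS, x & mask
        # Fold the high part back in: hi * 2^192 becomes hi * (2^64 + 1).
        x = lo + hi + (hi << 64)
    # Now x < 2^192 < 2 * P192, so one conditional subtraction suffices.
    return x - P192 if x >= P192 else x


def mont_mul(a, b):
    """Return a * b * R^{-1} mod P192 for 0 <= a, b < P192."""
    t = a * b
    m = ((t & (R - 1)) * N_PRIME) & (R - 1)
    u = (t + m * P192) >> R_BITS  # exact division: t + m*P192 = 0 (mod R)
    return u - P192 if u >= P192 else u


def to_mont(a):
    """Map a to its Montgomery representation a * R mod P192."""
    return (a << R_BITS) % P192


def from_mont(a):
    """Map a Montgomery representation back to an ordinary residue."""
    return mont_mul(a, 1)


a = 0x123456789ABCDEF0123456789ABCDEF0123456789ABCDEF % P192
b = 0xFEDCBA9876543210FEDCBA9876543210FEDCBA987654321 % P192
special = reduce_p192(a * b)
montgomery = from_mont(mont_mul(to_mont(a), to_mont(b)))
```

Both paths yield the same residue. The special-prime path replaces the multiplications by `N_PRIME` and `P192` with shifts and additions, which is the kind of saving that makes such primes attractive; the Montgomery path works for any odd modulus.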

## 18 Citations

Low-Latency Elliptic Curve Scalar Multiplication

- Computer Science, Mathematics
- International Journal of Parallel Programming
- 2012

A low-latency algorithm designed for parallel computer architectures is presented for computing the scalar multiplication of elliptic curve points; it is based on approaches from cryptographic side-channel analysis and can be applied to any parallel 32-bit architecture.

Efficient Modular Multiplication

- Mathematics, Computer Science
- IACR Cryptol. ePrint Arch.
- 2021

This chapter outlines Montgomery multiplication, the most commonly used modular multiplication method for generic moduli, as well as different techniques for “special” moduli of a particular shape, and studies approaches which might produce errors with a very small probability.

Montgomery Multiplication Using Vector Instructions

- Computer Science
- Selected Areas in Cryptography
- 2013

A parallel approach to computing interleaved Montgomery multiplication is presented; it is particularly suitable for 2-way single instruction, multiple data platforms, which are found on most modern computer architectures in the form of vector instruction set extensions.

Accelerating Integer Based Fully Homomorphic Encryption Using Frequency Domain Multiplication

- Computer Science, Mathematics
- ICICS
- 2018

A new methodology is proposed to speed up the encryption process: the very large asymmetric multiplications it requires are optimized by adopting a frequency domain approach based on Number Theoretic Transforms.

PhiRSA: Exploiting the Computing Power of Vector Instructions on Intel Xeon Phi for RSA

- Computer Science
- SAC
- 2016

A vector-oriented Montgomery multiplication design, based on the vector carry propagation chain (VCPC) method, fully exploits the computing power of vector instructions on Intel Xeon Phi; it achieves high throughput comparable to that of GPUs with far fewer parallel tasks, and small latency comparable to that of CPUs.

Efficient SIMD Arithmetic Modulo a Mersenne Number

- Computer Science, Mathematics
- 2011 IEEE 20th Symposium on Computer Arithmetic
- 2011

This paper describes carry-less arithmetic operations modulo an integer 2^M-1 in the thousand-bit range, targeted at single instruction multiple data platforms and applications where overall…

Fast Cryptography in Genus 2

- Computer Science, Mathematics
- Journal of Cryptology
- 2014

A taxonomy of the best known techniques to realize genus 2-based cryptography, which includes fast formulas on the Kummer surface and efficient four-dimensional GLV decompositions, is given.

Fast Cryptography in Genus 2 (Two is Greater than One)

- Computer Science, Mathematics
- 2013

A taxonomy of the best known techniques to realize genus 2 based cryptography, which includes fast formulas on the Kummer surface and efficient 4-dimensional GLV decompositions, is given.

Two is Greater than One

- Computer Science, Mathematics
- IACR Cryptol. ePrint Arch.
- 2012

A taxonomy of the best known techniques to realize genus-2 based cryptography, which includes fast formulas on the Kummer surface and efficient 4-dimensional GLV decompositions, is given.

Fast Cryptography in Genus 2

- Computer Science, Mathematics
- EUROCRYPT
- 2013

A taxonomy of the best known techniques to realize genus 2 based cryptography, which includes fast formulas on the Kummer surface and efficient 4-dimensional GLV decompositions, is given.

## References


Fast Elliptic-Curve Cryptography on the Cell Broadband Engine

- Computer Science, Mathematics
- AFRICACRYPT
- 2009

This paper is the first to investigate the power of the Cell Broadband Engine for state-of-the-art public-key cryptography. We present a high-speed implementation of elliptic-curve Diffie-Hellman…

ECM on Graphics Cards

- Computer Science, Mathematics
- IACR Cryptol. ePrint Arch.
- 2008

This paper reports record-setting performance for the elliptic-curve method of integer factorization, using a new ECM implementation introduced in this paper that uses Edwards curves, relies on new parallel addition formulas, and is carefully tuned for the highly parallel GPU architecture.

The billion-mulmod-per-second PC

- Computer Science
- 2009

This paper explains how to carry out more than one billion 192-bit modular multiplications per second on a $2000 personal computer.

Pollard Rho on the PlayStation 3

- Computer Science, Mathematics
- 2009

This paper describes a high-performance PlayStation 3 implementation of the Pollard rho discrete logarithm algorithm on elliptic curves over prime fields; most of the implementation strategies apply to other large moduli as well.

The circuit design of the synergistic processor element of a CELL processor

- Engineering
- ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005.
- 2005

A 32b 4-way SIMD dual-issue synergistic processor element of a CELL processor is developed with 20.9 million transistors in 14.8 mm² using a 90nm SOI technology to achieve a compact and power-efficient design.

Modular multiplication without trial division

- Mathematics, Computer Science
- 1985

A method is given for multiplying two integers modulo N while avoiding division by N, using a representation of residue classes that speeds up modular multiplication without affecting the modular addition and subtraction algorithms.

Power efficient processor architecture and the cell processor

- Computer Science
- 11th International Symposium on High-Performance Computer Architecture
- 2005

The paper discusses some of the challenges microprocessor designers face, provides motivation for performance per transistor as a reasonable first-order metric for design efficiency, and discusses alternative architectural choices and some of their limitations.

Software Implementation of the NIST Elliptic Curves Over Prime Fields

- Computer Science, Mathematics
- CT-RSA
- 2001

This paper presents an extensive study of the software implementation of the NIST-recommended elliptic curves over prime fields, written in C and assembler, on a Pentium II 400 MHz workstation.

Analysis and optimization of elliptic-curve single-scalar multiplication

- Mathematics
- IACR Cryptol. ePrint Arch.
- 2007

Let P be a point on an elliptic curve over a finite field of large characteristic. Exactly how many points 2P, 3P, 5P, 7P, 9P, ..., mP should be precomputed in a sliding-window computation of nP?…

On the Security of 1024-bit RSA and 160-bit Elliptic Curve Cryptography

- Computer Science, Mathematics
- IACR Cryptol. ePrint Arch.
- 2009

It is concluded that for 1024-bit RSA the risk is small at least until the year 2014, and that 160-bit ECC may safely be used for much longer – with the current state of the art in cryptanalysis the authors would be surprised if a public effort can make a dent in 160-bit ECC by the year 2020.