High-Performance Modular Multiplication on the Cell Processor

@inproceedings{Bos2010HighPerformanceMM,
  title={High-Performance Modular Multiplication on the Cell Processor},
  author={Joppe W. Bos},
  booktitle={WAIFI},
  year={2010}
}
  • Joppe W. Bos
  • Published in WAIFI 27 June 2010
  • Computer Science, Mathematics
This paper presents software implementation speed records for modular multiplication arithmetic on the synergistic processing elements of the Cell Broadband Engine (Cell) architecture. The focus is on moduli which are of special interest in elliptic curve cryptography, that is, moduli of bit-lengths ranging from 192 to 521 bits. Finite field arithmetic using primes which allow particularly fast reduction is compared to Montgomery multiplication. The special primes considered are the five…
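The contrast the abstract draws between special-prime reduction and Montgomery multiplication can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's Cell implementation: the modulus 2^521 - 1 is chosen here only because it is a well-known 521-bit prime with a simple reduction (the abstract's own list of primes is truncated), and the function names and the choice R = 2^528 are hypothetical.

```python
# Illustrative modulus (an assumption of this sketch): the Mersenne prime 2^521 - 1.
P = (1 << 521) - 1


def mul_special(a, b):
    """Multiply modulo P using the special shape 2^521 = 1 (mod P)."""
    x = a * b                    # product of two values below P, up to 1042 bits
    x = (x & P) + (x >> 521)     # fold the high half onto the low half
    x = (x & P) + (x >> 521)     # second fold absorbs the possible carry bit
    return 0 if x == P else x    # P itself represents zero


def montgomery_setup(n, r_bits):
    """Return n' with n * n' = -1 (mod R), where R = 2^r_bits and n is odd."""
    r = 1 << r_bits
    return (-pow(n, -1, r)) % r  # modular inverse via pow needs Python 3.8+


def montgomery_mul(a_bar, b_bar, n, r_bits, n_prime):
    """Return a_bar * b_bar * R^(-1) mod n without dividing by n."""
    mask = (1 << r_bits) - 1
    t = a_bar * b_bar
    m = ((t & mask) * n_prime) & mask   # m = t * n' mod R
    u = (t + m * n) >> r_bits           # t + m*n is divisible by R
    return u - n if u >= n else u       # single conditional subtraction


def mul_montgomery(a, b, n=P, r_bits=528):
    """Multiply a * b mod n by converting to and from Montgomery form."""
    n_prime = montgomery_setup(n, r_bits)
    r = 1 << r_bits
    a_bar = (a * r) % n                 # conversion into Montgomery form
    b_bar = (b * r) % n
    c_bar = montgomery_mul(a_bar, b_bar, n, r_bits, n_prime)
    return montgomery_mul(c_bar, 1, n, r_bits, n_prime)  # conversion back
```

Both routines return the same residue as `(a * b) % P`. The special-prime version needs only shifts, masks, and additions after the multiplication, whereas the Montgomery version works for any odd modulus at the cost of two extra multiplications per reduction.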
Low-Latency Elliptic Curve Scalar Multiplication
  • Joppe W. Bos
  • Computer Science, Mathematics
    International Journal of Parallel Programming
  • 2012
TLDR
A low-latency algorithm, designed for parallel computer architectures, is presented for computing the scalar multiplication of elliptic curve points; it is based on approaches from cryptographic side-channel analysis and can be applied to any parallel 32-bit architecture.
Efficient Modular Multiplication
TLDR
This chapter outlines Montgomery multiplication, the most commonly used modular multiplication method for generic moduli, as well as techniques that apply when “special” moduli of a particular shape are used, and studies approaches which might produce errors with a very small probability.
Montgomery Multiplication Using Vector Instructions
TLDR
A parallel approach to computing interleaved Montgomery multiplication is presented; it is particularly suitable for 2-way single instruction, multiple data (SIMD) platforms, which are found on most modern computer architectures in the form of vector instruction set extensions.
Accelerating Integer Based Fully Homomorphic Encryption Using Frequency Domain Multiplication
TLDR
A new methodology is proposed to speed up the encryption process by optimizing the very large asymmetric multiplications it requires, adopting a frequency-domain approach to multiplication based on Number Theoretic Transforms.
PhiRSA: Exploiting the Computing Power of Vector Instructions on Intel Xeon Phi for RSA
TLDR
A vector-oriented Montgomery multiplication design, based on the vector carry propagation chain (VCPC) method, is presented to fully exploit the computing power of vector instructions on Intel Xeon Phi; it achieves high throughput comparable to that of GPUs but with far fewer parallel tasks, and small latency comparable to that of CPUs.
Efficient SIMD Arithmetic Modulo a Mersenne Number
This paper describes carry-less arithmetic operations modulo an integer 2^M-1 in the thousand-bit range, targeted at single instruction multiple data platforms and applications where overall
Fast Cryptography in Genus 2
TLDR
A taxonomy of the best known techniques to realize genus 2-based cryptography, which includes fast formulas on the Kummer surface and efficient four-dimensional GLV decompositions, is given.
Fast Cryptography in Genus 2 ( Two is Greater than One )
TLDR
A taxonomy of the best known techniques to realize genus 2 based cryptography, which includes fast formulas on the Kummer surface and efficient 4-dimensional GLV decompositions, is given.
Two is Greater than One
TLDR
A taxonomy of the best known techniques to realize genus-2 based cryptography, which includes fast formulas on the Kummer surface and efficient 4-dimensional GLV decompositions, is given.
Fast Cryptography in Genus 2
TLDR
A taxonomy of the best known techniques to realize genus 2 based cryptography, which includes fast formulas on the Kummer surface and efficient 4-dimensional GLV decompositions, is given.

References

Fast Elliptic-Curve Cryptography on the Cell Broadband Engine
This paper is the first to investigate the power of the Cell Broadband Engine for state-of-the-art public-key cryptography. We present a high-speed implementation of elliptic-curve Diffie-Hellman
ECM on Graphics Cards
TLDR
This paper reports record-setting performance for the elliptic-curve method of integer factorization, using a new ECM implementation introduced in this paper that uses Edwards curves, relies on new parallel addition formulas, and is carefully tuned for the highly parallel GPU architecture.
The billion-mulmod-per-second PC
TLDR
This paper explains how to carry out more than one billion 192-bit modular multiplications per second on a $2000 personal computer.
Pollard Rho on the PlayStation 3
TLDR
This paper describes a high-performance PlayStation 3 implementation of the Pollard rho discrete logarithm algorithm on elliptic curves over prime fields; most of the implementation strategies apply to other large moduli as well.
The circuit design of the synergistic processor element of a CELL processor
TLDR
A 32b 4-way SIMD dual-issue synergistic processor element of a CELL processor is developed with 20.9 million transistors in 14.8 mm² using a 90 nm SOI technology to achieve a compact and power-efficient design.
Modular multiplication without trial division
TLDR
A method for multiplying two integers modulo N while avoiding division by N is presented, using a representation of residue classes that speeds up modular multiplication without affecting the modular addition and subtraction algorithms.
Power efficient processor architecture and the cell processor
  • H. P. Hofstee
  • Computer Science
    11th International Symposium on High-Performance Computer Architecture
  • 2005
TLDR
The paper discusses some of the challenges microprocessor designers face, motivates performance per transistor as a reasonable first-order metric for design efficiency, and discusses alternate architectural choices along with some limitations.
Software Implementation of the NIST Elliptic Curves Over Prime Fields
TLDR
This paper presents an extensive study of the software implementation, in C and assembler on a Pentium II 400 MHz workstation, of the NIST-recommended elliptic curves over prime fields.
Analysis and optimization of elliptic-curve single-scalar multiplication
Let P be a point on an elliptic curve over a finite field of large characteristic. Exactly how many points 2P, 3P, 5P, 7P, 9P, ..., mP should be precomputed in a sliding-window computation of nP?
On the Security of 1024-bit RSA and 160-bit Elliptic Curve Cryptography
TLDR
It is concluded that for 1024-bit RSA the risk is small at least until the year 2014, and that 160-bit ECC may safely be used for much longer – with the current state of the art in cryptanalysis the authors would be surprised if a public effort can make a dent in 160-bit ECC by the year 2020.