Montgomery Arithmetic from a Software Perspective

@article{Bos2017MontgomeryAF,
  title={Montgomery Arithmetic from a Software Perspective},
  author={Joppe W. Bos and Peter L. Montgomery},
  journal={IACR Cryptol. ePrint Arch.},
  year={2017},
  volume={2017},
  pages={1057}
}
This chapter describes Peter L. Montgomery’s modular multiplication method and the various improvements that reduce latency in software implementations on devices with access to many computational units. “We propose a representation of residue classes so as to speed modular multiplication without affecting the modular addition and subtraction algorithms.” (Peter L. Montgomery [55])
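
As a rough illustration of the representation the abstract refers to, the sketch below implements textbook single-precision Montgomery reduction (REDC) in Python. It is not taken from the chapter: the function names, the choice R = 2**k, and the use of Python big integers are all assumptions made for the example, and it ignores the multi-word, interleaved, and SIMD variants that the chapter is actually about.

```python
# Minimal sketch of Montgomery multiplication (hypothetical helper names).
# Requires Python 3.8+ for pow(n, -1, r).

def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n < R, where R = 2**k."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than R = 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R)
    r2 = (r * r) % n                # R^2 mod n, used to enter Montgomery form
    return n_prime, r2

def redc(t, n, k, n_prime):
    """Return t * R^(-1) mod n for 0 <= t < n * R, with R = 2**k."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> k               # exact division by R is a shift
    return u - n if u >= n else u      # single conditional subtraction

def to_mont(a, n, k, n_prime, r2):
    """Map a (0 <= a < n) to its Montgomery form a * R mod n."""
    return redc(a * r2, n, k, n_prime)

def from_mont(a_bar, n, k, n_prime):
    """Map a Montgomery-form value back to the ordinary residue."""
    return redc(a_bar, n, k, n_prime)

def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Multiply two Montgomery-form values; the result stays in Montgomery form."""
    return redc(a_bar * b_bar, n, k, n_prime)

if __name__ == "__main__":
    n, k = 97, 8
    n_prime, r2 = montgomery_setup(n, k)
    a_bar = to_mont(15, n, k, n_prime, r2)
    b_bar = to_mont(72, n, k, n_prime, r2)
    product = from_mont(mont_mul(a_bar, b_bar, n, k, n_prime), n, k, n_prime)
    print(product == (15 * 72) % n)
```

Addition and subtraction need no change in this representation: `(a_bar + b_bar) % n` is already the Montgomery form of `a + b`, which is the property the quoted sentence highlights.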

Hardware Aspects of Montgomery Modular Multiplication
  • C. D. Walter
  • Mathematics, Computer Science
    IACR Cryptol. ePrint Arch.
  • 2017
This chapter compares Peter Montgomery's modular multiplication method with traditional techniques for suitability on hardware platforms. It also covers systolic array implementations and side
Combining Montgomery Multiplication with Tag Tracing for the Pollard's Rho Algorithm in Prime Order Fields
TLDR
This paper shows how to apply Montgomery multiplication to the tag tracing variant of the Pollard's rho algorithm applied to prime order fields, which eliminates costly modular reductions and replaces these with much more efficient divisions by a suitable power of two.
Speeding up decimal multiplication
TLDR
This paper focuses on the number-theoretic transform (NTT) family of algorithms and achieves a 3x to 5x speedup over the mpdecimal library, and presents a simple cache-efficient algorithm for in-place 2n×n or n×2n matrix transposition.
Efficient Algorithms for Large Prime Characteristic Fields and Their Application to Bilinear Pairings and Supersingular Isogeny-Based Protocols
  • P. Longa
  • Computer Science, Mathematics
    IACR Cryptol. ePrint Arch.
  • 2022
TLDR
The method reformulates the widely used lazy reduction technique, crucially avoiding the need for storage and computation of “double-precision” operations, and can be easily adapted to the methods that exist to compute modular multiplication.
Montgomery-friendly primes and applications to cryptography
TLDR
It is shown that, for dedicated architectures with word operators, larger RNS bases with Montgomery-friendly pairwise co-primes than the RNS bases generally used in the literature with Pseudo-Mersenne numbers can be reached.
Multiprecision ANSI C Library for Implementation of Cryptographic Algorithms on Microcontrollers
TLDR
This work presents a novel ANSI C library that also implements some less common operations, e.g., multiprecision integer division, and has been tested on the ARM M4-based Microchip CEC1302 microcontroller.
Cryptographic Program Obfuscation: Practical Solutions and Application-Driven Models
TLDR
This chapter provides a brief summary of the state of the art in cryptographic program obfuscation, focusing on two main aspects: first, there are many implementations of point function obfuscators, satisfying different obfuscation notions, and many of them can be used with practical performance guarantees; second, multiple application-driven obfuscation models and problems can be generated.
B-SIDH: supersingular isogeny Diffie-Hellman using twisted torsion
  • Craig Costello
  • Mathematics, Computer Science
    IACR Cryptol. ePrint Arch.
  • 2019
TLDR
This framework lifts the restrictions on the shapes of the underlying prime fields originally imposed by Jao and De Feo, and allows a range of new options for instantiating isogeny-based public key cryptography, including alternatives that exploit Mersenne and Montgomery-friendly primes.

References

SHOWING 1-10 OF 79 REFERENCES
Montgomery Multiplication Using Vector Instructions
TLDR
A parallel approach to compute interleaved Montgomery multiplication which is particularly suitable to be computed on 2-way single instruction, multiple data platforms as can be found on most modern computer architectures in the form of vector instruction set extensions is presented.
Efficient SIMD Arithmetic Modulo a Mersenne Number
This paper describes carry-less arithmetic operations modulo an integer 2^M-1 in the thousand-bit range, targeted at single instruction multiple data platforms and applications where overall
Systolic Modular Multiplication
  • C. D. Walter
  • Computer Science, Mathematics
    IEEE Trans. Computers
  • 1993
TLDR
A systolic array for modular multiplication is presented using the ideally suited algorithm of P.L. Montgomery (1985), where its main use would be where many consecutive multiplications are done, as in RSA cryptosystems.
Montgomery Multiplication on the Cell
TLDR
A technique to speed up Montgomery multiplication targeted at the Synergistic Processor Elements (SPE) of the Cell Broadband Engine is proposed, which consists of splitting a number into four consecutive parts, representing columns in a 4-SIMD organization.
Long Modular Multiplication for Cryptographic Applications
  • L. Hars
  • Computer Science, Mathematics
    CHES
  • 2004
TLDR
Several new “column-sum” variants of popular quadratic time modular multiplication algorithms are presented, which are faster than the traditional implementations, need little or no memory beyond the operand storage, and perform squaring about twice as fast as general multiplications or modular reductions.
A Cryptographic Library for the Motorola DSP56000
TLDR
A cryptographic library for the Motorola DSP56000 that provides hardware speed yet software flexibility, and an algorithm for modular multiplication that interleaves multiplication with Montgomery modular reduction to give a very fast implementation of RSA.
Montgomery Exponentiation with no Final Subtractions: Improved Results
TLDR
This paper proposes an improved (faster) version of the Montgomery multiplication and provides figures about the overhead of these versions relative to a speed-optimised version (theoretically and experimentally).
Modulo Reduction in Residue Number Systems
TLDR
This paper shows a new combination of residue number systems with efficient modulo reduction methods; two methods are compared, and the faster one is scrutinized in detail.
Parallel cryptographic arithmetic using a redundant Montgomery representation
  • D. Page, N. Smart
  • Computer Science, Mathematics
    IEEE Transactions on Computers
  • 2004
TLDR
It is shown that an SIMD parallel implementation of RSA can be around twice as fast as traditional sequential code, which is especially useful given the larger 2,048-bit RSA keys now being proposed for standard security levels.
Montgomery Modular Multiplication on ARM-NEON Revisited
TLDR
The Cascade Operand Scanning (COS) method is introduced to speed up multi-precision multiplication on SIMD architectures, and it is shown that two COS computations can be “coarsely” integrated into an efficient vectorized variant of Montgomery modular multiplication, which the paper calls the CICOS method.