Montgomery Arithmetic from a Software Perspective

  • Joppe W. Bos, Peter L. Montgomery
  • IACR Cryptol. ePrint Arch.
This chapter describes Peter L. Montgomery’s modular multiplication method and the various improvements that reduce latency for software implementations on devices that have access to many computational units. “We propose a representation of residue classes so as to speed modular multiplication without affecting the modular addition and subtraction algorithms.” Peter L. Montgomery [55] 
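The representation quoted above can be illustrated with a minimal sketch. It is a textbook-style formulation, not code taken from the chapter: the function names (`montgomery_setup`, `redc`, `to_mont`, `mont_mul`, `from_mont`) and the choice of radix R = 2^r_bits are assumptions made for illustration, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, r_bits):
    """Precompute n' for an odd modulus n and radix R = 2**r_bits > n (assumed setup)."""
    r = 1 << r_bits
    assert n % 2 == 1 and n < r
    # n * n_prime = -1 (mod R)
    return (-pow(n, -1, r)) % r


def redc(t, n, r_bits, n_prime):
    """Montgomery reduction: return t * R^(-1) mod n for 0 <= t < n * R."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * n_prime) & mask   # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> r_bits           # exact division by R, done as a shift
    return u - n if u >= n else u       # single conditional subtraction


def to_mont(a, n, r_bits):
    """Map a to its Montgomery form a * R mod n."""
    return (a << r_bits) % n


def mont_mul(a_bar, b_bar, n, r_bits, n_prime):
    """Multiply two values in Montgomery form; the result stays in Montgomery form."""
    return redc(a_bar * b_bar, n, r_bits, n_prime)


def from_mont(a_bar, n, r_bits, n_prime):
    """Map a value in Montgomery form back to an ordinary residue."""
    return redc(a_bar, n, r_bits, n_prime)
```

Under these assumptions, with n = 97 and r_bits = 8, converting 55 and 76 with `to_mont`, multiplying with `mont_mul`, and converting back with `from_mont` should agree with `(55 * 76) % 97`. Addition needs no special handling: `(to_mont(a) + to_mont(b)) % n` is already the Montgomery form of `(a + b) % n`.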

Hardware Aspects of Montgomery Modular Multiplication

  • C. D. Walter
  • Mathematics, Computer Science
    IACR Cryptol. ePrint Arch.
  • 2017
This chapter compares Peter Montgomery's modular multiplication method with traditional techniques for suitability on hardware platforms. It also covers systolic array implementations and side

Combining Montgomery Multiplication with Tag Tracing for the Pollard's Rho Algorithm in Prime Order Fields

This paper shows how to apply Montgomery multiplication to the tag tracing variant of Pollard's rho algorithm in prime order fields, which eliminates costly modular reductions and replaces them with much more efficient divisions by a suitable power of two.

Speeding up decimal multiplication

This paper focuses on the number-theoretic transform (NTT) family of algorithms and achieves a 3x-5x speedup over the mpdecimal library, and presents a simple cache-efficient algorithm for in-place 2n×n or n×2n matrix transposition.

Efficient Algorithms for Large Prime Characteristic Fields and Their Application to Bilinear Pairings and Supersingular Isogeny-Based Protocols

  • P. Longa
  • Computer Science, Mathematics
    IACR Cryptol. ePrint Arch.
  • 2022
The method reformulates the widely used lazy reduction technique, crucially avoiding the need for storage and computation of “double-precision” operations, and can be easily adapted to the methods that exist to compute modular multiplication.

Montgomery-friendly primes and applications to cryptography

It is shown that, for dedicated architectures with word operators, larger RNS bases with Montgomery-friendly pairwise co-primes than the RNS bases generally used in the literature with Pseudo-Mersenne numbers can be reached.

Multiprecision ANSI C Library for Implementation of Cryptographic Algorithms on Microcontrollers

This work presents a novel ANSI C library that also implements some less common operations, such as multiprecision integer division, and has been tested on the ARM M4-based Microchip CEC1302 microcontroller.

EdMSM: Multi-Scalar-Multiplication for recursive SNARKs and more

An overview of a variant of the Pippenger MSM algorithm together with a set of optimizations tailored for curves that admit a twisted Edwards form for SNARK-friendly chains and cycles of elliptic curves.

Cryptographic Program Obfuscation: Practical Solutions and Application-Driven Models

This chapter provides a brief summary of the state of the art in cryptographic program obfuscation, focusing on two main aspects: first, there are many implementations of point function obfuscators, satisfying different obfuscation notions, and many of them can be used with practical performance guarantees; second, multiple application-driven obfuscation models and problems can be generated.

B-SIDH: supersingular isogeny Diffie-Hellman using twisted torsion

  • Craig Costello
  • Mathematics, Computer Science
    IACR Cryptol. ePrint Arch.
  • 2019
This framework lifts the restrictions on the shapes of the underlying prime fields originally imposed by Jao and De Feo, and allows a range of new options for instantiating isogeny-based public key cryptography, including alternatives that exploit Mersenne and Montgomery-friendly primes.

Montgomery Multiplication Using Vector Instructions

A parallel approach to computing interleaved Montgomery multiplication is presented; it is particularly suitable for 2-way single instruction, multiple data (SIMD) platforms, which can be found on most modern computer architectures in the form of vector instruction set extensions.

Efficient SIMD Arithmetic Modulo a Mersenne Number

This paper describes carry-less arithmetic operations modulo an integer 2^M-1 in the thousand-bit range, targeted at single instruction multiple data platforms and applications where overall

Systolic Modular Multiplication

  • C. D. Walter
  • Computer Science, Mathematics
    IEEE Trans. Computers
  • 1993
A systolic array for modular multiplication is presented using the ideally suited algorithm of P.L. Montgomery (1985), where its main use would be where many consecutive multiplications are done, as in RSA cryptosystems.

Montgomery Multiplication on the Cell

A technique to speed up Montgomery multiplication targeted at the Synergistic Processor Elements (SPE) of the Cell Broadband Engine is proposed, which consists of splitting a number into four consecutive parts, representing columns in a 4-SIMD organization.

Long Modular Multiplication for Cryptographic Applications

  • L. Hars
  • Computer Science, Mathematics
  • 2004
Several new “column-sum” variants of popular quadratic time modular multiplication algorithms are presented, which are faster than the traditional implementations, need little or no memory beyond the operand storage and perform squaring about twice as fast as general multiplications or modular reductions.

A Cryptographic Library for the Motorola DSP56000

A cryptographic library for the Motorola DSP56000 that provides hardware speed yet software flexibility, and an algorithm for modular multiplication that interleaves multiplication with Montgomery modular reduction to give a very fast implementation of RSA.
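As a rough illustration of what interleaving multiplication with Montgomery reduction means, the sketch below processes one word of the first operand per iteration and reduces immediately, so the accumulator never grows to double length. It is a generic word-serial formulation, not the DSP56000 library's routine; the name `interleaved_mont_mul`, the 32-bit word size, and the operand bounds are assumptions for illustration.

```python
def interleaved_mont_mul(a, b, n, num_words, w=32):
    """Return a * b * 2**(-w * num_words) mod n.

    Assumes n is odd, n < 2**(w * num_words), and 0 <= a, b < n.
    """
    base = 1 << w
    mask = base - 1
    mu = (-pow(n, -1, base)) % base      # -n^(-1) mod 2^w, needs Python 3.8+
    c = 0
    for i in range(num_words):
        a_i = (a >> (w * i)) & mask      # i-th word of a, least significant first
        c += a_i * b                     # multiplication step for this word
        q = ((c & mask) * mu) & mask     # makes c + q*n divisible by 2^w
        c = (c + q * n) >> w             # reduction step: exact shift by one word
    # c stays below 2n throughout, so one conditional subtraction suffices
    return c - n if c >= n else c
```

The result is in the same Montgomery form as a one-shot reduction with R = 2^(w * num_words), so it can be checked against `a * b * pow(R, -1, n) % n`.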

An RNS Montgomery modular multiplication algorithm

The authors present a new RNS modular multiplication for very large operands, based on Montgomery's method adapted to mixed radix and performed using a residue number system.

Montgomery Exponentiation with no Final Subtractions: Improved Results

This paper proposes an improved (faster) version of the Montgomery multiplication and provides figures about the overhead of these versions relative to a speed-optimised version (theoretically and experimentally).

Modulo Reduction in Residue Number Systems

This paper presents a new combination of residue number systems with efficient modulo reduction methods; two methods are compared, and the faster one is scrutinized in detail.

Parallel cryptographic arithmetic using a redundant Montgomery representation

  • D. Page, N. Smart
  • Computer Science, Mathematics
    IEEE Transactions on Computers
  • 2004
It is shown that an SIMD parallel implementation of RSA can be around twice as fast as traditional sequential code, which is especially useful given the larger 2,048-bit RSA keys that are now being proposed for standard security levels.