# Montgomery Arithmetic from a Software Perspective

@article{Bos2017MontgomeryAF, title={Montgomery Arithmetic from a Software Perspective}, author={Joppe W. Bos and Peter L. Montgomery}, journal={IACR Cryptol. ePrint Arch.}, year={2017}, volume={2017}, pages={1057} }

This chapter describes Peter L. Montgomery’s modular multiplication method and the various improvements that reduce latency for software implementations on devices with access to many computational units.

"We propose a representation of residue classes so as to speed modular multiplication without affecting the modular addition and subtraction algorithms." (Peter L. Montgomery [55])
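The representation of residue classes referred to above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names (`montgomery_setup`, `redc`, `to_mont`, `from_mont`, `mont_mul`) and the small parameters in the usage note are hypothetical and chosen only for illustration. The sketch assumes an odd modulus `n`, a radix `R = 2**k` with `n < R`, and Python 3.8 or later for the modular inverse via `pow(n, -1, r)`.

```python
def montgomery_setup(n, k):
    """Precompute n_prime with n * n_prime ≡ -1 (mod R), where R = 2**k."""
    r = 1 << k
    if n % 2 == 0 or n >= r:
        raise ValueError("need an odd modulus n with n < 2**k")
    return (-pow(n, -1, r)) % r  # pow(n, -1, r) needs Python 3.8+


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^(-1) mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # m = t * n_prime mod R
    u = (t + m * n) >> k               # t + m*n is divisible by R, so this is exact
    return u - n if u >= n else u      # u < 2n, so one conditional subtraction suffices


def to_mont(a, n, k):
    """Map a residue a to its Montgomery form a * R mod n."""
    return (a << k) % n


def from_mont(a_bar, n, k, n_prime):
    """Map a Montgomery-form value back to the ordinary residue."""
    return redc(a_bar, n, k, n_prime)


def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Multiply two Montgomery-form values; the result stays in Montgomery form."""
    return redc(a_bar * b_bar, n, k, n_prime)
```

A usage example under the same assumptions: with `n = 97` and `k = 8`, calling `from_mont(mont_mul(to_mont(45, 97, 8), to_mont(76, 97, 8), 97, 8, n_prime), 97, 8, n_prime)` with `n_prime = montgomery_setup(97, 8)` gives the same value as `(45 * 76) % 97`. Addition needs no special handling, since `(to_mont(a, n, k) + to_mont(b, n, k)) % n` equals `to_mont((a + b) % n, n, k)`. The reduction itself uses only a mask and a shift by `k` bits in place of a division by `n`.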

## 9 Citations

Hardware Aspects of Montgomery Modular Multiplication

- Mathematics, Computer Science
- IACR Cryptol. ePrint Arch.
- 2017

This chapter compares Peter Montgomery's modular multiplication method with traditional techniques for suitability on hardware platforms. It also covers systolic array implementations and side…

Combining Montgomery Multiplication with Tag Tracing for the Pollard's Rho Algorithm in Prime Order Fields

- Mathematics, Computer Science
- IACR Cryptol. ePrint Arch.
- 2021

This paper shows how to apply Montgomery multiplication to the tag tracing variant of the Pollard's rho algorithm applied to prime order fields, which eliminates costly modular reductions and replaces these with much more efficient divisions by a suitable power of two.

Speeding up decimal multiplication

- Computer Science, Mathematics
- 2020

This paper focuses on the number-theoretic transform (NTT) family of algorithms and achieves a 3x-5x speedup over the mpdecimal library, and presents a simple cache-efficient algorithm for in-place 2n×n or n×2n matrix transposition.

Efficient Algorithms for Large Prime Characteristic Fields and Their Application to Bilinear Pairings and Supersingular Isogeny-Based Protocols

- Computer Science, Mathematics
- IACR Cryptol. ePrint Arch.
- 2022

The method reformulates the widely used lazy reduction technique, crucially avoiding the need for storage and computation of “double-precision” operations, and can be easily adapted to the methods that exist to compute modular multiplication.

Montgomery-friendly primes and applications to cryptography

- Computer Science, Mathematics
- IACR Cryptol. ePrint Arch.
- 2020

It is shown that, for dedicated architectures with word operators, larger RNS bases with Montgomery-friendly pairwise co-primes than the RNS bases generally used in the literature with Pseudo-Mersenne numbers can be reached.

Multiprecision ANSI C Library for Implementation of Cryptographic Algorithms on Microcontrollers

- Computer Science, Mathematics
- 2019 8th Mediterranean Conference on Embedded Computing (MECO)
- 2019

This work presents a novel ANSI C library that also implements some less common operations, e.g., multiprecision integer division, and has been tested on the ARM M4-based Microchip CEC1302 microcontroller.

Cryptographic Program Obfuscation: Practical Solutions and Application-Driven Models

- Computer Science, Mathematics
- 2018

This chapter provides a brief summary of the state of the art in cryptographic program obfuscation, focusing on two main aspects: first, there are many implementations of point function obfuscators, satisfying different obfuscation notions, and many of them can be used with practical performance guarantees; second, multiple application-driven obfuscation models and problems can be generated.

B-SIDH: supersingular isogeny Diffie-Hellman using twisted torsion

- Mathematics, Computer Science
- IACR Cryptol. ePrint Arch.
- 2019

This framework lifts the restrictions on the shapes of the underlying prime fields originally imposed by Jao and De Feo, and allows a range of new options for instantiating isogeny-based public key cryptography, including alternatives that exploit Mersenne and Montgomery-friendly primes.

Speeding up decimal multiplication

- Computer Science, Mathematics
- ArXiv
- 2020

This paper focuses on the number-theoretic transform (NTT) family of algorithms and achieves a 3x-5x speedup over the mpdecimal library, and presents a simple cache-efficient algorithm for in-place matrix transposition.

## References


Montgomery Multiplication Using Vector Instructions

- Computer Science
- Selected Areas in Cryptography
- 2013

A parallel approach to computing interleaved Montgomery multiplication is presented; it is particularly suited to 2-way single instruction, multiple data platforms, which are found on most modern computer architectures in the form of vector instruction set extensions.

Efficient SIMD Arithmetic Modulo a Mersenne Number

- Computer Science, Mathematics
- 2011 IEEE 20th Symposium on Computer Arithmetic
- 2011

This paper describes carry-less arithmetic operations modulo an integer 2^M-1 in the thousand-bit range, targeted at single instruction multiple data platforms and applications where overall…

Systolic Modular Multiplication

- Computer Science, Mathematics
- IEEE Trans. Computers
- 1993

A systolic array for modular multiplication is presented using the ideally suited algorithm of P.L. Montgomery (1985), where its main use would be where many consecutive multiplications are done, as in RSA cryptosystems.

Montgomery Multiplication on the Cell

- Computer Science, Mathematics
- PPAM
- 2009

A technique to speed up Montgomery multiplication targeted at the Synergistic Processor Elements (SPE) of the Cell Broadband Engine is proposed, which consists of splitting a number into four consecutive parts, representing columns in a 4-SIMD organization.

Long Modular Multiplication for Cryptographic Applications

- Computer Science, Mathematics
- CHES
- 2004

Several new “column-sum” variants of popular quadratic-time modular multiplication algorithms are presented, which are faster than the traditional implementations, need little or no memory beyond the operand storage, and perform squaring about twice as fast as general multiplications or modular reductions.

A Cryptographic Library for the Motorola DSP56000

- Computer Science, Mathematics
- EUROCRYPT
- 1990

A cryptographic library for the Motorola DSP56000 that provides hardware speed yet software flexibility, and an algorithm for modular multiplication that interleaves multiplication with Montgomery modular reduction to give a very fast implementation of RSA.

Montgomery Exponentiation with no Final Subtractions: Improved Results

- Computer Science, Mathematics
- CHES
- 2000

This paper proposes an improved (faster) version of the Montgomery multiplication and provides figures about the overhead of these versions relative to a speed-optimised version (theoretically and experimentally).

Modulo Reduction in Residue Number Systems

- Computer Science, Mathematics
- IEEE Trans. Parallel Distributed Syst.
- 1995

This paper shows a new combination of residue number systems with efficient modulo reduction methods; two methods are compared, and the faster one is scrutinized in detail.

Parallel cryptographic arithmetic using a redundant Montgomery representation

- Computer Science, Mathematics
- IEEE Transactions on Computers
- 2004

It is shown that an SIMD parallel implementation of RSA can be around twice as fast as traditional sequential code, especially useful given the larger 2,048 bit RSA keys which are now being proposed for standard security levels.

Montgomery Modular Multiplication on ARM-NEON Revisited

- Computer Science, Mathematics
- ICISC
- 2014

The Cascade Operand Scanning (COS) method is introduced to speed up multi-precision multiplication on SIMD architectures, and it is shown that two COS computations can be “coarsely” integrated into an efficient vectorized variant of Montgomery modular multiplication, which the paper calls the CICOS method.