# Efficient software implementations of modular exponentiation

@article{Gueron2012EfficientSI, title={Efficient software implementations of modular exponentiation}, author={Shay Gueron}, journal={Journal of Cryptographic Engineering}, year={2012}, volume={2}, pages={31-43} }

The significant cost of RSA computations affects the efficiency and responsiveness of SSL/TLS servers, and therefore software implementations of RSA are an important target for optimization. To this end, we study here efficient software implementations of modular exponentiation, which are also protected against software side channel analyses. We target superior performance for the ubiquitous x86_64 architectures, used in most server platforms. The paper proposes optimizations in several…

## 30 Citations

Software Implementation of Modular Exponentiation, Using Advanced Vector Instructions Architectures

- Computer Science
- WAIFI
- 2012

It is demonstrated, for the first time, how such a software approach can outperform the classical scalar (ALU) implementations, on the high end x86_64 platforms, if they have a wide SIMD architecture.

Speeding Up Big-Numbers Squaring

- Computer Science
- 2012 Ninth International Conference on Information Technology - New Generations
- 2012

An algorithm for big-number squaring that reduces the number of single-precision add-with-carry operations and trades several additions for a single left-shift operation; it is used in a recently posted OpenSSL patch for accelerating modular exponentiation for RSA.
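The idea of replacing repeated additions with one left shift can be illustrated with the standard schoolbook squaring identity. The sketch below is not the cited paper's algorithm: it is a minimal illustration that assumes 64-bit limbs, lets Python integers stand in for the carry chains, and uses the hypothetical helper names `to_limbs` and `square_limbs`.

```python
W = 64                      # assumed limb width in bits (illustrative choice)
MASK = (1 << W) - 1


def to_limbs(n, k):
    """Split a non-negative integer into k little-endian W-bit limbs."""
    return [(n >> (W * i)) & MASK for i in range(k)]


def square_limbs(a):
    """Square the integer represented by the limb list a.

    Uses (sum a_i B^i)^2 = sum a_i^2 B^(2i) + 2 * sum_{i<j} a_i a_j B^(i+j),
    where B = 2^W. Each off-diagonal product is computed once, and the
    factor of 2 is applied to the whole accumulated sum by one left shift.
    """
    k = len(a)

    # Step 1: accumulate the off-diagonal products a[i] * a[j] for i < j.
    off_diagonal = 0
    for i in range(k):
        for j in range(i + 1, k):
            off_diagonal += (a[i] * a[j]) << (W * (i + j))

    # Step 2: double the accumulated sum with a single left shift.
    off_diagonal <<= 1

    # Step 3: add the diagonal squares a[i]^2.
    diagonal = 0
    for i in range(k):
        diagonal += (a[i] * a[i]) << (2 * W * i)

    return off_diagonal + diagonal
```

A real limb-level implementation would propagate carries explicitly; the sketch only shows where the single shift replaces per-product doubling.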

A million-bit multiplier architecture for fully homomorphic encryption

- Computer Science, Mathematics
- Microprocess. Microsystems
- 2014

Pushing the Performance Envelope of Modular Exponentiation Across Multiple Generations of GPUs

- Computer Science
- 2015 IEEE International Parallel and Distributed Processing Symposium
- 2015

This paper shows how to improve modular exponentiation performance over prior results by factors ranging from 2.6 to 24, across generations of NVIDIA GPUs, from compute capability 1.1 onward.

Parallel modular multiplication using 512-bit advanced vector instructions

- Computer Science, Mathematics
- J. Cryptogr. Eng.
- 2022

A new block-based variant of Montgomery multiplication, the Block Product Scanning (BPS) method, which is particularly efficient using new 512-bit advanced vector instructions (AVX-512) on modern Intel processor families, and allows for squaring and sub-quadratic Karatsuba enhancements.

Fast modular squaring with AVX512IFMA

- Computer Science, Mathematics
- IACR Cryptol. ePrint Arch.
- 2018

This paper studies methods for using Intel’s forthcoming AVX512IFMA instructions in order to speed up modular (Montgomery) squaring, which dominates the cost of the exponentiation.

Simple High-Level Code for Cryptographic Arithmetic - With Proofs, Without Compromises

- Computer Science
- 2019 IEEE Symposium on Security and Privacy (SP)
- 2019

It is demonstrated that simple partial evaluation is sufficient to transform simple high-level code into the fastest-known C code, breaking the decades-old pattern that the only fast implementations are those whose instruction-level steps were written out by hand.

Fast prime field elliptic-curve cryptography with 256-bit primes

- Computer Science, Mathematics
- Journal of Cryptographic Engineering
- 2014

A constant-time implementation of the NIST and SECG standardized curve P-256 that can be seamlessly integrated into OpenSSL is proposed; it accelerates Perfect Forward Secrecy TLS handshakes that use ECDSA and/or ECDHE and can help improve the efficiency of TLS servers.

Paillier-encrypted databases with fast aggregated queries

- Computer Science
- 2017 14th IEEE Annual Consumer Communications & Networking Conference (CCNC)
- 2017

This paper shows a simple performance optimization for Paillier encryption that significantly reduces the server side workload and can be deployed by the server unilaterally, while remaining transparent to the client.

Start Your ENGINEs: Dynamically Loadable Contemporary Crypto

- Computer Science
- 2019 IEEE Cybersecurity Development (SecDev)
- 2019

The results confirm that the ENGINE API offers an ideal architecture to address wide-ranging security concerns, and is a valuable tool to enhance future research by easing testing and facilitating the dissemination of novel results in real-world systems.

## References

Speeding Up Big-Numbers Squaring

- Computer Science
- 2012 Ninth International Conference on Information Technology - New Generations
- 2012

An algorithm for big-number squaring that reduces the number of single-precision add-with-carry operations and trades several additions for a single left-shift operation; it is used in a recently posted OpenSSL patch for accelerating modular exponentiation for RSA.

A Vulnerability in RSA Implementations Due to Instruction Cache Analysis and Its Demonstration on OpenSSL

- Computer Science, Mathematics
- CT-RSA
- 2008

It is shown that one can completely break RSA in the original unpatched OpenSSL version (v.0.9.8e) even if the most secure configuration is in place, including all countermeasures against side-channel and MicroArchitectural analysis (in particular, base blinding).

Fast and Constant-Time Implementation of Modular Exponentiation

- Computer Science, Mathematics
- 2009

This work presents a novel constant run-time approach that results in the world’s fastest modular exponentiation implementation on IA processors, bringing a 1.6X speedup to the fastest known modular exponentiation implementation in OpenSSL.

Cache Missing for Fun and Profit

- Computer Science
- 2005

It is demonstrated that this shared access to memory caches provides not only an easily used high bandwidth covert channel between threads, but also permits a malicious thread to monitor the execution of another thread, allowing in many cases for theft of cryptographic keys.

Precise Bounds for Montgomery Modular Multiplication and Some Potentially Insecure RSA Moduli

- Computer Science, Mathematics
- CT-RSA
- 2002

An optimal upper bound for the number of iterations and precise bounds for the output are established for the version of Montgomery Modular Multiplication from which conditional statements have been…

Incomplete reduction in modular arithmetic

- Computer Science, Mathematics
- 2002

The authors describe a novel method for obtaining fast software implementations of the arithmetic operations in the finite field GF(p) with an arbitrary prime modulus p of arbitrary length; the method avoids bit-level operations, which are slow on microprocessors, and performs word-level operations, which are significantly faster.

Analyzing and comparing Montgomery multiplication algorithms

- Computer Science, Mathematics
- IEEE Micro
- 1996

The operations involved in computing the Montgomery product are studied, and several high-speed, space-efficient algorithms for computing MonPro(a, b) are described along with their time and space requirements.

New Branch Prediction Vulnerabilities in OpenSSL and Necessary Software Countermeasures

- Computer Science, Mathematics
- IMACC
- 2007

This paper presents a new and yet unforeseen side channel attack that is enabled by the recently published Simple Branch Prediction Analysis (SBPA), and shows that modular inversion is a natural target of SBPA attacks because it typically uses the Binary Extended Euclidean algorithm whose nature is an input-centric sequence of conditional branches.

Modular multiplication without trial division

- Mathematics, Computer Science
- 1985

A method for multiplying two integers modulo N while avoiding division by N, using a representation of residue classes that speeds modular multiplication without affecting the modular addition and subtraction algorithms.
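The division-free multiplication summarized above can be sketched as a textbook Montgomery reduction (REDC). This is a minimal illustration rather than any cited implementation: it assumes an odd modulus `n`, a radix R = 2^k with n < R, Python 3.8+ for `pow(n, -1, r)`, and the hypothetical function names `montgomery_setup`, `redc`, and `mont_mul`.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n with n < R = 2^k."""
    r = 1 << k
    # n_prime satisfies n * n_prime == -1 (mod R); needs Python 3.8+.
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def redc(t, n, k, n_prime):
    """Return t * R^(-1) mod n for 0 <= t < n * R, without dividing by n."""
    r_mask = (1 << k) - 1
    # Choose m so that t + m * n is divisible by R.
    m = ((t & r_mask) * n_prime) & r_mask
    # Dividing by R is an exact right shift.
    u = (t + m * n) >> k
    # u < 2n, so one conditional subtraction suffices. This branch is
    # data-dependent; side-channel-hardened code would avoid it.
    return u - n if u >= n else u


def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Multiply two values given in Montgomery form (x_bar = x * R mod n)."""
    return redc(a_bar * b_bar, n, k, n_prime)
```

For example, with `n = 101` and `k = 8`, converting `a` and `b` to Montgomery form with `a * r % n` and `b * r % n`, multiplying with `mont_mul`, and applying `redc` once more to the product yields `a * b % n`. The only reductions modulo `n` occur when converting into Montgomery form.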

Transitions: Recommendation for Transitioning the Use of Cryptographic Algorithms and Key Lengths

- Computer Science
- 2011

This Recommendation (SP 800-131A) provides more specific guidance for transitions to the use of stronger cryptographic keys and more robust algorithms.