Efficient software implementations of modular exponentiation

@article{Gueron2012EfficientSI,
  title={Efficient software implementations of modular exponentiation},
  author={Shay Gueron},
  journal={Journal of Cryptographic Engineering},
  year={2012},
  volume={2},
  pages={31--43}
}
  • S. Gueron
  • Published 5 April 2012
  • Computer Science
  • Journal of Cryptographic Engineering
The significant cost of RSA computations affects the efficiency and responsiveness of SSL/TLS servers, and therefore software implementations of RSA are an important target for optimization. To this end, we study here efficient software implementations of modular exponentiation, which are also protected against software side channel analyses. We target superior performance for the ubiquitous x86_64 architectures, used in most server platforms. The paper proposes optimizations in several… 
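As a rough illustration of the kind of routine the abstract discusses, here is a minimal Python sketch of left-to-right fixed-window modular exponentiation. It is not the paper's implementation (the abstract is truncated before the optimizations are described); the function name `fixed_window_modexp` and the `window_bits` parameter are hypothetical. The sketch always performs a multiplication per window, even for a zero digit, so the sequence of operations does not depend on the exponent bits. Python integers and list indexing are not constant-time, so this only shows the control-flow idea and offers no real side-channel protection.

```python
def fixed_window_modexp(base, exponent, modulus, window_bits=4):
    """Left-to-right fixed-window exponentiation (illustrative sketch only)."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")

    # Precompute base**0 .. base**(2**window_bits - 1) mod modulus.
    table = [1 % modulus]
    for _ in range((1 << window_bits) - 1):
        table.append(table[-1] * base % modulus)

    # Number of windows needed to cover the exponent (at least one).
    n_windows = max(1, -(-exponent.bit_length() // window_bits))
    mask = (1 << window_bits) - 1

    result = 1 % modulus
    for i in reversed(range(n_windows)):
        # window_bits squarings per window, regardless of the digit value.
        for _ in range(window_bits):
            result = result * result % modulus
        digit = (exponent >> (i * window_bits)) & mask
        # Always multiply; table[0] == 1 makes a zero digit a no-op in value
        # while keeping the operation sequence uniform.
        result = result * table[digit] % modulus
    return result
```

For any valid inputs the result should agree with Python's built-in `pow(base, exponent, modulus)`, which is a convenient cross-check.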
Software Implementation of Modular Exponentiation, Using Advanced Vector Instructions Architectures
TLDR
It is demonstrated, for the first time, how such a software approach can outperform the classical scalar (ALU) implementations on high-end x86_64 platforms, if they have a wide SIMD architecture.
Speeding Up Big-Numbers Squaring
  • S. Gueron, V. Krasnov
  • Computer Science
    2012 Ninth International Conference on Information Technology - New Generations
  • 2012
TLDR
An algorithm for big-numbers squaring that reduces the number of single-precision add-with-carry operations and trades several additions for a single left-shift operation; it is used in a recently posted OpenSSL patch for accelerating modular exponentiation for RSA.
Pushing the Performance Envelope of Modular Exponentiation Across Multiple Generations of GPUs
  • Niall Emmart, C. Weems
  • Computer Science
    2015 IEEE International Parallel and Distributed Processing Symposium
  • 2015
TLDR
This paper shows how to improve modular exponentiation performance over prior results by factors ranging from 2.6 to 24, across generations of NVIDIA GPUs, from compute capability 1.1 onward.
Parallel modular multiplication using 512-bit advanced vector instructions
TLDR
A new block-based variant of Montgomery multiplication, the Block Product Scanning (BPS) method, which is particularly efficient using new 512-bit advanced vector instructions (AVX-512) on modern Intel processor families, and allows for squaring and sub-quadratic Karatsuba enhancements.
Fast modular squaring with AVX512IFMA
TLDR
This paper studies methods for using Intel’s forthcoming AVX512IFMA instructions in order to speed up modular (Montgomery) squaring, which dominates the cost of the exponentiation.
Simple High-Level Code for Cryptographic Arithmetic - With Proofs, Without Compromises
TLDR
It is demonstrated that simple partial evaluation is sufficient to transform into the fastest-known C code, breaking the decades-old pattern that the only fast implementations are those whose instruction-level steps were written out by hand.
Fast prime field elliptic-curve cryptography with 256-bit primes
TLDR
A constant-time implementation of the NIST and SECG standardized curve P-256, defined over a 256-bit prime field, is proposed that can be seamlessly integrated into OpenSSL; it accelerates Perfect Forward Secrecy TLS handshakes that use ECDSA and/or ECDHE, and can help in improving the efficiency of TLS servers.
Paillier-encrypted databases with fast aggregated queries
  • Nir Drucker, S. Gueron
  • Computer Science
    2017 14th IEEE Annual Consumer Communications & Networking Conference (CCNC)
  • 2017
TLDR
This paper shows a simple performance optimization for Paillier encryption that significantly reduces the server side workload and can be deployed by the server unilaterally, while remaining transparent to the client.
Start Your ENGINEs: Dynamically Loadable Contemporary Crypto
TLDR
The results confirm that the ENGINE API offers an ideal architecture to address wide-ranging security concerns, and is a valuable tool to enhance future research by easing testing and facilitating the dissemination of novel results in real-world systems.

References

SHOWING 1-10 OF 32 REFERENCES
Speeding Up Big-Numbers Squaring
  • S. Gueron, V. Krasnov
  • Computer Science
    2012 Ninth International Conference on Information Technology - New Generations
  • 2012
TLDR
An algorithm for big-numbers squaring that reduces the number of single-precision add-with-carry operations and trades several additions for a single left-shift operation; it is used in a recently posted OpenSSL patch for accelerating modular exponentiation for RSA.
A Vulnerability in RSA Implementations Due to Instruction Cache Analysis and Its Demonstration on OpenSSL
TLDR
It is shown that one can completely break RSA in the original unpatched OpenSSL version (v.0.9.8e) even if the most secure configuration is in place, including all countermeasures against side-channel and MicroArchitectural analysis (in particular, base blinding).
Fast and Constant-Time Implementation of Modular Exponentiation
TLDR
This work presents a novel constant run-time approach that results in the world’s fastest modular exponentiation implementation on IA processors, bringing a 1.6X speedup over the fastest known modular exponentiation implementation in OpenSSL.
CACHE MISSING FOR FUN AND PROFIT
TLDR
It is demonstrated that this shared access to memory caches provides not only an easily used high bandwidth covert channel between threads, but also permits a malicious thread to monitor the execution of another thread, allowing in many cases for theft of cryptographic keys.
Precise Bounds for Montgomery Modular Multiplication and Some Potentially Insecure RSA Moduli
An optimal upper bound for the number of iterations and precise bounds for the output are established for the version of Montgomery Modular Multiplication from which conditional statements have been
Incomplete reduction in modular arithmetic
TLDR
The authors describe a novel method for obtaining fast software implementations of the arithmetic operations in the finite field GF(p) with an arbitrary prime modulus p of arbitrary length; the method avoids bit-level operations, which are slow on microprocessors, and performs word-level operations, which are significantly faster.
Analyzing and comparing Montgomery multiplication algorithms
TLDR
The operations involved in computing the Montgomery product are studied, and several high-speed, space-efficient algorithms for computing MonPro(a, b) are described, along with their time and space requirements.
New Branch Prediction Vulnerabilities in OpenSSL and Necessary Software Countermeasures
TLDR
This paper presents a new and yet unforeseen side channel attack that is enabled by the recently published Simple Branch Prediction Analysis (SBPA), and shows that modular inversion is a natural target of SBPA attacks because it typically uses the Binary Extended Euclidean algorithm whose nature is an input-centric sequence of conditional branches.
Modular multiplication without trial division
TLDR
A method for multiplying two integers modulo N while avoiding division by N, based on a representation of residue classes that speeds up modular multiplication without affecting the modular addition and subtraction algorithms.
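The idea summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the cited paper: the names `montgomery_setup`, `redc`, and `mont_mul` are hypothetical, R is taken to be a power of two larger than the odd modulus N, and `pow(n, -1, r)` requires Python 3.8 or later. Conversion into the Montgomery representation uses an ordinary `%` here for brevity; in an exponentiation that cost would be paid once and amortized over many multiplications.

```python
def montgomery_setup(n, r_bits):
    """Return (r, n_prime) with r = 2**r_bits and n * n_prime == -1 (mod r)."""
    if n <= 1 or n % 2 == 0:
        raise ValueError("n must be odd and greater than 1")
    r = 1 << r_bits
    if r <= n:
        raise ValueError("r = 2**r_bits must exceed n")
    n_prime = (-pow(n, -1, r)) % r  # modular inverse via pow, Python 3.8+
    return r, n_prime


def redc(t, n, r_bits, n_prime):
    """Montgomery reduction: return t * r^-1 mod n for 0 <= t < r * n."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * n_prime) & mask   # reduction mod r is a bit mask
    u = (t + m * n) >> r_bits           # exact division by r is a shift
    return u - n if u >= n else u       # single conditional subtraction


def mont_mul(a, b, n, r_bits):
    """Compute a * b mod n through the Montgomery representation."""
    r, n_prime = montgomery_setup(n, r_bits)
    a_bar = (a * r) % n                 # a in Montgomery form
    b_bar = (b * r) % n                 # b in Montgomery form
    prod_bar = redc(a_bar * b_bar, n, r_bits, n_prime)  # (a*b) in Montgomery form
    return redc(prod_bar, n, r_bits, n_prime)           # convert back
```

Inside `redc`, the only operations involving r are a mask and a shift, which is how division by N is avoided; residues in Montgomery form can still be added and subtracted modulo N as usual.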
Transitions: Recommendation for Transitioning the Use of Cryptographic Algorithms and Key Lengths
TLDR
This Recommendation (SP 800-131A) provides more specific guidance for transitions to the use of stronger cryptographic keys and more robust algorithms.