This paper studies software optimization of elliptic-curve cryptography with 256-bit prime fields. We propose a constant-time implementation of the NIST and SECG standardized curve P-256 that can be seamlessly integrated into OpenSSL. This accelerates Perfect Forward Secrecy TLS handshakes that use ECDSA and/or ECDHE, and can help in(More)
A new implementation of the GHASH function was recently committed to a Git version of OpenSSL to speed up AES-GCM. We identified a bug in that implementation and made sure it was fixed quickly, before it trickled into an official OpenSSL trunk. Here, we use this (already fixed) bug as a real example that demonstrates the fragility of AES-GCM's(More)
This paper describes an algorithm for accelerating the computations of Davies–Meyer based hash functions. It is based on parallelizing the computation of several message schedules for several message blocks of a given message. This parallelization, together with the proper use of vector processor instructions (SIMD) improves the overall algorithm’s(More)
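As a rough illustration of computing several message schedules side by side, the sketch below uses the SHA-256 message schedule. That choice is an assumption: the abstract names only Davies–Meyer based hash functions in general. Each Python list stands in for one vector register holding one word per message block, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def small_sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def schedule_one(block):
    """Scalar reference: the 64-word SHA-256 schedule of one 64-byte block."""
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    for t in range(16, 64):
        w.append((small_sigma1(w[t - 2]) + w[t - 7]
                  + small_sigma0(w[t - 15]) + w[t - 16]) & MASK32)
    return w

def schedule_lanes(blocks):
    """Lane-wise version: w[t] holds word t of every block at once.

    Each inner list models one SIMD register; a real implementation would
    apply a single vector instruction per step instead of a Python loop.
    """
    w = [[int.from_bytes(b[4 * i:4 * i + 4], "big") for b in blocks]
         for i in range(16)]
    for t in range(16, 64):
        w.append([(small_sigma1(p) + q + small_sigma0(r) + s) & MASK32
                  for p, q, r, s in zip(w[t - 2], w[t - 7], w[t - 15], w[t - 16])])
    return w
```

The schedules of different blocks do not depend on each other, which is what makes this step a natural fit for vector instructions, whereas the compression rounds of consecutive blocks of one message are chained.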
We describe a method for efficiently hashing multiple messages of different lengths. Such computations occur in various scenarios, and one of them is when an operating system checks the integrity of its components during boot time. These tasks can gain performance by parallelizing the computations and using SIMD architectures. For such scenarios, we compare(More)
Counter mode is one of the standard modes of operation for block ciphers. It has performance advantages due to its high parallelism. For a given key and a 96-bit IV, a 128-bit ciphertext block is computed by XOR-ing the corresponding plaintext block with the encryption of a unique 128-bit Counter Block. The Counter Block values are generated by incrementing(More)
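The construction described above can be sketched as follows. Several details are assumptions made for illustration: the Counter Block is taken to be the 96-bit IV followed by a 32-bit big-endian counter starting at 1 (a GCM-style layout), and `toy_block_encrypt` is a hypothetical stand-in built from SHA-256, not AES and not secure.

```python
import hashlib

BLOCK = 16  # 128-bit blocks

def counter_blocks(iv, n, start=1):
    """Yield n 128-bit Counter Blocks: 96-bit IV || 32-bit big-endian counter."""
    if len(iv) != 12:
        raise ValueError("IV must be 96 bits (12 bytes)")
    for i in range(n):
        yield iv + ((start + i) % 2**32).to_bytes(4, "big")

def toy_block_encrypt(key, block):
    """Stand-in for a 128-bit block cipher (assumption: NOT AES, NOT secure)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def ctr_xor(key, iv, data):
    """XOR each block of data with the encryption of its Counter Block."""
    out = bytearray()
    nblocks = (len(data) + BLOCK - 1) // BLOCK
    for i, cb in enumerate(counter_blocks(iv, nblocks)):
        keystream = toy_block_encrypt(key, cb)
        chunk = data[BLOCK * i:BLOCK * (i + 1)]
        out.extend(a ^ b for a, b in zip(chunk, keystream))
    return bytes(out)
```

Because every keystream block depends only on the key and its own Counter Block, the blocks can be computed independently, which is the source of the parallelism mentioned above. Applying `ctr_xor` twice with the same key and IV returns the original data.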
This paper deals with optimizations for big-number (multi-precision) squaring and their efficient implementation on x86-64 platforms. Such optimizations have various uses; the most prominent is RSA acceleration, where big-number squaring consumes a significant portion of the computations. We introduce an algorithm for big-number squaring that(More)
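For context, here is a minimal sketch of the textbook way squaring saves work over general multiplication: each cross product a[i]*a[j] with i < j is computed once and doubled, and the diagonal squares are added afterwards. This is the standard schoolbook technique, not necessarily the algorithm the paper introduces (the abstract is truncated); the 64-bit limb width and the function names are assumptions.

```python
W = 64                 # limb width in bits (assumption: x86-64 style limbs)
MASK = (1 << W) - 1

def to_limbs(x, n):
    """Split a non-negative integer into n little-endian W-bit limbs."""
    return [(x >> (W * i)) & MASK for i in range(n)]

def from_limbs(limbs):
    """Recombine little-endian W-bit limbs into an integer."""
    return sum(v << (W * i) for i, v in enumerate(limbs))

def square_limbs(a):
    """Square an n-limb number, returning 2n limbs."""
    n = len(a)
    cols = [0] * (2 * n)
    # Off-diagonal products, each computed only once.
    for i in range(n):
        for j in range(i + 1, n):
            cols[i + j] += a[i] * a[j]
    # Double the off-diagonal part, then add the diagonal squares.
    cols = [2 * c for c in cols]
    for i in range(n):
        cols[2 * i] += a[i] * a[i]
    # Propagate carries so that every output limb fits in W bits.
    out, carry = [], 0
    for c in cols:
        t = c + carry
        out.append(t & MASK)
        carry = t >> W
    return out
```

For n limbs this performs n(n-1)/2 + n limb multiplications instead of the n² needed by a general schoolbook multiplication.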
Intel has recently announced a new set of processor instructions, dubbed AVX512IFMA, that carry out Integer Fused Multiply Accumulate operations. These instructions operate on 512-bit registers and compute eight independent 52-bit unsigned integer multiplications, to generate eight 104-bit products, and accumulate their low/high halves into 64-bit(More)
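The per-lane arithmetic described above can be modelled in a few lines. This is a sketch of the semantics only: Python lists of eight integers stand in for 512-bit registers, and the function names `madd52lo` and `madd52hi` are hypothetical labels for the low-half and high-half accumulate operations.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1

def madd52lo(acc, a, b):
    """Per lane: acc + low 52 bits of the 104-bit product, modulo 2**64."""
    return [(x + (((y & MASK52) * (z & MASK52)) & MASK52)) & MASK64
            for x, y, z in zip(acc, a, b)]

def madd52hi(acc, a, b):
    """Per lane: acc + high 52 bits of the 104-bit product, modulo 2**64."""
    return [(x + (((y & MASK52) * (z & MASK52)) >> 52)) & MASK64
            for x, y, z in zip(acc, a, b)]
```

Starting from zero accumulators, the low result plus the high result shifted left by 52 bits reconstructs the full 104-bit product in each lane. The 12 spare bits in every 64-bit accumulator leave room to add many such halves before a carry has to be handled.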