Compiler auto-vectorization of matrix multiplication modulo small primes


Modern CPUs have vector instruction sets, such as SSE2 and AVX2, which support bit-level operations (and, or, xor, etc.) as well as floating-point and integer arithmetic. Furthermore, compilers such as g++ and Clang have auto-vectorization features to exploit the vector instructions. In this study we take advantage of these tools to improve performance…
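As an illustration of the kind of loop a compiler's auto-vectorizer can target, here is a minimal C++ sketch of matrix multiplication modulo a small prime. It is not taken from the paper: the function name `matmul_mod`, the row-major layout, the i-k-j loop order, and the assumption that p < 2^16 (so that sums of products fit in 64-bit accumulators) are all choices made for this example. Whether the inner loop is actually vectorized depends on the compiler version, the target, and the flags used (for instance `-O3 -mavx2` with g++ or Clang).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the paper): returns C = A * B (mod p)
// for n x n matrices stored in row-major order.
// Assumptions: entries of A and B are already reduced to [0, p), and
// p < 2^16, so each product is below 2^32 and a sum of n such products
// fits in a 64-bit accumulator without overflow.
std::vector<std::uint32_t> matmul_mod(const std::vector<std::uint32_t>& A,
                                      const std::vector<std::uint32_t>& B,
                                      std::size_t n, std::uint32_t p) {
    assert(A.size() == n * n && B.size() == n * n);
    assert(p > 1 && p < (1u << 16));

    // Accumulate unreduced sums first; reduce modulo p once at the end.
    std::vector<std::uint64_t> acc(n * n, 0);

    // i-k-j loop order: the innermost loop walks a row of B and a row of
    // acc with unit stride and has no loop-carried dependency, which is
    // the access pattern auto-vectorizers generally handle best.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const std::uint64_t a = A[i * n + k];
            const std::uint32_t* b_row = &B[k * n];
            std::uint64_t* c_row = &acc[i * n];
            for (std::size_t j = 0; j < n; ++j) {
                c_row[j] += a * b_row[j];
            }
        }
    }

    // Single reduction pass over the accumulated sums.
    std::vector<std::uint32_t> C(n * n);
    for (std::size_t idx = 0; idx < n * n; ++idx) {
        C[idx] = static_cast<std::uint32_t>(acc[idx] % p);
    }
    return C;
}
```

Deferring the `%` to a single final pass keeps the hot inner loop to a multiply and an add. To check what the compiler did with that loop, g++ can report vectorized loops with `-fopt-info-vec` and Clang with `-Rpass=loop-vectorize`.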
DOI: 10.1145/3115936.3115943


21 Figures and Tables