Transcoding billions of Unicode characters per second with SIMD instructions

Daniel Lemire and Wojciech Mula. Software: Practice and Experience, pages 555-575.

In software, text is often represented using Unicode formats (UTF‐8 and UTF‐16). We frequently have to convert text from one format to the other, a process called transcoding. Popular transcoding functions are slower than state‐of‐the‐art disks and networks. These transcoding functions make little use of the single‐instruction‐multiple‐data (SIMD) instructions available on commodity processors. By designing transcoding algorithms for SIMD instructions, we multiply the speed of transcoding on… 
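
For readers unfamiliar with the term, the following is a minimal scalar sketch of what transcoding from UTF-8 to UTF-16 involves. It is not the SIMD algorithm from the paper: it processes one character at a time, which is the kind of byte-by-byte work the paper sets out to accelerate. The function name `utf8_to_utf16le` is hypothetical, the output is assumed to be little-endian without a byte-order mark, and the input is assumed to be valid UTF-8 (no validation is performed).

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Scalar UTF-8 -> UTF-16LE transcoder (illustrative sketch).

    Assumption: `data` is already valid UTF-8; invalid input is not detected.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte determines the sequence length and its payload bits.
        if lead < 0x80:        # 0xxxxxxx: ASCII, 1 byte
            cp, n = lead, 1
        elif lead < 0xE0:      # 110xxxxx: 2 bytes
            cp, n = lead & 0x1F, 2
        elif lead < 0xF0:      # 1110xxxx: 3 bytes
            cp, n = lead & 0x0F, 3
        else:                  # 11110xxx: 4 bytes
            cp, n = lead & 0x07, 4
        # Each continuation byte (10xxxxxx) contributes 6 more bits.
        for k in range(1, n):
            cp = (cp << 6) | (data[i + k] & 0x3F)
        i += n
        if cp < 0x10000:
            # Basic Multilingual Plane: one 16-bit code unit.
            out += cp.to_bytes(2, "little")
        else:
            # Supplementary plane: a surrogate pair (two 16-bit code units).
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            out += high.to_bytes(2, "little") + low.to_bytes(2, "little")
    return bytes(out)
```

On valid input, the result should match Python's built-in conversion, `data.decode("utf-8").encode("utf-16-le")`, which can serve as a reference when experimenting.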
1 Citation

Efficient multivariate low-degree tests via interactive oracle proofs of proximity for polynomial codes

The first interactive oracle proofs of proximity (IOPP) for tensor products of Reed-Solomon codes and for Reed-Muller codes (evaluation of polynomials with bounds on individual degrees) are presented; they simultaneously achieve logarithmic query complexity, logarithmic verification time, linear oracle proof length and linear prover running time.



Faster Base64 Encoding and Decoding Using AVX2 Instructions

Compared to state-of-the-art implementations, this work multiplies the speed of both base64 encoding and decoding by using the single-instruction-multiple-data instructions available on recent Intel processors (AVX2).

SIMD-based decoding of posting lists

This paper starts by exploring variable-length integer encoding formats used to represent postings, and defines a taxonomy that classifies encodings along three dimensions, representing the way in which data bits are stored and additional bits are used to describe the data.

A case study in SIMD text processing with parallel bit streams: UTF-8 to UTF-16 transcoding

High-performance SIMD text processing using the method of parallel bit streams, which exploits intraregister and intrachip parallelism on multicore processors, is introduced with a case study of UTF-8 to UTF-16 transcoding.

Validating UTF‐8 in less than one instruction per byte

The lookup algorithm is presented, which outperforms UTF‐8 validation routines used in many libraries and languages by more than 10 times using commonly available single‐instruction‐multiple‐data instructions.

Faster Population Counts Using AVX2 Instructions

A vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors, and has been adopted by LLVM and is used by its popular C compiler (Clang).

Stream VByte: Faster byte-oriented integer compression

UTF-8, a transformation format of ISO 10646

This memo updates and replaces RFC 2044, in particular addressing the question of versions of the relevant standards. UTF-8 has the characteristic of preserving the full US-ASCII range, providing compatibility with file systems, parsers and other software that rely on US-ASCII values but are transparent to other values.

Vectorization for SIMD architectures with alignment constraints

This paper presents a compilation scheme that systematically vectorizes loops in the presence of misaligned memory references, and proposes several techniques to minimize the number of data reorganization operations generated.

Use of SIMD Vector Operations to Accelerate Application Code Performance on Low-Powered ARM and Intel Platforms

This paper compares the NEON SIMD instruction set used on the ARM Cortex-A series of RISC processors with the SSE2 SIMD instruction set found on Intel platforms, within the context of the Open Computer Vision (OpenCV) library.