JParEnt: Parallel entropy decoding for JPEG decompression on heterogeneous multicore architectures
We present a parallel implementation of Huffman coding, a widely used entropy encoding algorithm, on the NVIDIA CUDA architecture. After constructing the Huffman codeword tree serially, we generate, in parallel, a byte stream in which each byte represents a single bit of the compressed output stream. In the final step, each run of 8 consecutive bytes is combined, also in parallel, into a single byte to produce the final compressed output bit stream. Experimental results show speedups of up to 22× over a serial CPU implementation, with no constraint on the maximum codeword length or data entropy.
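The two-stage layout described above can be illustrated with a minimal serial sketch in Python. It is not the CUDA implementation: the function names `expand_to_bit_bytes` and `pack_bit_bytes`, the string-valued codebook, the prefix sum used to find each symbol's write offset, the zero padding of the last byte, and the MSB-first bit order are all assumptions made for illustration. The loops marked as independent are the ones that would map naturally to one thread per iteration.

```python
def expand_to_bit_bytes(symbols, codebook):
    """Stage 1: write one byte (0 or 1) per output bit.

    `codebook` maps each symbol to its codeword as a string of '0'/'1'
    (a hypothetical representation chosen for readability).
    """
    # Assumption: an exclusive prefix sum over codeword lengths gives
    # each symbol its starting position in the bit-per-byte stream.
    offsets = []
    total_bits = 0
    for s in symbols:
        offsets.append(total_bits)
        total_bits += len(codebook[s])

    # Assumption: pad with zero bits up to a multiple of 8.
    padded_bits = (total_bits + 7) // 8 * 8
    bit_bytes = bytearray(padded_bits)

    # Each iteration touches a disjoint range, so iterations are independent.
    for i, s in enumerate(symbols):
        for j, bit in enumerate(codebook[s]):
            bit_bytes[offsets[i] + j] = 1 if bit == "1" else 0
    return bit_bytes, total_bits


def pack_bit_bytes(bit_bytes):
    """Stage 2: combine every 8 consecutive bytes into one output byte."""
    out = bytearray(len(bit_bytes) // 8)
    # Each output byte depends only on its own 8 inputs (independent iterations).
    for k in range(len(out)):
        value = 0
        for b in bit_bytes[8 * k: 8 * k + 8]:
            value = (value << 1) | b  # assumption: MSB-first packing
        out[k] = value
    return bytes(out)


if __name__ == "__main__":
    codebook = {"a": "0", "b": "10", "c": "11"}  # toy prefix code
    bit_bytes, total_bits = expand_to_bit_bytes("abacab", codebook)
    print(total_bits, pack_bit_bytes(bit_bytes).hex())
```

With the toy codebook above, the input `abacab` expands to the 9 bits `010011010`, which pack into the two bytes `4d 00` once the final byte is zero-padded. Because no codeword length appears anywhere except in the offset computation, the sketch places no limit on the maximum codeword length.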