Faster Population Counts Using AVX2 Instructions

@article{Mula2018FasterPC,
  title={Faster Population Counts Using AVX2 Instructions},
  author={Wojciech Mu{\l}a and Nathan Kurz and Daniel Lemire},
  journal={Comput. J.},
  year={2018},
  volume={61},
  pages={111--120}
}
Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g., popcnt on x64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as…
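To make the operation described in the abstract concrete, here is a minimal scalar sketch of a population count over a byte stream. It uses a 16-entry table indexed by 4-bit nibbles, a lookup pattern that byte-shuffle-based SIMD population counts are often built on. This is an illustration under stated assumptions, not the paper's AVX2 implementation; the names `NIBBLE_COUNTS`, `popcount_byte` and `popcount_stream` are hypothetical.

```python
# Scalar illustration only; not the AVX2 code from the paper.
# All names here are hypothetical and chosen for this sketch.

# Number of set bits in each possible 4-bit value (0..15).
NIBBLE_COUNTS = [bin(i).count("1") for i in range(16)]


def popcount_byte(b: int) -> int:
    """Count the ones in a single byte using two 4-bit table lookups."""
    return NIBBLE_COUNTS[b & 0x0F] + NIBBLE_COUNTS[b >> 4]


def popcount_stream(data: bytes) -> int:
    """Count the ones across an entire byte stream."""
    return sum(popcount_byte(b) for b in data)
```

For example, `popcount_stream(b"\xff\x00\x0f")` adds 8, 0 and 4 set bits. A vectorized version would perform many such nibble lookups per instruction, which is where the speedup over a word-at-a-time instruction could come from.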
Counting bits in parallel
Roaring bitmaps: Implementation of an optimized software library
Population Count on Intel® CPU, GPU and FPGA
  • Zheming Jin, H. Finkel
  • Computer Science
  • 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
  • 2020
Vectorized Character Counting for Faster Pattern Matching
  • R. Snytsar
  • Computer Science, Mathematics
  • BIOINFORMATICS
  • 2019
Accelerating FM-index Search for Genomic Data Processing
Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity

References

SHOWING 1-10 OF 52 REFERENCES
Fast Quicksort Implementation Using AVX Instructions
SIMD compression and the intersection of sorted integers
Revisiting POPCOUNT Operations in CPUs / GPUs
Comparing fast implementations of bit permutation instructions
  • Y. Hilewitz, Z.J. Shi, R.B. Lee
  • Computer Science
  • Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, 2004.
  • 2004
Branch prediction and the performance of interpreters — Don't trust folklore
A hybrid implementation of Hamming weight
  • E. Morancho
  • Computer Science
  • 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
  • 2014
The never ending problem of counting bits efficiently
Consistently faster and smaller compressed bitmaps with Roaring
Better bitmap performance with Roaring bitmaps
A technique for counting ones in a binary computer