Bitwise data parallelism in regular expression matching

@article{Cameron2014BitwiseDP,
  title={Bitwise data parallelism in regular expression matching},
  author={Robert D. Cameron and Thomas C. Shermer and Arrvindh Shriraman and Kenneth S. Herdy and Dan Lin and Benjamin R. Hull and Meng Lin},
  journal={2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT)},
  year={2014},
  pages={139--150},
  url={https://api.semanticscholar.org/CorpusID:2193114}
}
A new parallel algorithm for regular expression matching is developed and applied to the classical grep (global regular expression print) problem; it can substantially outperform traditional grep implementations based on NFAs, DFAs or backtracking.
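
The marker-stream idea behind this kind of matching can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes one bit per text position (bit i for position i), uses Python integers as unbounded bitstreams, and handles a Kleene star with the MatchStar identity as it is usually stated in the Parabix literature. The pattern `a[0-9]*z`, the sample text, and all function names are assumptions made for the example.

```python
def class_stream(text, predicate):
    """Return an integer whose bit i is set when predicate(text[i]) holds."""
    bits = 0
    for i, ch in enumerate(text):
        if predicate(ch):
            bits |= 1 << i
    return bits


def match_star(markers, char_class):
    """Extend each marker through zero or more characters of char_class.

    Addition carries a marker across a run of class bits; the XOR and OR
    recover every position visited along the way.
    """
    return (((markers & char_class) + char_class) ^ char_class) | markers


def positions(bits):
    """List the set bit positions of a bitstream, lowest first."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def match_a_digits_z(text):
    """Return positions just past each match of the pattern a[0-9]*z."""
    is_a = class_stream(text, lambda ch: ch == "a")
    is_digit = class_stream(text, lambda ch: ch in "0123456789")
    is_z = class_stream(text, lambda ch: ch == "z")

    after_a = is_a << 1                           # cursor sits after each 'a'
    after_digits = match_star(after_a, is_digit)  # skip zero or more digits
    return positions((after_digits & is_z) << 1)  # consume the final 'z'


text = "a12z az abz"
match_ends = match_a_digits_z(text)
print(match_ends)
```

Each step is a whole-stream operation, so all candidate matches in the text advance together instead of one character at a time.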

Bitwise Data Parallelism with LLVM: The ICgrep Case Study

This paper examines the application of bitwise data parallelism using short vector SIMD instructions to the development of a full-featured Unicode-capable open-source grep implementation, constructed using a layered architecture combining Parabix and LLVM compiler technologies.

Systematic Support of Parallel Bit Streams in LLVM

Modifications to LLVM are investigated to incorporate all the SIMD processing requirements of Parabix both to increase the portability of applications and to create additional opportunities to optimize those operations in the context of code generation.

New Pattern Matching Approaches Comparison

This paper will specifically go over scaling, algorithms, performance measures, resources used, and the selection of architecture discussed in three different papers.

HARE: Hardware accelerator for regular expressions

This paper describes a 1GHz 32-character-wide HARE design targeting ASIC implementation that processes data at 32 GB/s, matching modern memory bandwidths, and demonstrates a scaled-down FPGA proof-of-concept that operates at 100MHz with 4-wide parallelism (400 MB/s).

Multidimensional Parallelization for Streaming Text Processing Applications Based on Parabix Framework

This dissertation investigates the further development of the Parabix framework to incorporate multidimensional parallelization, combining Parabix methods with several different models of multithreading such as task parallelism, data parallelism and pipeline parallelism as well as with GPU-based SIMT processing.

s2k: A parallel language for streaming text extraction and transformations

This work defines s2k, a global-view parallel programming language for streaming text extraction and transformations that integrates stream programming abstractions and parallel bitstream programming methods.

Automata Processor Architecture and Applications: A Survey

This survey covers the state of the art in automata-processor-based hardware accelerators, describes AP hardware architecture, its programming environments, and its current successful applications in a wide range of fields, and explores future research trends and opportunities.

Optimizing Regular Expressions via Rewrite-Guided Synthesis

This work investigates automatically transforming regular expressions to remove inefficiencies, and presents a new approach called rewrite-guided synthesis (ReGiS), in which a unique interplay between SyGuS and equality saturation-based rewriting helps to overcome these problems, resulting in an efficient, scalable framework for expression optimization.

Fast support for unstructured data processing: The unified automata processor

The Unified Automata Processor (UAP), a new architecture that provides general and efficient support for finite automata (FA), is proposed as a promising candidate for integration into general-purpose computing architectures.

REGISTOR

Registor achieves high throughput, reduces the I/O bandwidth requirement by up to 97%, and reduces CPU utilization by as much as 82% for regex search in large datasets.

High-performance regular expression scanning on the Cell/B.E. processor

This work presents an algorithm and a set of techniques for using multi-core features such as multiple threads and SIMD instructions to perform parallel regexp-based tokenization, and presents a family of optimized kernels that implement the algorithm.

Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs

Several techniques are introduced to do optimization on GPUs, including reducing global memory transactions of input buffer, reducing latency of transition table lookup, eliminating output table accesses, avoiding bank-conflict of shared memory, coalescing writes to global memory, and enhancing data transmission via peripheral component interconnect express.

Fast and flexible string matching by combining bit-parallelism and suffix automata

A new automaton to recognize suffixes of patterns with classes of characters is introduced, which seems very adequate for computational biology applications, since it is the fastest algorithm to search on DNA sequences and flexible searching is an important problem in that area.

Parallel Scanning with Bitstream Addition: An XML Case Study

A parallel scanning method using the concept of bitstream addition is introduced and studied in application to the problem of XML parsing and well-formedness checking, yielding a dramatic speed-up over traditional alternatives employing byte-at-a-time parsing.
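
Scanning by bitstream addition can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's code: bit i of each integer stands for text position i, Python integers serve as unbounded bitstreams, and the scan uses the ScanThru form `(M + C) & ~C` as commonly given in the Parabix literature. The sample text, the cursor positions, and the helper names are hypothetical.

```python
def class_stream(text, predicate):
    """Return an integer whose bit i is set when predicate(text[i]) holds."""
    bits = 0
    for i, ch in enumerate(text):
        if predicate(ch):
            bits |= 1 << i
    return bits


def scan_thru(markers, char_class):
    """Move each marker past the run of char_class characters it sits on.

    Adding a marker bit to a run of class bits ripples a carry to the
    first position after the run; masking with ~char_class keeps only
    those landing positions.
    """
    return (markers + char_class) & ~char_class


def positions(bits):
    """List the set bit positions of a bitstream, lowest first."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


text = "ab123,45;"
digits = class_stream(text, str.isdigit)   # positions 2, 3, 4, 6, 7
starts = (1 << 2) | (1 << 6)               # cursors at the start of each number
ends = scan_thru(starts, digits)
print(positions(ends))
```

A single addition advances every cursor at once, which is where the speed-up over byte-at-a-time scanning comes from.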

Faster Regular Expression Matching

This work shows how to solve regular expression matching in linear space and $O(nm(\log\log n)/(\log n)^{3/2} + n + m)$ time, where $m$ is the length of the expression and $n$ the length of the string.

A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching

A new algorithm for string matching called BNDM is presented; it is the bit-parallel simulation of a known (but recent) algorithm called BDM, and it can be extended to handle classes of characters in the pattern and in the text, multiple patterns, and errors in the pattern or in the text, combining simplicity, efficiency and flexibility.

Data-parallel finite-state machines

This paper describes a parallel algorithm for FSMs that breaks dependences across iterations by efficiently enumerating transitions from all possible states on each input symbol, which allows the algorithm to utilize various sources of data parallelism available on modern hardware, including vector instructions and multiple processors/cores.
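
The enumeration idea can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm as implemented: each chunk of input is summarized as a mapping from every possible start state to its end state, and the per-chunk mappings are then composed. The three-state DFA (it records whether "ab" has occurred), the chunk size, and all names are hypothetical.

```python
from functools import reduce

N_STATES = 3
# TRANSITIONS[symbol][state] -> next state.
# State 0: start, state 1: just saw 'a', state 2: "ab" seen (absorbing).
TRANSITIONS = {
    "a": (1, 1, 2),
    "b": (0, 2, 2),
}


def chunk_map(chunk):
    """Run the chunk from every start state at once; return the state vector."""
    state_vec = tuple(range(N_STATES))
    for symbol in chunk:
        row = TRANSITIONS[symbol]
        state_vec = tuple(row[s] for s in state_vec)
    return state_vec


def compose(first, second):
    """Mapping equivalent to applying `first`, then `second`."""
    return tuple(second[s] for s in first)


def run_chunked(text, chunk_size, start=0):
    """Final DFA state, computed from independent per-chunk mappings."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    maps = [chunk_map(c) for c in chunks]   # no chunk depends on another
    total = reduce(compose, maps, tuple(range(N_STATES)))
    return total[start]


def run_sequential(text, start=0):
    """Reference implementation: one state, one symbol at a time."""
    state = start
    for symbol in text:
        state = TRANSITIONS[symbol][state]
    return state


print(run_chunked("bbabb", 2), run_sequential("bbabb"))
```

Because `chunk_map` never needs the state left by the previous chunk, the per-chunk work could be handed to separate cores or vector lanes; this sketch runs it in a plain loop.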

Top-Performance Tokenization and Small-Ruleset Regular Expression Matching

This work presents a technique to design tokenizers that exploit multiple threads and wide SIMD units to process multiple independent streams of data at a high throughput and shows the approach’s viability by presenting a family of tokenizer kernels optimized for the Cell/B.E. processor.

Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization

This work describes an optimization path of the kernel of flex to four nVidia GPGPU models, with decisions based on quantitative micro-benchmarking, performance counters and simulator runs, and achieves a tokenization throughput that exceeds the results obtained by the GPGPU-based string matching solutions presented so far, and compares well with solutions obtained on any architecture.

Parabix: Boosting the efficiency of text processing on commodity processors

This paper advocates and develops Parabix as a general framework and toolkit, describing the software toolchain and run-time support that allows applications to exploit modern SIMD instructions for high performance text processing and generalizing the techniques to ensure that they apply across a wide variety of applications and architectures.