Bitwise data parallelism in regular expression matching

@article{Cameron2014BitwiseDP,
  title={Bitwise data parallelism in regular expression matching},
  author={Robert D. Cameron and Thomas C. Shermer and Arrvindh Shriraman and Kenneth S. Herdy and Dan Lin and Benjamin R. Hull and Meng Lin},
  journal={2014 23rd International Conference on Parallel Architecture and Compilation (PACT)},
  year={2014},
  pages={139-150}
}
  • R. Cameron, T. Shermer, Meng Lin
  • Published 24 August 2014
  • Computer Science
  • 2014 23rd International Conference on Parallel Architecture and Compilation (PACT)
A new parallel algorithm for regular expression matching is developed and applied to the classical grep (global regular expression print) problem. Building on the bitwise data parallelism previously applied to the manual implementation of token scanning in the Parabix XML parser, the new algorithm represents a general solution to the problem of regular expression matching using parallel bit streams. On widely-deployed commodity hardware using 128-bit SSE2 SIMD technology, our algorithm… 
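As a rough illustration of the parallel bit stream idea in the abstract above, the following Python sketch matches the hypothetical pattern `a[0-9]*z`. It is not the authors' implementation. Python integers stand in for SIMD registers, bit i of each stream corresponds to text position i, and every function name is invented for this example. The star step uses an addition-based formula in the style of the Parabix work, where a carry propagates a marker through a run of class characters.

```python
def char_class_stream(text, predicate):
    """Build a bit stream: bit i is set when predicate(text[i]) is true."""
    bits = 0
    for i, ch in enumerate(text):
        if predicate(ch):
            bits |= 1 << i
    return bits


def match_star(markers, cc):
    """Extend each marker through zero or more characters of class cc.

    The addition carries a marker bit across a contiguous run of cc bits;
    the XOR and OR recover every position reachable from a marker.
    """
    return (((markers & cc) + cc) ^ cc) | markers


def match_a_digits_z(text):
    """Return the exclusive end positions of all matches of a[0-9]*z."""
    a = char_class_stream(text, lambda c: c == "a")
    d = char_class_stream(text, lambda c: c in "0123456789")
    z = char_class_stream(text, lambda c: c == "z")

    m1 = a << 1              # markers just after each 'a'
    m2 = match_star(m1, d)   # markers after 'a' followed by any digits
    m3 = (m2 & z) << 1       # markers just after the closing 'z'
    return [i for i in range(len(text) + 1) if (m3 >> i) & 1]


if __name__ == "__main__":
    print(match_a_digits_z("a123z az a1b z"))
```

All candidate matches advance together: each statement in `match_a_digits_z` processes every text position at once, which is the property a SIMD implementation would exploit on fixed-width blocks.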

Bitwise Data Parallelism with LLVM: The ICgrep Case Study

This paper examines the application of bitwise data parallelism using short vector SIMD instructions to the development of a full-featured Unicode-capable open-source grep implementation, constructed using a layered architecture combining Parabix and LLVM compiler technologies.

Systematic Support of Parallel Bit Streams in LLVM

Modifications to LLVM are investigated to incorporate all the SIMD processing requirements of Parabix both to increase the portability of applications and to create additional opportunities to optimize those operations in the context of code generation.

New Pattern Matching Approaches Comparison

This paper will specifically go over scaling, algorithms, performance measures, resources used, and the selection of architecture discussed in three different papers.

Multidimensional Parallelization for Streaming Text Processing Applications Based on Parabix Framework

This dissertation investigates the further development of the Parabix framework to incorporate multidimensional parallelization, combining Parabix methods with several different models of multithreading such as task parallelism, data parallelism and pipeline parallelism as well as with GPU-based SIMT processing.

s2k: A parallel language for streaming text extraction and transformations

This work defines s2k, a global-view parallel programming language for streaming text extraction and transformations that integrates stream programming abstractions and parallel bitstream programming methods.

Automata Processor Architecture and Applications: A Survey

This survey covers the state of the art in automata-processor-based hardware accelerators, describing the AP hardware architecture, its programming environments, and its current successful applications in a wide range of fields, and explores future research trends and opportunities.

Optimizing Regular Expressions via Rewrite-Guided Synthesis

This work investigates automatically transforming regular expressions to remove inefficiencies, and presents a new approach called rewrite-guided synthesis (ReGiS), in which a unique interplay between SyGuS and equality saturation-based rewriting helps to overcome these problems, resulting in an efficient, scalable framework for expression optimization.

REGISTOR

Registor achieves high throughput, reduces the I/O bandwidth requirement by up to 97%, and reduces CPU utilization by as much as 82% for regex search in large datasets.

JSONSki: streaming semi-structured data with bit-parallel fast-forwarding

This work designs a highly bit-parallel solution that intensively utilizes bitwise and SIMD operations to identify the irrelevant substructures during the streaming that can achieve significant speedups over the state-of-the-art JSON processing tools while taking a minimum memory footprint.

UDP: A Programmable Accelerator for Extract-Transform-Load Workloads and More

The design of the unstructured data processor (UDP), a software programmable accelerator that includes multi-way dispatch, variable-size symbol support, flexible-source dispatch, stream buffer and scalar registers, and memory addressing to accelerate ETL kernels both for current and novel future encoding and compression is proposed.

References

High-performance regular expression scanning on the Cell/B.E. processor

This work presents an algorithm and a set of techniques for using multi-core features such as multiple threads and SIMD instructions to perform parallel regexp-based tokenization, and presents a family of optimized kernels that implement the algorithm.

Parallel Scanning with Bitstream Addition: An XML Case Study

A parallel scanning method using the concept of bitstream addition is introduced and studied in application to the problem of XML parsing and well-formedness checking, yielding a dramatic speed-up over traditional alternatives employing byte-at-a-time parsing.
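A minimal sketch of scanning by bitstream addition, under the assumption that bit i of each Python integer corresponds to text position i. The helper names and the digit-run example are hypothetical and are not taken from the XML parser itself. Adding a marker stream to a character-class stream carries each marker to the first position past its run.

```python
def class_stream(text, chars):
    """Bit i is set when text[i] is one of the given characters."""
    bits = 0
    for i, ch in enumerate(text):
        if ch in chars:
            bits |= 1 << i
    return bits


def scan_thru(markers, cc):
    """Move each marker past the run of cc characters it sits on.

    The carry from the addition travels through the run; masking with
    ~cc keeps only the bit that lands just beyond the run.
    """
    return (markers + cc) & ~cc


def digit_runs(text):
    """Return (start, end) pairs, end exclusive, for each run of digits."""
    digits = class_stream(text, "0123456789")
    starts = digits & ~(digits << 1)   # digit positions not preceded by a digit
    ends = scan_thru(starts, digits)   # first non-digit position after each run
    positions = range(len(text) + 1)
    start_list = [i for i in positions if (starts >> i) & 1]
    end_list = [i for i in positions if (ends >> i) & 1]
    return list(zip(start_list, end_list))


if __name__ == "__main__":
    text = "12 3456 x"
    print([text[s:e] for s, e in digit_runs(text)])
```

A single addition advances every marker in the stream, in contrast to a byte-at-a-time loop that advances one cursor per iteration.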

Faster Regular Expression Matching

This work shows how to solve regular expression matching in linear space and $O(nm \log\log n / (\log n)^{3/2} + n + m)$ time, where $m$ is the length of the expression and $n$ is the length of the string.

A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching

A new algorithm for string matching called BNDM, which is the bit-parallel simulation of a known (but recent) algorithm called BDM, and which can be extended to handle classes of characters in the pattern and in the text, multiple patterns and to allow errors in the pattern or in the text, combining simplicity, efficiency and flexibility.
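The following is a hedged Python sketch of the basic BNDM scheme for a single exact pattern, written from the commonly published description of the algorithm rather than from the authors' code. The function name `bndm_search` is made up for this example, and the extensions mentioned above (character classes, multiple patterns, errors) are not covered.

```python
def bndm_search(pattern, text):
    """Return the start positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # table[c] has bit (m - 1 - i) set when pattern[i] == c.
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = table.get(ch, 0) | (1 << (m - 1 - i))

    full = (1 << m) - 1   # m one-bits: every pattern factor is still possible
    top = 1 << (m - 1)    # set when the scanned suffix is a pattern prefix
    matches = []
    pos = 0
    while pos <= n - m:
        j, last, state = m, m, full
        # Read the window right to left while some pattern factor fits.
        while state != 0:
            state &= table.get(text[pos + j - 1], 0)
            j -= 1
            if state & top:
                if j > 0:
                    last = j             # a pattern prefix starts at pos + j
                else:
                    matches.append(pos)  # the whole window matched
            state = (state << 1) & full
        pos += last                      # safe shift to the next window
    return matches


if __name__ == "__main__":
    print(bndm_search("aba", "ababa"))
```

The state word simulates the nondeterministic suffix automaton of the reversed pattern, so one AND and one shift per character update all active states at once.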

Data-parallel finite-state machines

This paper describes a parallel algorithm for FSMs that breaks dependences across iterations by efficiently enumerating transitions from all possible states on each input symbol, which allows the algorithm to utilize various sources of data parallelism available on modern hardware, including vector instructions and multiple processors/cores.
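The enumeration idea in the summary above can be sketched as follows, assuming a small hypothetical DFA over the alphabet {a, b} that reaches state 2 once the input contains "ab". Each chunk is turned into a map from every possible start state to the resulting end state, so chunks no longer depend on one another and the maps can be composed afterwards. The names and the chunking scheme are illustrative assumptions, not the paper's implementation, and the chunks are processed in a plain loop here rather than on vector units or multiple cores.

```python
from functools import reduce

# Hypothetical DFA: state 2 is reached once "ab" has been seen.
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 2, "b": 2},
}
STATES = (0, 1, 2)


def run_sequential(text, start=0):
    """Reference run: one transition per symbol, each depending on the last."""
    state = start
    for ch in text:
        state = DELTA[state][ch]
    return state


def chunk_map(chunk):
    """Return a tuple m where m[s] is the state reached from s after chunk."""
    current = list(STATES)
    for ch in chunk:
        current = [DELTA[s][ch] for s in current]
    return tuple(current)


def compose(first, second):
    """Compose two state maps: apply first, then second."""
    return tuple(second[s] for s in first)


def run_enumerative(text, chunk_size=4, start=0):
    """Process chunks independently, then combine their state maps."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    maps = [chunk_map(c) for c in chunks]   # independent work per chunk
    total = reduce(compose, maps, STATES)   # STATES acts as the identity map
    return total[start]


if __name__ == "__main__":
    sample = "bbabbaab"
    print(run_sequential(sample), run_enumerative(sample))
```

Because map composition is associative, the `reduce` step could itself be carried out as a parallel reduction.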

Top-Performance Tokenization and Small-Ruleset Regular Expression Matching

  • D. Scarpazza
  • Computer Science
    International Journal of Parallel Programming
  • 2010
This work presents a technique to design tokenizers that exploit multiple threads and wide SIMD units to process multiple independent streams of data at a high throughput and shows the approach’s viability by presenting a family of tokenizer kernels optimized for the Cell/B.E. processor.

Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization

This work describes an optimization path of the kernel of flex to four nVidia GPGPU models, with decisions based on quantitative micro-benchmarking, performance counters and simulator runs, and achieves a tokenization throughput that exceeds the results obtained by the GPGPU-based string matching solutions presented so far, and compares well with solutions obtained on any architecture.

Parabix: Boosting the efficiency of text processing on commodity processors

This paper advocates and develops Parabix as a general framework and toolkit, describing the software toolchain and run-time support that allows applications to exploit modern SIMD instructions for high performance text processing and generalizing the techniques to ensure that they apply across a wide variety of applications and architectures.

Tools for Very Fast Regular Expression Matching

DotStar is a complete algorithmic solution and a software tool chain that can compile large sets of user-provided regex first into a sequence of intermediate representations and then into an automaton that can search for matches in a single pass without backtracking.

Fast String Search on Multicore Processors: Mapping fundamental algorithms onto parallel hardware

This article shows how the authors mapped string searching efficiently onto the Cell, and presents two implementations: the fast implementation supports a small dictionary size and provides a throughput of 40 Gbps, which is 100 times faster than reference implementations on x86 architectures.