# Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching

@inproceedings{Berglund2014AnalyzingCB, title={Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching}, author={Martin Berglund and Frank Drewes and Brink van der Merwe}, booktitle={AFL}, year={2014} }

We consider in some detail how regular expression matching happens in Java, as a popular representative of the category of regex-directed matching engines. We extract a slightly idealized algorithm ...
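To make the notion of catastrophic backtracking concrete, here is a minimal sketch of a regex-directed backtracking matcher that counts its own steps. It is not the paper's idealized algorithm and not Java's actual engine; the tuple-based pattern encoding, the names `match`, `count_steps`, and `EVIL`, and the step-counting convention are all assumptions made for illustration.

```python
def match(node, s, i, k, stats):
    """Toy backtracking matcher in continuation-passing style (illustrative only).

    node is one of: ("chr", c), ("cat", left, right), ("alt", left, right),
    ("star", inner). k is the continuation, called with the position reached
    once node has matched. stats["steps"] counts calls to this function.
    """
    stats["steps"] += 1
    kind = node[0]
    if kind == "chr":
        return i < len(s) and s[i] == node[1] and k(i + 1)
    if kind == "cat":
        return match(node[1], s, i,
                     lambda j: match(node[2], s, j, k, stats), stats)
    if kind == "alt":
        # Ordered choice: try the left branch, backtrack into the right one.
        return match(node[1], s, i, k, stats) or match(node[2], s, i, k, stats)
    if kind == "star":
        # Greedy: try another iteration first (it must consume input),
        # then fall back to leaving the loop.
        return match(node[1], s, i,
                     lambda j: j > i and match(node, s, j, k, stats),
                     stats) or k(i)
    raise ValueError(f"unknown node kind: {kind}")


def count_steps(regex, s):
    """Return (matched, steps) for a full match of regex against s."""
    stats = {"steps": 0}
    matched = match(regex, s, 0, lambda j: j == len(s), stats)
    return bool(matched), stats["steps"]


# Hypothetical encoding of (a|a)*b, where both alternatives match the same text.
EVIL = ("cat", ("star", ("alt", ("chr", "a"), ("chr", "a"))), ("chr", "b"))

if __name__ == "__main__":
    for n in (4, 8, 12):
        # The input has no trailing "b", so every path must be tried and fail.
        print(n, count_steps(EVIL, "a" * n))
```

On inputs of the form `"a" * n` the step count in this sketch roughly doubles with each added character, because every `a` can be consumed by either alternative and all combinations are explored before the match fails.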

## 26 Citations

### Analyzing Matching Time Behavior of Backtracking Regular Expression Matchers by Using Ambiguity of NFA

- Computer Science
- CIAA
- 2016

We apply results from ambiguity of non-deterministic finite automata to the problem of determining the asymptotic worst-case matching time, as a function of the length of the input strings, when…

### On the semantics of regular expression parsing in the wild

- Computer Science
- Theor. Comput. Sci.
- 2015

### Semantics, analysis and security of backtracking regular expression matchers

- Computer Science
- 2015

This research develops a semantic view of regular expressions that formalizes the backtracking paradigm, and presents a novel static analysis capable of detecting exponential runtime vulnerabilities, a highly undesirable property of backtracking regular expression matchers.

### Turning evil regexes harmless

- Computer Science
- SAICSIT '17
- 2017

The relationship between ambiguity in automata and regular expressions and the matching time of backtracking regular expression matchers is explored, and techniques to reduce or remove ambiguity from regular expressions are investigated.

### Sound Static Analysis of Regular Expressions for Vulnerabilities to Denial of Service Attacks

- Computer Science
- TASE
- 2022

A framework based on a tree semantics to statically identify ReDoS vulnerabilities is introduced and an algorithm to extract an overapproximation of the set of words that are dangerous for a regular expression is put forward, effectively catching all possible attacks.

### Static analysis of regular expressions

- Computer Science
- 2017

A method for accurately modeling the matching time behaviour of a backtracking regular expression matcher, by using automata theoretic methods, is presented and analyzed by using the concept of ambiguity in nondeterministic finite-state automata.
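The link between NFA ambiguity and backtracking cost can be illustrated by counting accepting runs. The sketch below is only a dynamic count for one given word, not the static analysis of the cited work; the function `count_accepting_runs` and the two example automata `POLY` and `EXP` are hypothetical and chosen for illustration.

```python
from collections import Counter


def count_accepting_runs(transitions, start, accepting, word):
    """Count the distinct accepting runs of an epsilon-free NFA on word.

    transitions maps (state, symbol) to a list of target states.
    """
    runs = Counter({start: 1})  # number of runs currently ending in each state
    for symbol in word:
        nxt = Counter()
        for state, n in runs.items():
            for target in transitions.get((state, symbol), ()):
                nxt[target] += n
        runs = nxt
    return sum(n for state, n in runs.items() if state in accepting)


# Assumed example automata over the alphabet {a}:
# POLY behaves like a*a a*: the only choice is where to switch from state 0 to 1.
POLY = {(0, "a"): [0, 1], (1, "a"): [1]}
# EXP behaves like (a|aa)*: each block of input can be split in several ways.
EXP = {(0, "a"): [0, 1], (1, "a"): [0]}

if __name__ == "__main__":
    for n in (5, 10, 20):
        word = "a" * n
        print(n,
              count_accepting_runs(POLY, 0, {1}, word),
              count_accepting_runs(EXP, 0, {0}, word))
```

In this sketch the run count for `POLY` grows linearly with the word length, while for `EXP` it follows the Fibonacci numbers and therefore grows exponentially. A backtracking matcher may, in the worst case, explore each such run separately.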

### Regular Expressions with Backreferences Re-examined

- Computer Science
- Stringology
- 2017

The aim is to compare the various flavors of regular expression matching by considering the formal languages that each can describe, resulting in the establishment of a hierarchy of language classes.

### Workshop on Trends in Tree Automata and Tree Transducers

- Computer Science, Mathematics
- 2016

An algorithm for computing the N best roots of a weighted hypergraph is proposed, in which the weight function is given over an idempotent and multiplicatively monotone semiring, and it is proved that the proposed algorithm is correct.

### Solving string constraints with Regex-dependent functions through transducers with priorities and variables

- Computer Science
- Proc. ACM Program. Lang.
- 2022

This paper introduces a new automata model, called prioritized streaming string transducers (PSST), to formalize the semantics of RegEx-dependent string functions and introduces a sound sequent calculus that exploits these properties and performs propagation of regular constraints by means of taking post-images or pre-images.

### Using Selective Memoization to Defeat Regular Expression Denial of Service (ReDoS)

- Computer Science
- 2021 IEEE Symposium on Security and Privacy (SP)
- 2021

This work presents techniques to provably eliminate super-linear regex behavior with low space costs for typical regexes, and proposes selective memoization schemes with varying space/time tradeoffs and an encoding scheme that leverages insights about regex engine semantics to reduce the space cost of memoization.
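The effect of memoization on a backtracking search can be sketched as follows. Note that this example memoizes every (state, position) pair, i.e. full memoization; the selective schemes and the space-saving encoding described in the cited work are not reproduced here. The function `search` and the automaton `NFA` are assumptions made for illustration.

```python
def search(transitions, accepting, word, memoize):
    """Depth-first backtracking over an epsilon-free NFA, starting in state 0.

    Returns (matched, visits). With memoize=True, (state, position) pairs
    already known to fail are never explored a second time.
    """
    failed = set()
    visits = 0

    def go(state, pos):
        nonlocal visits
        visits += 1
        if pos == len(word):
            return state in accepting
        if memoize and (state, pos) in failed:
            return False
        for target in transitions.get((state, word[pos]), ()):
            if go(target, pos + 1):
                return True
        if memoize:
            failed.add((state, pos))
        return False

    return go(0, 0), visits


# Assumed example: an NFA for (a|aa)*b with accepting state 2.
NFA = {(0, "a"): [0, 1], (1, "a"): [0], (0, "b"): [2]}

if __name__ == "__main__":
    word = "a" * 20  # no trailing "b", so the search must fail
    print(search(NFA, {2}, word, memoize=False))
    print(search(NFA, {2}, word, memoize=True))
```

Without the memo table the number of visits in this sketch grows exponentially with the input length; with it, the count is bounded by a small multiple of the number of (state, position) pairs, at the price of storing those pairs.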

## References

### A Formal Study Of Practical Regular Expressions

- Computer Science
- Int. J. Found. Comput. Sci.
- 2003

It is shown that the languages represented by extended regex are incomparable with context-free languages and a proper subset of context-sensitive languages.

### Programming Techniques: Regular expression search algorithm

- Computer Science
- CACM
- 1968

A method for locating specific character strings embedded in character text is described and an implementation of this method in the form of a compiler is discussed. The compiler accepts a regular…

### Parsing expression grammars: a recognition-based syntactic foundation

- Computer Science
- POPL '04
- 2004

PEGs address frequently felt expressiveness limitations of CFGs and REs, simplifying syntax definitions and making it unnecessary to separate their lexical and hierarchical components, and are here proven equivalent in effective recognition power.

### Computers and Intractability: A Guide to the Theory of NP-Completeness

- Computer Science
- 1978

It is proved here that the number of rules in any irredundant Horn knowledge base involving n propositional variables is at most n − 1 times the minimum possible number of rules.

### General Algorithms for Testing the Ambiguity of Finite Automata

- Computer Science
- Developments in Language Theory
- 2008

Efficient algorithms for testing the finite, polynomial, and exponential ambiguity of finite automata with ε-transitions and an application of these algorithms to an approximate computation of the entropy of a probabilistic automaton are presented.

### The Complexity of the Exponential Output Size Problem for Top-Down and Bottom-Up Tree Transducers

- Computer Science
- Inf. Comput.
- 2001

The complexity of the exponential output size problem is studied and it is shown to be NL-complete for total top-down tree transducers, DEXPTIME-complete for general top-down tree transducers, and P-complete for bottom-up tree transducers.

### A fast string searching algorithm

- Computer Science
- CACM
- 1977

The algorithm has the unusual property that, in most cases, not all of the first i characters of the string being searched are inspected, where i is the location of the first occurrence of the pattern.