Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching

Martin Berglund, Frank Drewes, Brink van der Merwe
We consider in some detail how regular expression matching happens in Java, as a popular representative of the category of regex-directed matching engines. We extract a slightly idealized algorithm ... 

Analyzing Matching Time Behavior of Backtracking Regular Expression Matchers by Using Ambiguity of NFA

We apply results from ambiguity of non-deterministic finite automata to the problem of determining the asymptotic worst-case matching time, as a function of the length of the input strings, when

On the semantics of regular expression parsing in the wild

Semantics, analysis and security of backtracking regular expression matchers

This research develops a semantic view of regular expressions that formalizes the backtracking paradigm, and presents a novel static analysis capable of detecting exponential runtime vulnerabilities, a highly undesirable property of backtracking regular expression matchers.

Turning evil regexes harmless

The relationship between ambiguity in automata and regular expressions and the matching time of backtracking regular expression matchers is explored, and techniques to reduce or remove ambiguity from regular expressions are investigated.

Sound Static Analysis of Regular Expressions for Vulnerabilities to Denial of Service Attacks

A framework based on a tree semantics to statically identify ReDoS vulnerabilities is introduced and an algorithm to extract an overapproximation of the set of words that are dangerous for a regular expression is put forward, effectively catching all possible attacks.

Static analysis of regular expressions

A method for accurately modeling the matching time behaviour of a backtracking regular expression matcher, by using automata theoretic methods, is presented and analyzed by using the concept of ambiguity in nondeterministic finite-state automata.

Regular Expressions with Backreferences Re-examined

The aim is to compare the various flavors of regular expression matching by considering the formal languages that each can describe, resulting in the establishment of a hierarchy of language classes.

Workshop on Trends in Tree Automata and Tree Transducers

An algorithm for computing the N best roots of a weighted hypergraph is proposed, in which the weight function is given over an idempotent and multiplicatively monotone semiring, and it is proved that the proposed algorithm is correct.

Solving string constraints with Regex-dependent functions through transducers with priorities and variables

This paper introduces a new automata model, called prioritized streaming string transducers (PSST), to formalize the semantics of RegEx-dependent string functions and introduces a sound sequent calculus that exploits these properties and performs propagation of regular constraints by means of taking post-images or pre-images.

Using Selective Memoization to Defeat Regular Expression Denial of Service (ReDoS)

This work presents techniques to provably eliminate super-linear regex behavior with low space costs for typical regexes, and proposes selective memoization schemes with varying space/time tradeoffs and an encoding scheme that leverages insights about regex engine semantics to reduce the space cost of memoization.



From regexes to parsing expression grammars

A Formal Study Of Practical Regular Expressions

It is shown that the languages represented by extended regex are incomparable with context-free languages and a proper subset of context-sensitive languages.

Programming Techniques: Regular expression search algorithm

A method for locating specific character strings embedded in character text is described and an implementation of this method in the form of a compiler is discussed. The compiler accepts a regular

Parsing expression grammars: a recognition-based syntactic foundation

B. Ford. POPL '04, 2004.
PEGs address frequently felt expressiveness limitations of CFGs and REs, simplifying syntax definitions and making it unnecessary to separate their lexical and hierarchical components, and are here proven equivalent in effective recognition power.

Computers and Intractability: A Guide to the Theory of NP-Completeness

It is proved here that the number of rules in any irredundant Horn knowledge base involving n propositional variables is at most n − 1 times the minimum possible number of rules.

General Algorithms for Testing the Ambiguity of Finite Automata

Efficient algorithms for testing the finite, polynomial, and exponential ambiguity of finite automata with ε-transitions are presented, along with an application of these algorithms to an approximate computation of the entropy of a probabilistic automaton.

The Complexity of the Exponential Output Size Problem for Top-Down and Bottom-Up Tree Transducers

The complexity of the exponential output size problem is studied and it is shown to be NL-complete for total top-down tree transducers, DEXPTIME-complete for general top-down tree transducers, and P-complete for bottom-up tree transducers.

A fast string searching algorithm

The algorithm has the unusual property that, in most cases, not all of the first i characters of the searched string are inspected.