Efficient Matching of Regular Expressions with Lookaround Assertions
Regular expressions have been extended with lookaround assertions, which are subdivided into lookahead and lookbehind assertions. These constructs are used to refine when a match for a pattern occurs in the input text based on the surrounding context. Current implementation techniques for lookaround involve backtracking search, which can give rise to running time that is super-linear in the length of input text.
In this paper, we first consider a formal mathematical semantics for lookaround, which complements the commonly used operational understanding of lookaround in terms of a backtracking implementation. Our formal semantics allows us to establish several equational properties for simplifying lookaround assertions. Additionally, we propose a new algorithm for matching regular expressions with lookaround that has time complexity $O(m \cdot n)$, where $m$ is the size of the regular expression and $n$ is the length of the input text. The algorithm works by evaluating lookaround assertions in a bottom-up manner. Our algorithm makes use of a new notion of nondeterministic finite automata (NFAs) that are augmented with epsilon-transitions that are guarded by oracle queries that provide the truth values of lookaround assertions at every position in the text. We provide an implementation of our algorithm that incorporates three performance optimizations for reducing the work performed and memory used. We present an experimental comparison against PCRE, a state-of-the-art regex engine that supports lookaround assertions. Our experimental results show that, in contrast to PCRE, our implementation does not suffer from super-linear running time and is several times faster than PCRE.
Wed 17 JanDisplayed time zone: London change
15:10 - 16:30 | Automata And ComplexityPOPL at Kelvin Lecture Chair(s): Fritz Henglein Department of Computer Science, University of Copenhagen (DIKU) and Deon Digital | ||
15:10 20mTalk | Parikh's Theorem Made Symbolic POPL Matthew Hague Royal Holloway University of London, Artur Jez University of Wroclaw, Anthony Widjaja Lin TU Kaiserslautern; MPI-SWS | ||
15:30 20mTalk | Efficient Matching of Regular Expressions with Lookaround Assertions POPL | ||
15:50 20mTalk | The Complex(ity) Landscape of Checking Infinite Descent POPL Liron Cohen Ben-Gurion University of the Negev, Adham Jabarin Ben Gurion University, Andrei Popescu University of Sheffield, Reuben N. S. Rowe Royal Holloway University of London | ||
16:10 20mTalk | Positive Almost-Sure Termination – Complexity and Proof Rules POPL |