# String Indexing for Patterns with Wildcards

@article{Bille2013StringIF,
title={String Indexing for Patterns with Wildcards},
author={Philip Bille and Inge Li G{\o}rtz and Hjalte Wedel Vildh{\o}j and S{\o}ren Vind},
journal={Theory of Computing Systems},
year={2013},
volume={55},
pages={41--60}
}
• Published 24 October 2011
• Computer Science, Mathematics
• Theory of Computing Systems
We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results: a linear space index with query time $O(m+\sigma^j\log\log n+occ)$. This significantly improves the previously best known linear space index by Lam et al. (in Proc. 18th ISAAC, pp. 846–857, 2007), which requires query time $\Theta(jn)$ in the worst case…
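For reference, the query that such an index answers can be stated as a brute-force scan. The sketch below is a minimal Python baseline, not the paper's index: it re-examines every alignment of p against t, so each query costs O(n(m+j)) time with no preprocessing at all. The wildcard symbol `?` and the function name `naive_wildcard_occurrences` are hypothetical choices made for illustration.

```python
from typing import List

WILDCARD = "?"  # hypothetical wildcard symbol chosen for this sketch


def naive_wildcard_occurrences(t: str, p: str, wildcard: str = WILDCARD) -> List[int]:
    """Return every 0-based start position i at which p occurs in t.

    Each wildcard in p matches exactly one arbitrary character of t.
    Brute force: every alignment is checked, O(n * len(p)) time per query.
    """
    n, length = len(t), len(p)
    hits = []
    for i in range(n - length + 1):
        # Compare p against t[i : i + length], skipping wildcard positions.
        if all(pc == wildcard or pc == t[i + k] for k, pc in enumerate(p)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # p = "a?ra" has m = 3 characters and j = 1 wildcard.
    print(naive_wildcard_occurrences("abracadabra", "a?ra"))  # [0, 7]
```

An index is worthwhile precisely because this scan touches all of t on every query; the bounds above replace the factor n with terms depending only on m, j, σ and occ.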
## Citations (21)
Gapped Indexing for Consecutive Occurrences
• Computer Science, Mathematics
CPM
• 2021
Considers a variant of string indexing where the goal is to represent the string compactly so that, given two patterns P1 and P2 and a gap range [α, β], one can quickly find the consecutive occurrences of P1 and P2 whose distance lies in [α, β].
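To make this query concrete, here is a brute-force Python sketch that builds no compact index. It assumes that a consecutive occurrence is a pair (i, j) where i is a start position of P1, j > i is a start position of P2, and no start of P1 or P2 lies strictly between them; that definition, the 0-based positions, and the names `occurrences` and `consecutive_in_gap` are assumptions for illustration.

```python
from bisect import bisect_right
from typing import List, Tuple


def occurrences(t: str, p: str) -> List[int]:
    """All 0-based start positions of a non-empty p in t, in increasing order."""
    hits, start = [], t.find(p)
    while start != -1:
        hits.append(start)
        start = t.find(p, start + 1)  # step by one so overlapping hits are kept
    return hits


def consecutive_in_gap(t: str, p1: str, p2: str, alpha: int, beta: int) -> List[Tuple[int, int]]:
    """Report consecutive (p1, p2) occurrence pairs (i, j) with alpha <= j - i <= beta.

    Assumed definition: j is the first start of p2 after i, and no start of
    p1 lies strictly between i and j.
    """
    occ1, occ2 = occurrences(t, p1), occurrences(t, p2)
    result = []
    for idx, i in enumerate(occ1):
        k = bisect_right(occ2, i)  # index of the first start of p2 strictly after i
        if k == len(occ2):
            continue  # no occurrence of p2 follows i
        j = occ2[k]
        # The next start of p1 after i (if any) must not fall strictly inside (i, j).
        if idx + 1 < len(occ1) and occ1[idx + 1] < j:
            continue
        if alpha <= j - i <= beta:
            result.append((i, j))
    return result
```

For instance, `consecutive_in_gap("aXbaXXb", "a", "b", 1, 2)` keeps the pair (0, 2) and drops (3, 6), whose distance 3 falls outside the gap range.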
Space-Efficient String Indexing for Wildcard Pattern Matching
• Computer Science
STACS
• 2014
Gives the first non-trivial data structures for this problem that need $o(n\log n)$ bits of space.
Data Structure Lower Bounds for Document Indexing Problems
• Computer Science
ICALP
• 2016
We study data structure problems related to document indexing and pattern matching queries and our main contribution is to show that the pointer machine model of computation can be extremely useful
Algorithms and Data Structures for Strings, Points and Integers: or, Points about Strings and Strings about Points
This dissertation presents an O(n)-space data structure that supports fingerprint queries; it is the first for general (unbalanced) SLPs that answers fingerprint queries without decompressing any text. The dissertation also gives the first method to dynamically maintain a string under a compression scheme that can achieve better than entropy compression.
String Indexing for Top-k Close Consecutive Occurrences
• Computer Science, Mathematics
FSTTCS
• 2020
Two new time-space trade-offs are given for the string indexing for top-$k$ close consecutive occurrences problem (SITCCO), including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.
Error Tree: A Tree Structure for Hamming & Edit Distances & Wildcards Matching
Error Tree is a novel tree structure that is mainly oriented to solve the approximate pattern matching problems, Hamming and edit distances, as well as the wildcards matching problem. The input is a
Detecting Pattern Efficiently with Don't Cares
• Computer Science
EANN
• 2020
This paper introduces a simple, efficient method that locates all occurrences of a pattern P of length m, consisting of k subpatterns separated by “don’t cares”, in a text S of length n, using a predefined computational method.
Frequency Based Indexing Technique for Pattern Matching
The proposed indexing technique attempts to answer queries of the form LIKE ‘%...%’ without requiring a full table scan, as shown through an empirical evaluation of the proposed scheme.
Matching and Compression of Strings with Automata and Word Packing
This paper considers subsequence automata with default transitions, that is, special transitions that are taken only if none of the regular transitions match the current character and that do not consume the current character, and presents a novel hierarchical automata construction of independent interest.
String Indexing for Patterns with Wildcards
• Computer Science, Mathematics
SWAT
• 2012
This work considers the problem of indexing a string t to report the occurrences of a query pattern p containing m characters and j wildcards, and obtains an index with query time $O(m+j+occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where k is the maximum number of wildcards allowed in the pattern.
## References (10 of 52 shown)
A linear size index for approximate pattern matching
• Computer Science
J. Discrete Algorithms
• 2011
The feasibility of devising a linear-size index that still has a time complexity linear in m is investigated, and an O(n)-space index is given that supports k-error matching in $O(m + occ + (\log n)^{k(k+1)} \log\log n)$ worst-case time.
Space Efficient Indexes for String Matching with Don't Cares
• Computer Science, Mathematics
ISAAC
• 2007
The solution to the pattern-only case improves the matching time of the previous work tremendously in practice, and can be extended to handle optional wildcards, each of which can match zero or one character.
Dotted Suffix Trees: A Structure for Approximate Text Indexing
• Computer Science
SPIRE
• 2006
This work addresses text indexing for approximate matching: a text undergoes some preprocessing to generate an index, which can later be queried to identify the places where a string occurs up to a certain number of errors k (edit distance).
Text indexing with errors
• Computer Science
J. Discrete Algorithms
• 2007
Indexing with Gaps
This paper proposes a solution for k gaps, one with preprocessing time $O(nG^{2k}\log^k n \log\log n)$ and space of $O(m + 2^k \log\log n)$, where $m = \sum_i |p_i|$.
Finding Patterns with Variable Length Gaps or Don't Cares
• Computer Science
COCOON
• 2006
New algorithms to handle the pattern matching problem where the pattern can contain variable length gaps are presented and are shown to be useful in many other contexts.
Fast Algorithms for Finding Nearest Common Ancestors
• Computer Science, Mathematics
SIAM J. Comput.
• 1984
An algorithm for a random access machine with uniform cost measure (and a bound of $\Omega(\log n)$ on the number of bits per word) is presented that requires O(1) time per query and O(n) preprocessing time, assuming that the collection of trees is static.
Pattern Matching Algorithms with Don't Cares
• Computer Science
SOFSEM
• 2007
This paper presents algorithms for pattern matching where either the pattern P or the text T can contain “don’t care” characters; they solve the pattern matching problem in O(n + m + α) time, where α is the total number of occurrences of the component subpatterns.
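The quantity α in this bound is naturally explained by the classic counting idea for don't cares: split the pattern into maximal solid subpatterns, let each occurrence of a subpattern vote for the alignment of P that it implies, and report the alignments that receive a vote from every subpattern. The Python sketch below illustrates that idea under stated assumptions: only the pattern contains don't cares, written as the hypothetical symbol `?`, and the helper names are hypothetical. It is not necessarily the algorithm of the cited paper, and it locates subpattern occurrences with repeated `str.find` scans rather than a multi-pattern automaton, so it does not attain the O(n + m + α) bound.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DONT_CARE = "?"  # hypothetical don't-care symbol chosen for this sketch


def split_subpatterns(p: str, dc: str = DONT_CARE) -> List[Tuple[int, str]]:
    """Split p into maximal don't-care-free subpatterns, each with its offset in p."""
    parts, start = [], None
    for k, ch in enumerate(p):
        if ch != dc and start is None:
            start = k  # a solid run begins
        elif ch == dc and start is not None:
            parts.append((start, p[start:k]))  # a solid run ends
            start = None
    if start is not None:
        parts.append((start, p[start:]))
    return parts


def match_by_counting(t: str, p: str, dc: str = DONT_CARE) -> List[int]:
    """0-based start positions where p (with don't cares) occurs in t."""
    n, m = len(t), len(p)
    if m > n:
        return []
    parts = split_subpatterns(p, dc)
    if not parts:  # a pattern of only don't cares matches wherever it fits
        return list(range(n - m + 1))
    votes: Dict[int, int] = defaultdict(int)
    for offset, sub in parts:
        q = t.find(sub)
        while q != -1:
            start = q - offset  # alignment of p implied by this occurrence
            if 0 <= start <= n - m:
                votes[start] += 1  # each subpattern votes at most once per alignment
            q = t.find(sub, q + 1)
    return sorted(s for s, c in votes.items() if c == len(parts))
```

For example, `match_by_counting("abracadabra", "a?ra")` returns `[0, 7]`. The voting loop itself performs one step per subpattern occurrence, that is, work proportional to α; a linear-time multi-pattern matcher such as Aho-Corasick would typically replace the `str.find` scans to bound the search part as well.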
Efficient string matching with wildcards and length constraints
• Computer Science
Knowledge and Information Systems
• 2006
A complete algorithm, SAIL, is designed that returns each matching substring of P in T as soon as it appears in T, in O(n + klmg) time with O(lm) space overhead.
Succinct Text Indexing with Wildcards
• Computer Science
SPIRE
• 2009
The first succinct index for a text that contains wildcards is presented, which doubles the size yet reduces the matching time to O(m log*** + m log d + occ), where m is the length of the query text.