String Sanitization Under Edit Distance: Improved and Generalized

Takuya Mieno, Solon P. Pissis, Leen Stougie, Michelle Sweering
Let $W$ be a string of length $n$ over an alphabet $\Sigma$, $k$ be a positive integer, and $\mathcal{S}$ be a set of length-$k$ substrings of $W$. The ETFS problem asks us to construct a string $X_{\mathrm{ED}}$ such that: (i) no string of $\mathcal{S}$ occurs in $X_{\mathrm{ED}}$; (ii) the order of all other length-$k$ substrings over $\Sigma$ is the same in $W$ and in $X_{\mathrm{ED}}$; and (iii) $X_{\mathrm{ED}}$ has minimal edit distance to $W$. When $W$ represents an individual's data and… 
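
Conditions (i) and (ii) and the objective in (iii) can be made concrete with a small checker. The following is a minimal sketch, not the paper's algorithm: it only verifies a candidate $X_{\mathrm{ED}}$ and measures its edit distance to $W$ with the textbook dynamic program. It assumes one reading of condition (ii), namely that the left-to-right sequence of non-sensitive length-$k$ substrings over $\Sigma$ must be identical in both strings, and it assumes the candidate may contain a letter outside $\Sigma$ (the `#` in the test below is a hypothetical separator). The function names are illustrative.

```python
def kgrams(s, k, alphabet):
    """Length-k substrings of s, left to right, keeping only those over `alphabet`."""
    return [
        s[i:i + k]
        for i in range(len(s) - k + 1)
        if all(c in alphabet for c in s[i:i + k])
    ]


def satisfies_i_and_ii(w, x, k, sensitive, alphabet):
    """Check conditions (i) and (ii) for a candidate x (assumed reading, see lead-in)."""
    # Sequence of non-sensitive length-k substrings of w, in order of occurrence.
    non_sensitive_w = [g for g in kgrams(w, k, alphabet) if g not in sensitive]
    grams_x = kgrams(x, k, alphabet)
    # (i): no sensitive length-k substring occurs in x.
    no_sensitive = all(g not in sensitive for g in grams_x)
    # (ii): the remaining length-k substrings appear in the same order as in w.
    return no_sensitive and grams_x == non_sensitive_w


def edit_distance(a, b):
    """Unit-cost edit distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]
```

For instance, with $W$ = `abcab`, $k = 2$ and $\mathcal{S}$ = {`ca`}, the candidate `abc#ab` passes both checks, while $W$ itself fails condition (i). Condition (iii) then asks for a passing candidate whose `edit_distance` to $W$ is smallest.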

Matching Patterns with Variables Under Edit Distance

The problem of matching patterns with variables under edit distance is considered, and it is shown that the problem becomes intractable already for unary patterns, which consist of repeated occurrences of a single variable interleaved with terminals.



String Sanitization Under Edit Distance

An algorithm that solves ETFS in $O(kn^2)$ time is presented, improving on the state of the art by a factor of $|\Sigma|$, and it is shown that ETFS cannot be solved in $O(n^{2-\delta})$ time, for any $\delta > 0$, unless the strong exponential time hypothesis is false.

All Highest Scoring Paths in Weighted Grid Graphs and Their Application to Finding All Approximate Repeats in Strings

This work builds a data structure that supports $O(mn \log m)$ time queries about the weight of any of the $O(m^2 n)$ best paths from the vertices in column 0 of the graph to all other vertices, and presents a simple $O(n^2 \log n)$ time and $\Theta(n^2)$ space algorithm to find all approximate tandem repeats $xy$ within a string of size $n$.

Asymptotic Behavior of the Lempel-Ziv Parsing Scheme and Digital Search Trees

A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment

This work extends the classic result of Masek and Paterson, which computes the edit distance between two strings in $O(m^2/\log m)$ time, to remove the dependence on ψ even when edits have arbitrary costs from a penalty matrix, and presents a new algorithm for the fundamental problem of one-against-many banded alignment.

Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping

A framework for proving quadratic-time hardness of similarity measures is introduced, which encapsulates all the expressive power necessary to emulate a reduction from satisfiability, and conditional lower bounds based on the Strong Exponential Time Hypothesis also apply to string problems that are not necessarily similarity measures.

Combinatorial Algorithms for String Sanitization

A heuristic, MCSR-ALGO, is proposed, which replaces letters in the strings output by the algorithms with carefully selected letters, so that sensitive patterns are not reinstated, implausible patterns are not introduced, and occurrences of spurious patterns are prevented.

A Faster Algorithm Computing String Edit Distances

Approximate matching of regular expressions.

On the sorting-complexity of suffix tree construction

A recursive technique for building suffix trees yields optimal algorithms in different computational models, matching the sorting lower bound; for an alphabet consisting of integers in a polynomial range, the authors obtain the first known linear-time algorithm.

A Linear-Time Algorithm for Seeds Computation

A linear-time algorithm computes a linear-size representation of all seeds of a word, from which the shortest seed and the number of seeds can easily be derived; it improves upon a previous $O(n \log n)$-time algorithm.