# String Sanitization Under Edit Distance: Improved and Generalized

@inproceedings{Mieno2021StringSU, title={String Sanitization Under Edit Distance: Improved and Generalized}, author={Takuya Mieno and Solon P. Pissis and Leen Stougie and Michelle Sweering}, booktitle={CPM}, year={2021} }

Let $W$ be a string of length $n$ over an alphabet $\Sigma$, $k$ be a positive integer, and $\mathcal{S}$ be a set of length-$k$ substrings of $W$. The ETFS problem asks us to construct a string $X_{\mathrm{ED}}$ such that: (i) no string of $\mathcal{S}$ occurs in $X_{\mathrm{ED}}$; (ii) the order of all other length-$k$ substrings over $\Sigma$ is the same in $W$ and in $X_{\mathrm{ED}}$; and (iii) $X_{\mathrm{ED}}$ has minimal edit distance to $W$. When $W$ represents an individual's data and…
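Conditions (i) and (ii) can be checked mechanically for a candidate string, and the quantity minimized in (iii) is the ordinary edit distance. The following is a minimal sketch of such a check, not the authors' algorithm. The function names, the reading of "same order" as equality of the left-to-right sequences of length-$k$ substrings, and the use of a separator letter `#` outside $\Sigma$ in the example are assumptions made for illustration. The sketch only verifies feasibility and measures the distance; it does not search for a minimizer.

```python
from typing import List, Set


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via the classic quadratic dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def kmers_over(text: str, k: int, alphabet: Set[str]) -> List[str]:
    """Length-k substrings of text, left to right, using only letters of alphabet."""
    return [
        text[i:i + k]
        for i in range(len(text) - k + 1)
        if set(text[i:i + k]) <= alphabet
    ]


def is_feasible(w: str, x: str, k: int,
                sensitive: Set[str], alphabet: Set[str]) -> bool:
    """Check conditions (i) and (ii) for a candidate x (hypothetical helper)."""
    # (i) no sensitive length-k string occurs in x
    if any(s in x for s in sensitive):
        return False
    # (ii) assumed reading: the sequence of non-sensitive length-k
    # substrings over the alphabet is identical in w and in x
    w_kept = [m for m in kmers_over(w, k, alphabet) if m not in sensitive]
    return kmers_over(x, k, alphabet) == w_kept


# Toy instance: alphabet {a, b}, k = 3, sensitive set {"baa"}.
W = "abaab"
X = "aba#aab"  # '#' is an assumed separator letter outside the alphabet
print(is_feasible(W, X, 3, {"baa"}, {"a", "b"}), edit_distance(W, X))
```

In this toy instance the candidate passes both checks at edit distance 2 from `W`, while `W` itself fails condition (i) because it contains `baa`.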

## One Citation

### Matching Patterns with Variables Under Edit Distance

- Computer Science
- SPIRE
- 2022

The problem of matching patterns with variables under edit distance is considered, and it is shown that the problem becomes intractable already for unary patterns, which consist of repeated occurrences of a single variable interleaved with terminals.

## References

### String Sanitization Under Edit Distance

- Computer Science
- CPM
- 2020

An algorithm to solve ETFS in $\mathcal{O}(kn^2)$ time, which improves on the state of the art by a factor of $|\Sigma|$, together with a proof that ETFS cannot be solved in $\mathcal{O}(n^{2-\delta})$ time, for any $\delta>0$, unless the strong exponential time hypothesis is false.

### All Highest Scoring Paths in Weighted Grid Graphs and Their Application to Finding All Approximate Repeats in Strings

- Computer Science, Mathematics
- SIAM J. Comput.
- 1998

This work builds a data structure that supports $O(mn \log m)$ time queries about the weight of any of the $O(m^2 n)$ best paths from the vertices in column 0 of the graph to all other vertices, and presents a simple $O(n^2 \log n)$ time and $\Theta(n^2)$ space algorithm to find all approximate tandem repeats xy within a string of size n.

### Asymptotic Behavior of the Lempel-Ziv Parsing Scheme and Digital Search Trees

- Mathematics, Computer Science
- Theor. Comput. Sci.
- 1995

### A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment

- Computer Science
- CPM
- 2018

This work extends the classic result of Masek and Paterson, which computes the edit distance between two strings in $O(m^2/\log m)$ time, to remove the dependence on ψ even when edits have arbitrary costs from a penalty matrix, and shows a new algorithm for the fundamental problem of one-against-many banded alignment.

### Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping

- Computer Science
- 2015 IEEE 56th Annual Symposium on Foundations of Computer Science
- 2015

A framework for proving quadratic-time hardness of similarity measures is introduced, which encapsulates all the expressive power necessary to emulate a reduction from satisfiability, and conditional lower bounds based on the Strong Exponential Time Hypothesis also apply to string problems that are not necessarily similarity measures.

### Combinatorial Algorithms for String Sanitization

- Computer Science
- ACM Trans. Knowl. Discov. Data
- 2021

A heuristic, MCSR-ALGO, is proposed, which replaces letters in the strings output by the algorithms with carefully selected letters, so that sensitive patterns are not reinstated, implausible patterns are not introduced, and occurrences of spurious patterns are prevented.

### Approximate matching of regular expressions.

- Computer Science
- Bulletin of Mathematical Biology
- 1989

### On the sorting-complexity of suffix tree construction

- Computer Science
- JACM
- 2000

A recursive technique for building suffix trees that yields optimal algorithms in different computational models, matching the sorting lower bound; for an alphabet consisting of integers in a polynomial range, the authors get the first known linear-time algorithm.

### A Linear-Time Algorithm for Seeds Computation

- Computer Science
- SODA
- 2012

A linear-time algorithm computing a linear-size representation of all seeds of a word, from which the shortest seed and the number of seeds can easily be derived; it improves upon a previous $O(n \log n)$-time algorithm.