# String Sanitization Under Edit Distance: Improved and Generalized

@inproceedings{Mieno2021StringSU, title={String Sanitization Under Edit Distance: Improved and Generalized}, author={Takuya Mieno and Solon P. Pissis and Leen Stougie and Michelle Sweering}, booktitle={CPM}, year={2021} }

Let $W$ be a string of length $n$ over an alphabet $\Sigma$, $k$ be a positive integer, and $\mathcal{S}$ be a set of length-$k$ substrings of $W$. The ETFS problem asks us to construct a string $X_{\mathrm{ED}}$ such that: (i) no string of $\mathcal{S}$ occurs in $X_{\mathrm{ED}}$; (ii) the order of all other length-$k$ substrings over $\Sigma$ is the same in $W$ and in $X_{\mathrm{ED}}$; and (iii) $X_{\mathrm{ED}}$ has minimal edit distance to $W$. When $W$ represents an individual's data and…
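The three conditions in the problem statement can be illustrated with a small checker. This is only a sketch under assumptions that the excerpt above does not state: a separator letter `#` outside $\Sigma$ may appear in the output, condition (ii) is read as "the left-to-right sequence of non-sensitive length-$k$ substrings is identical", and edit distance uses unit costs. The function names are hypothetical, and the quadratic dynamic program is the textbook one, not the algorithm of the paper.

```python
def kmers(s, k):
    """Return the length-k substrings of s in left-to-right order."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]


def satisfies_i_and_ii(w, x, k, sensitive, sep="#"):
    """Check conditions (i) and (ii) for a candidate string x.

    Assumption: sep is a letter outside the alphabet of w, so any
    length-k substring of x containing sep is not "over Sigma".
    """
    # (i) no sensitive pattern may occur anywhere in x
    if any(p in x for p in sensitive):
        return False
    # (ii) the non-sensitive length-k substrings must appear in the same order
    kept_w = [u for u in kmers(w, k) if u not in sensitive]
    kept_x = [u for u in kmers(x, k) if sep not in u]
    return kept_w == kept_x


def edit_distance(a, b):
    """Unit-cost edit distance via the textbook O(|a||b|) dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


w, k, sensitive = "abcab", 2, {"bc"}
x = "ab#cab"
print(satisfies_i_and_ii(w, x, k, sensitive), edit_distance(w, x))
```

On this toy instance the candidate `ab#cab` passes both checks at distance 1. Since $W$ itself contains the sensitive pattern `bc`, any feasible string must differ from $W$, so distance 1 is also the minimum here, which is what condition (iii) asks for.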

## One Citation

### Matching Patterns with Variables Under Edit Distance

- Computer Science
- ArXiv
- 2022

This work considers the problem of matching patterns with variables under edit distance and shows that it becomes intractable already for unary patterns, which consist of repeated occurrences of a single variable interleaved with terminals.

## References

### A Subquadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices

- Computer Science
- SIAM J. Comput.
- 2003

This work addresses the challenge of computing the similarity of two strings in subquadratic time for metrics which use a scoring matrix of unrestricted weights, and presents an algorithm for comparing two run-length encoded strings of length $m$ and $n$, compressed into $m'$ and $n'$ runs, respectively, in $O(m'n + n'm)$ time.

### All Highest Scoring Paths in Weighted Grid Graphs and Their Application to Finding All Approximate Repeats in Strings

- Computer Science, Mathematics
- SIAM J. Comput.
- 1998

This work builds a data structure that supports $O(mn \log m)$ time queries about the weight of any of the $O(m^2 n)$ best paths from the vertices in column 0 of the graph to all other vertices, and presents a simple $O(n^2 \log n)$ time and $\Theta(n^2)$ space algorithm to find all approximate tandem repeats $xy$ within a string of size $n$.

### Asymptotic Behavior of the Lempel-Ziv Parsing Scheme and Digital Search Trees

- Mathematics, Computer Science
- Theor. Comput. Sci.
- 1995

### A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment

- Computer Science
- CPM
- 2018

This work extends the classic result of Masek and Paterson, which computes the edit distance between two strings in $O(m^2/\log m)$ time, to remove the dependence on $\psi$ even when edits have arbitrary costs from a penalty matrix, and shows a new algorithm for the fundamental problem of one-against-many banded alignment.

### Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping

- Computer Science
- 2015 IEEE 56th Annual Symposium on Foundations of Computer Science
- 2015

A framework for proving quadratic-time hardness of similarity measures is introduced, which encapsulates all the expressive power necessary to emulate a reduction from satisfiability, and conditional lower bounds based on the Strong Exponential Time Hypothesis also apply to string problems that are not necessarily similarity measures.

### Combinatorial Algorithms for String Sanitization

- Computer Science
- ACM Trans. Knowl. Discov. Data
- 2021

A heuristic, MCSR-ALGO, is proposed, which replaces letters in the strings output by the algorithms with carefully selected letters, so that sensitive patterns are not reinstated, implausible patterns are not introduced, and occurrences of spurious patterns are prevented.

### Approximate matching of regular expressions

- Computer Science
- Bulletin of Mathematical Biology
- 1989

### On the sorting-complexity of suffix tree construction

- Computer Science
- JACM
- 2000

A recursive technique for building suffix trees is presented that yields optimal algorithms in different computational models, matching the sorting lower bound; for an alphabet consisting of integers in a polynomial range, the authors obtain the first known linear-time algorithm.

### Hide and Mine in Strings: Hardness and Algorithms

- Computer Science
- 2020 IEEE International Conference on Data Mining (ICDM)
- 2020

This work studies the fundamental relation between data sanitization and frequent pattern mining in the context of sequential data, and proposes integer linear programming formulations for the problem variants considered, together with algorithms to solve them that work in polynomial time under certain realistic assumptions on the problem parameters.

### Constructing LZ78 tries and position heaps in linear time for large alphabets

- Computer Science
- Inf. Process. Lett.
- 2015