# String Sanitization Under Edit Distance: Improved and Generalized

@inproceedings{Mieno2021StringSU,
title={String Sanitization Under Edit Distance: Improved and Generalized},
author={Takuya Mieno and Solon P. Pissis and Leen Stougie and Michelle Sweering},
booktitle={CPM},
year={2021}
}
• Published in CPM 16 July 2020
• Computer Science
Let $W$ be a string of length $n$ over an alphabet $\Sigma$, $k$ be a positive integer, and $\mathcal{S}$ be a set of length-$k$ substrings of $W$. The ETFS problem asks us to construct a string $X_{\mathrm{ED}}$ such that: (i) no string of $\mathcal{S}$ occurs in $X_{\mathrm{ED}}$; (ii) the order of all other length-$k$ substrings over $\Sigma$ is the same in $W$ and in $X_{\mathrm{ED}}$; and (iii) $X_{\mathrm{ED}}$ has minimal edit distance to $W$. When $W$ represents an individual's data and…
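Conditions (i) and (ii) and the cost in (iii) can be made concrete with a small checker. The sketch below is illustrative only and is not the paper's algorithm: it tests whether a candidate string satisfies (i) and (ii) and reports its edit distance to $W$, but it does not search for the minimum required by (iii). The function names are hypothetical. Taking $\Sigma$ to be the set of letters occurring in $W$, and using `#` as a letter outside $\Sigma$, are assumptions made for the example.

```python
from typing import List, Set


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via the standard quadratic dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def kmers_over(text: str, k: int, alphabet: Set[str]) -> List[str]:
    """Length-k substrings of text, in order, that use only letters of alphabet."""
    return [text[i:i + k]
            for i in range(len(text) - k + 1)
            if set(text[i:i + k]) <= alphabet]


def is_feasible(w: str, x: str, k: int, sensitive: Set[str]) -> bool:
    """Check conditions (i) and (ii) for candidate x (assumption: Sigma = letters of w)."""
    alphabet = set(w)
    w_seq = [u for u in kmers_over(w, k, alphabet) if u not in sensitive]
    x_seq = kmers_over(x, k, alphabet)
    no_sensitive = all(u not in sensitive for u in x_seq)  # condition (i)
    return no_sensitive and x_seq == w_seq                 # condition (ii)


if __name__ == "__main__":
    w, k, sensitive = "abab", 2, {"ba"}
    candidate = "ab#ab"  # '#' is assumed to lie outside Sigma
    print(is_feasible(w, candidate, k, sensitive), edit_distance(w, candidate))
```

With $W = \texttt{abab}$, $k = 2$ and $\mathcal{S} = \{\texttt{ba}\}$, the candidate `ab#ab` keeps both occurrences of `ab` in order and contains no `ba`, whereas $W$ itself fails condition (i).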
1 Citation

### Matching Patterns with Variables Under Edit Distance

• Computer Science
SPIRE
• 2022
The problem of matching patterns with variables under edit distance is considered, but it is shown that the problem becomes intractable already for unary patterns, consisting of repeated occurrences of a single variable interleaved with terminals.

## References

### String Sanitization Under Edit Distance

• Computer Science
CPM
• 2020
An algorithm to solve ETFS in $O(kn^2)$ time, which improves on the state of the art by a factor of $|\Sigma|$ and shows that ETFS cannot be solved in $O(n^{2-\delta})$ time, for any $\delta > 0$, unless the strong exponential time hypothesis is false.

### All Highest Scoring Paths in Weighted Grid Graphs and Their Application to Finding All Approximate Repeats in Strings

This work builds a data structure that supports $O(mn \log m)$ time queries about the weight of any of the $O(m^2 n)$ best paths from the vertices in column 0 of the graph to all other vertices, and presents a simple $O(n^2 \log n)$ time and $\Theta(n^2)$ space algorithm to find all approximate tandem repeats $xy$ within a string of size $n$.

### A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment

• Computer Science
CPM
• 2018
This work extends the classic result of Masek and Paterson, which computes the edit distance between two strings in $O(m^2/\log m)$ time, to remove the dependence on $\psi$ even when edits have arbitrary costs from a penalty matrix, and shows a new algorithm for the fundamental problem of one-against-many banded alignment.

### Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping

• Computer Science
2015 IEEE 56th Annual Symposium on Foundations of Computer Science
• 2015
A framework for proving quadratic-time hardness of similarity measures is introduced, which encapsulates all the expressive power necessary to emulate a reduction from satisfiability, and conditional lower bounds based on the Strong Exponential Time Hypothesis also apply to string problems that are not necessarily similarity measures.

### Combinatorial Algorithms for String Sanitization

• Computer Science
ACM Trans. Knowl. Discov. Data
• 2021
A heuristic, MCSR-ALGO, is proposed, which replaces letters in the strings output by the algorithms with carefully selected letters, so that sensitive patterns are not reinstated, implausible patterns are not introduced, and occurrences of spurious patterns are prevented.

### Approximate matching of regular expressions

• Computer Science
Bulletin of mathematical biology
• 1989

### On the sorting-complexity of suffix tree construction

• Computer Science
JACM
• 2000
A recursive technique for building suffix trees that yields optimal algorithms in different computational models that match the sorting lower bound and for an alphabet consisting of integers in a polynomial range the authors get the first known linear-time algorithm.

### A Linear-Time Algorithm for Seeds Computation

• Computer Science
SODA
• 2012
A linear-time algorithm computing a linear-size representation of all seeds of a word that can easily derive the shortest seed and the number of seeds from the authors' representation and improves upon a previous O(n log n)-time algorithm.