# Learning String-Edit Distance

@inproceedings{Ristad1998LearningSD, title={Learning String-Edit Distance}, author={Eric Sven Ristad and Peter N. Yianilos}, booktitle={IEEE Trans. Pattern Anal. Mach. Intell.}, year={1998} }

In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string-edit distance. Our stochastic model allows us to learn a string-edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the…
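As a point of reference for the abstract's definition, the classical unit-cost edit distance can be sketched with dynamic programming. This is a minimal illustration of the standard textbook recurrence, not the stochastic model the paper introduces; the function name `edit_distance` is an assumption made for this example.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions needed
    to turn `source` into `target`, with unit cost per operation.

    Illustrative sketch only: the name and signature are hypothetical.
    """
    m, n = len(source), len(target)
    # prev[j] is the distance between the first i-1 characters of `source`
    # and the first j characters of `target`; row 0 is all insertions.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        # Turning source[:i] into the empty string takes i deletions.
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            # Substitution costs 1 only when the characters differ.
            substitution = prev[j - 1] + (source[i - 1] != target[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[n]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The running time is proportional to the product of the two string lengths. A learned distance would replace the fixed unit costs with costs estimated from example string pairs.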

## 875 Citations

Learning String Edit Distance 1

- Computer Science
- 1997

The stochastic model allows us to learn a string edit distance function from a corpus of examples and is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.

Neural String Edit Distance

- Computer Science
- SPNLP
- 2022

The original expectation-maximization learned edit distance algorithm is modified into a differentiable loss function, allowing it to be integrated into a neural network that provides a contextual representation of the input.

Learning Conditional Transducers for Estimating the Distribution of String Edit Costs

- Computer Science
- 2006

This paper focuses on the edit distance and proposes an algorithm that learns the costs of the primitive edit operations automatically, rather than hand-tuning them for each domain.

A Discriminative Model of Stochastic Edit Distance in the Form of a Conditional Transducer

- Computer Science
- ICGI
- 2006

This paper proposes an algorithm to learn the costs of the primitive edit operations of the Levenshtein edit-distance, and shows through experiments that this method allows us to design cost functions that depend on the string context where the edit operations are used.

Edit-distance of weighted automata

- Computer Science
- CIAA'02
- 2002

The algorithm for computing the exact edit-distance of weighted automata can be used to improve the word accuracy of automatic speech recognition systems. It can also be extended to provide an edit-distance automaton useful for rescoring and other post-processing in large-vocabulary speech recognition.

Edit-Distance Of Weighted Automata: General Definitions And Algorithms

- Computer Science
- Int. J. Found. Comput. Sci.
- 2003

The edit-distance of two distributions over strings is defined and algorithms for computing it when these distributions are given by automata are presented, including the general algorithm of composition of weighted transducers combined with a single-source shortest-paths algorithm.

Learning stochastic edit distance: Application in handwritten character recognition

- Computer Science
- Pattern Recognit.
- 2006

A Stochastic Approach to Median String Computation

- Computer Science
- SSPR/SPR
- 2008

The algorithm is based on the extension of the string structure to multistrings (strings of stochastic vectors where each element represents the probability of each symbol) to allow the use of the Expectation Maximization technique.

Using Learned Conditional Distributions as Edit Distance

- Computer Science
- SSPR/SPR
- 2006

This article aims to learn an unbiased stochastic edit distance, in the form of a finite-state transducer, from a corpus of (input, output) pairs of strings, which can be very useful in pattern recognition, particularly in the presence of noisy data.

On using parametric string distances and vector quantization in designing syntactic pattern recognition systems

- Computer Science
- 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation
- 1997

The paper shows how the classifier can be trained to obtain the optimal parametric distance using vector quantization in the meta-space, and reports classification results after such a training process.

## References

The String-to-String Correction Problem

- Mathematics, Education
- JACM
- 1974

An algorithm is presented which solves the string-to-string correction problem in time proportional to the product of the lengths of the two strings.

Parametric string edit distance and its application to pattern recognition

- Computer Science
- IEEE Trans. Syst. Man Cybern.
- 1995

A generalized version of the string matching algorithm by Wagner and Fischer (1974) is proposed, based on a parametrization of the edit cost, which computes the edit distance of two strings in terms of the parameter τ.

Computation of Normalized Edit Distance and Applications

- Computer Science
- IEEE Trans. Pattern Anal. Mach. Intell.
- 1993

Experiments in hand-written digit recognition are presented, revealing that the normalized edit distance consistently provides better results than both unnormalized and post-normalized classical edit distances.

Approximate String Matching

- Computer Science
- CSUR
- 1980

Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword. The…

Optimal and Information Theoretic Syntactic Pattern Recognition for Traditional Errors

- Computer Science
- SSPR
- 1996

This paper develops a rigorous model for channels that permit arbitrarily distributed substitution, deletion, and insertion syntactic errors, and derives a technique by which Pr[Y|U], the probability of receiving Y given that U was transmitted, can be computed in cubic time using dynamic programming.

Topics in computational hidden state modeling

- Computer Science
- 1997

FGM framework, algorithms, and data structures describe hidden Markov models, stochastic context free grammars, and many other conventional similar models while providing a natural way for computer scientists to learn and reason about them and their many variations.

Techniques for automatically correcting words in text

- Computer Science
- CSUR
- 1992

Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent word correction. The article also surveys documented findings on spelling error patterns.

Design of a linguistic statistical decoder for the recognition of continuous speech

- Computer Science, Linguistics
- IEEE Trans. Inf. Theory
- 1975

This paper describes the overall structure of a linguistic statistical decoder (LSD) for the recognition of continuous speech and describes a phonetic matching algorithm that computes the similarity between phonetic strings, using the performance characteristics of the acoustic processor (AP).