Learning String-Edit Distance

@article{Ristad1998LearningSD,
  title={Learning String-Edit Distance},
  author={Eric Sven Ristad and Peter N. Yianilos},
  journal={IEEE Trans. Pattern Anal. Mach. Intell.},
  year={1998}
}
In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string-edit distance. Our stochastic model allows us to learn a string-edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the…
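
The classical (non-learned) edit distance defined in the abstract can be sketched with the standard dynamic program below. This is a minimal illustration that assumes unit cost for every insertion, deletion, and substitution; it is not the stochastic model proposed in the paper, and the function name `edit_distance` is chosen here for illustration only.

```python
def edit_distance(source: str, target: str) -> int:
    """Unit-cost edit distance (assumed costs: 1 per insert/delete/substitute)."""
    # previous[j] is the distance between the source prefix processed so far
    # and the first j characters of target. Initially the source prefix is
    # empty, so reaching target[:j] takes j insertions.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Turning source[:i] into the empty string takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            # Substitution is free when the two characters already match.
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

For example, `edit_distance("kitten", "sitting")` returns 3 (two substitutions and one insertion). Keeping only two rows of the table holds memory proportional to the length of `target`, while the running time remains proportional to the product of the two string lengths.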

Learning String Edit Distance 1
TLDR
The stochastic model allows us to learn a string edit distance function from a corpus of examples and is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
Neural String Edit Distance
TLDR
The original expectation-maximization learned edit distance algorithm is modified into a differentiable loss function, allowing it to integrate into a neural network providing a contextual representation of the input.
Learning Conditional Transducers for Estimating the Distribution of String Edit Costs
TLDR
This paper focuses on the edit distance and proposes an algorithm that learns the costs of the primitive edit operations automatically, rather than hand-tuning them for each domain.
A Discriminative Model of Stochastic Edit Distance in the Form of a Conditional Transducer
TLDR
This paper proposes an algorithm to learn the costs of the primitive edit operations of the Levenshtein edit-distance, and shows through experiments that this method allows us to design cost functions that depend on the string context where the edit operations are used.
Edit-distance of weighted automata
TLDR
The algorithm for computing exactly the edit-distance of weighted automata can be used to improve the word accuracy of automatic speech recognition systems, and it can be extended to provide an edit-distance automaton useful for rescoring and other post-processing purposes in the context of large-vocabulary speech recognition.
Edit-Distance Of Weighted Automata: General Definitions And Algorithms
TLDR
The edit-distance of two distributions over strings is defined and algorithms for computing it when these distributions are given by automata are presented, including the general algorithm of composition of weighted transducers combined with a single-source shortest-paths algorithm.
Learning stochastic edit distance: Application in handwritten character recognition
A Stochastic Approach to Median String Computation
TLDR
The algorithm is based on the extension of the string structure to multistrings (strings of stochastic vectors where each element represents the probability of each symbol) to allow the use of the Expectation Maximization technique.
Using Learned Conditional Distributions as Edit Distance
TLDR
This article aims at learning an unbiased stochastic edit distance, in the form of a finite-state transducer, from a corpus of (input, output) pairs of strings, which can be very useful in pattern recognition, particularly in the presence of noisy data.
On using parametric string distances and vector quantization in designing syntactic pattern recognition systems
  • B. Oommen, R. Loke
  • Computer Science
    1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation
  • 1997
TLDR
It is shown how the classifier can be trained to get the optimal parametric distance using vector quantization in the meta-space, and classification results after such a training process are reported.

References

A Faster Algorithm Computing String Edit Distances
The String-to-String Correction Problem
TLDR
An algorithm is presented which solves the string-to-string correction problem in time proportional to the product of the lengths of the two strings.
Parametric string edit distance and its application to pattern recognition
TLDR
A generalized version of the string matching algorithm by Wagner and Fischer (1974) is proposed, based on a parametrization of the edit cost, which computes their edit distance in terms of the parameter τ.
Constrained string editing
Computation of Normalized Edit Distance and Applications
TLDR
Experiments in hand-written digit recognition are presented, revealing that the normalized edit distance consistently provides better results than both unnormalized and post-normalized classical edit distances.
Approximate String Matching
Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword.
Optimal and Information Theoretic Syntactic Pattern Recognition for Traditional Errors
TLDR
This paper develops a rigorous model for channels which permit arbitrarily distributed substitution, deletion and insertion syntactic errors and derives a technique by which Pr[Y|U], the probability of receiving Y given that U was transmitted, can be computed in cubic time using dynamic programming.
Topics in computational hidden state modeling
TLDR
FGM framework, algorithms, and data structures describe hidden Markov models, stochastic context free grammars, and many other conventional similar models while providing a natural way for computer scientists to learn and reason about them and their many variations.
Techniques for automatically correcting words in text
TLDR
Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent word correction. The article surveys documented findings on spelling error patterns.
Design of a linguistic statistical decoder for the recognition of continuous speech
TLDR
This paper describes the overall structure of a linguistic statistical decoder (LSD) for the recognition of continuous speech and describes a phonetic matching algorithm that computes the similarity between phonetic strings, using the performance characteristics of the AP.