Locating the Whole Pattern Is Better than Locating Its Pieces: a Geometric Explanation of an Empirical Phenomenon

Abstract

In many practical problems, we must find a pattern in an image. When the desired pattern consists of several simple components, the traditional approach is first to look for these components, and then to check whether their relative locations are consistent with the pattern. Recent experiments have shown that much better pattern recognition can be achieved if we look for the whole pattern, without decomposing it first. In this paper, we give a simple geometric explanation of this empirical fact.

The practical problem. In many pattern recognition problems, we must locate a known simple pattern in a complicated black-and-white image. For example:
• in automatic analysis of electronic schemes, we must locate symbols of standard electronic components (such as −| |−);
• in text recognition, we must find letters;
• similar pattern matching problems arise in satellite imaging, etc.

Traditional approach. Most traditional methods for solving this problem rely on the fact that the desired pattern consists of simple geometric components (straight line intervals, arcs, etc.). For example, the above symbol for a capacitor consists of four straight line intervals: −, |, |, and −. Traditional methods consist of two stages:
• first, we try to locate each component of the desired pattern;
• after all components are located, we check that their relative locations are close to the relative locations of these components in the desired pattern (to be more precise, we check that the difference between the observed and desired relative locations is within the limits set by the inaccuracy with which the components are located).

A new approach turns out to be better. The authors of [Murshed Bortolozzi 1998] propose to recognize the entire pattern (symbol) without first decomposing it into simple components.
The resulting algorithm requires more computation time, but leads to much better recognition: namely, if we set up the parameters of both methods in such a way as to avoid false negatives (unrecognized symbols), then the number of false positives (false recognitions of a pattern) is much smaller for the new method than for the traditional methods. In this paper, we give a simple geometric explanation of this empirical phenomenon.

Geometric reformulation of the problem. We start with a sample pattern P which consists of several components Pi: P = P1 ∪ ... ∪ Pn. Without losing generality, we can assume that P is a compact set. In the actual image, the actual pattern may be shifted relative to the standard one, so this actual pattern has the form TP for some shift (translation) T. For simplicity, we will assume that the pattern is surrounded by empty space, i.e., (at least locally):
• either the actual image I coincides with the shifted pattern TP, in which case the pattern is present,
• or the actual image is different from the shifted pattern, in which case the pattern is not present.

Description of measurement inaccuracy. Due to measurement inaccuracy, the observed image Ĩ is, in general, slightly different from the actual image I. Namely, for each point p of the actual image, the corresponding observed point p̃ may be different from p. The observation inaccuracy can be characterized by the largest possible distance d(p̃, p) between the actual and the observed points. If this inaccuracy is ε > 0, this means that:
• every point from I is ε-close to some point from Ĩ, and
• every point from Ĩ is ε-close to some point from I.
In other words, the Hausdorff distance between the actual and observed images does not exceed ε: dH(I, Ĩ) ≤ ε.

New approach reformulated in geometric terms. If the desired pattern P is present in the image, i.e., if I = TP, then:

There exists a T for which dH(TP, Ĩ) ≤ ε. (1)

Vice versa, if this condition holds for an observed image Ĩ, then there exists a pattern TP which is consistent with the observed image, and therefore it is quite possible that the actual image contains the desired pattern. Thus, the condition (1) expresses the fact that the observed image Ĩ is consistent with the assumption that the actual image contains the desired pattern. Hence, if we want to avoid false negatives (i.e., unrecognized patterns), we must check the condition (1). This is what the new approach does.

How good is the new approach.
• If the result of the new approach is negative, this means that the observed image does not contain the pattern;
• on the other hand, if the result of this approach is positive, this means that it is possible that the observed image contains the pattern (i.e., that the observed image is consistent with the assumption that the actual image is the shifted standard pattern).
We cannot get any better than that. Of course, due to the observation inaccuracy, without additional assumptions, we can never guarantee that the image is actually the desired pattern: the actual image could as well be a slightly distorted pattern, and because of the observation inaccuracy, we do not notice this distortion. With this comment in mind, we can see that we cannot get any better pattern recognition than by using the new approach.

Traditional approach reformulated in geometric terms. In the traditional approach, we first look for components, i.e., we look for a way to represent the observed image Ĩ as a union of n sets Ĩ1, ..., Ĩn such that for every i, the i-th component Ĩi of the observed image is consistent with the assumption that it is actually a shift TiPi of the i-th component Pi of the desired pattern P. Similarly to the above argument, we can conclude that the possibility for Ĩi to be actually a shift of Pi can be described as follows:

There exists a Ti for which dH(TiPi, Ĩi) ≤ ε. (2)

Therefore, if we want to avoid false negatives (i.e., if we do not want unrecognized patterns), we should look for a partition Ĩ = Ĩ1 ∪ ... ∪ Ĩn which satisfies the property (2) for all i = 1, ..., n. This is the first stage of the traditional approach. As a result of this stage:
• If such a partition is impossible, then, based on the observation Ĩ, we can conclude that the actual (unknown) image I does not coincide with the desired pattern, and therefore, the desired pattern is not present here.
• On the other hand, if the partition is possible, i.e., if Ĩ = Ĩ1 ∪ ... ∪ Ĩn with dH(Ĩi, TiPi) ≤ ε for some shifts Ti, then it is not necessarily true that Ĩ can contain the desired pattern: it may happen that the shifts are too far away from each other.

If the actual image I is indeed a shift of the standard pattern P, i.e., if I = TP for some T, then, due to possible observation inaccuracy, dH(Ĩi, TPi) ≤ ε. Based on the observed components Ĩi, we select shifts Ti for which dH(Ĩi, TiPi) ≤ ε. Therefore, we can conclude that if the actual image is indeed the shift of the standard pattern, then

dH(TiPi, TPi) ≤ dH(TiPi, Ĩi) + dH(Ĩi, TPi) ≤ 2ε.

The Hausdorff distance between two shifts TiPi and TPi of the same set is equal to the distance d(Ti, T) between these shifts, i.e., to the Euclidean distance between the vectors corresponding to these shifts. So, we can conclude that if the pattern is present, then all the shifts Ti generated on the first stage should be 2ε-close to some (unknown) shift T. This means, in turn, that for every i and j, we have

d(Ti, Tj) ≤ d(Ti, T) + d(T, Tj) ≤ 4ε.

So, on the second stage of the traditional method, we check the following condition:

d(Ti, Tj) ≤ 4ε for all i and j. (3)

How good is the traditional approach.
• If the result of the traditional approach is negative, this means that the observed image does not contain the pattern;
• on the other hand, if the result of this approach is positive, this does not necessarily mean that the observed image can contain the pattern; it is quite possible that the observed image is inconsistent with the assumption that the actual image is the shifted standard pattern.

Let us give a simple example explaining why this can happen. Let us consider a 2-component L-shaped pattern P consisting of a vertical component P1 of length 1 and a horizontal component P2 of the same length 1. If we take the corner of P as the origin (0, 0) of the coordinate system, then P1 = {0} × [0, 1] and P2 = [0, 1] × {0}. Let us take Ĩ = Ĩ1 ∪ Ĩ2, where Ĩ1 = {−2ε} × [0, 1] and Ĩ2 = [2ε, 1 + 2ε] × {0}.
• For this image, the traditional approach can lead to a positive answer; indeed, here:
  • dH(Ĩ1, T1P1) ≤ ε for T1 = (−ε, 0),
  • dH(Ĩ2, T2P2) ≤ ε for T2 = (ε, 0), and
  • d(T1, T2) = 2ε < 4ε.
• On the other hand, the image Ĩ is not consistent with the pattern P because, as one can easily see, dH(Ĩ, TP) ≥ 2ε > ε for all possible shifts T, so the condition (1) fails.
So, the traditional approach is indeed not perfect.

Open problem. In the above text, we simply gave an example in which the traditional method leads to an unnecessary false positive. It is desirable to have a general numerical estimate of the quality of the traditional approach. In precise terms, we have the following problem: We have n compact sets Pi and n compact sets Ĩi. We know that for every i from 1 to n, dH(Ĩi, TiPi) ≤ ε for some shifts Ti for which d(Ti, Tj) ≤ 4ε for all i and j. What is the smallest value that is guaranteed to bound the Hausdorff distance dH(Ĩ, TP) between the union Ĩ = Ĩ1 ∪ ... ∪ Ĩn and some shift TP of the union P = P1 ∪ ... ∪ Pn?

Our guess is that this smallest possible value is 3ε. Our argument in favor of this guess is as follows: it looks like, since the diameter of the set {T1, ..., Tn} is ≤ 4ε, its radius will be ≤ 2ε, i.e., there should exist a shift T for which d(Ti, T) ≤ 2ε for all i. For this shift T, we have

dH(Ĩi, TPi) ≤ dH(Ĩi, TiPi) + dH(TiPi, TPi) ≤ ε + d(Ti, T) ≤ ε + 2ε = 3ε

for every i, and therefore dH(Ĩ, TP) ≤ 3ε for the unions.

Acknowledgments. This work was supported in part by NASA under cooperative agreement NCC5-209, by NSF grants No. DUE9750858 and CDA-9522207, by United Space Alliance, grant No. NAS 9-20000 (PWO C0C67713A6), by the Future Aerospace Science and Technology Program (FAST) Center for Structural Integrity of Aerospace Systems, effort sponsored by the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant number F49620-95-1-0518, and by the National Security Agency under Grant No. MDA904-98-1-0564.
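
Numerical illustration (sketch). The two-component example given for the traditional approach can be checked numerically. The Python sketch below is not part of the original analysis and makes several assumptions of its own: it fixes ε = 0.1, replaces each segment by 21 evenly spaced sample points, and searches for the shift T only on a coarse grid with step ε/2. The helper names (segment, shift, directed, hausdorff) are hypothetical and chosen purely for illustration. Under these assumptions, both stages of the traditional approach accept the image, while the whole-pattern distance never drops below 2ε on the grid.

```python
import math
from itertools import product

def segment(x0, y0, x1, y1, n=21):
    """Sample n evenly spaced points on the segment (x0, y0)-(x1, y1)."""
    return [(x0 + (x1 - x0) * k / (n - 1), y0 + (y1 - y0) * k / (n - 1))
            for k in range(n)]

def shift(points, t):
    """Translate a finite point set by the vector t."""
    return [(x + t[0], y + t[1]) for x, y in points]

def directed(a, b):
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in b)
               for p in a)

def hausdorff(a, b):
    """Hausdorff distance between two finite point sets."""
    return max(directed(a, b), directed(b, a))

EPS = 0.1  # assumed value of the observation inaccuracy (illustrative)

# Pattern components: P1 vertical, P2 horizontal, corner at the origin.
P1 = segment(0, 0, 0, 1)
P2 = segment(0, 0, 1, 0)
# Observed components from the example in the text.
I1 = segment(-2 * EPS, 0, -2 * EPS, 1)
I2 = segment(2 * EPS, 0, 1 + 2 * EPS, 0)

T1, T2 = (-EPS, 0.0), (EPS, 0.0)

# Stage 1 of the traditional approach: condition (2) per component.
stage1 = [hausdorff(I1, shift(P1, T1)), hausdorff(I2, shift(P2, T2))]
# Stage 2: condition (3) on the distance between the two shifts.
stage2 = math.hypot(T1[0] - T2[0], T1[1] - T2[1])

# Whole-pattern check: smallest dH(I, TP) over a coarse grid of shifts T.
grid = [k * EPS / 2 for k in range(-6, 7)]
best = min(hausdorff(I1 + I2, shift(P1 + P2, t))
           for t in product(grid, grid))

print("stage 1 distances:", stage1)
print("d(T1, T2):", stage2)
print("min over grid of dH(I, TP):", best)
```

The sampled sets only approximate the segments, and a grid search does not prove a bound over all shifts; the sketch merely illustrates how condition (1) can reject an image that conditions (2) and (3) accept.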

@inproceedings{Starks1999LocatingTW, title={Locating the Whole Pattern Is Better than Locating Its Pieces: a Geometric Explanation of an Empirical Phenomenon}, author={Scott A. Starks and Vladik Kreinovich}, year={1999} }