Viewpoint Invariant Image Retrieval for Context in Urban Environments


We outline work in progress towards a scalable system capable of quickly identifying buildings and other outdoor urban structures from photographs. Salient characteristics of the skyline are extracted and correlated using a minimum edit distance metric to identify a subset of potential matches in a database. This subset is then refined using a slower but more precise affine invariant feature comparison to identify the best match. Our system is envisaged as an alternative to GPS where users may not have access to expensive GPS equipment or a reliable GPS fix, or where orientation is a complicating factor (several landmarks present at a single location).

1 Landmark and Skyline Recognition

Recently, wide-baseline matching techniques have been applied to landmark recognition [2], but these require both a calibrated camera (complicating deployment on heterogeneous clients) and a RANSAC search for each landmark (computationally expensive for large databases). We are developing a scalable two-step matching process that promises query-response times on the order of seconds over large databases while working within the constraints of modern mobile devices.

We maintain fast lookup times over large databases by hashing images according to the salient characteristics of their skyline. Using a standard sky detector, we obtain a characteristic signal by differencing points on the skyline with their best-fit regression line. The signal is robust to translation and rotation, and also to scale following a normalisation step. To compensate for the adverse effects of skyline occlusion and foreshortening, we employ a simple coding system to record salient points on the skyline. Our salience measure identifies points of high curvature that remain stable over multiple morphological scales. We encode points sequentially using a 10-symbol alphabet (Figure 1) that identifies the direction of curvature relative to the regression line, thus yielding the skyline's hash-key. A default key is used when no sky is visible.
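The skyline signal described above can be illustrated with a minimal sketch. It assumes the sky detector has already produced one skyline height per image column. The function name `skyline_signal`, the use of signed perpendicular distance to the line, and the normalisation by peak absolute value are all illustrative assumptions; the text does not specify how the differencing or the normalisation step is carried out.

```python
from math import sqrt


def skyline_signal(ys):
    """Signed distance of each skyline point from the least-squares line.

    ys: skyline heights, one per image column (assumed input format).
    The result is divided by its peak absolute value, which is one
    possible normalisation and not necessarily the one used by the authors.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two skyline points")

    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n

    # Ordinary least-squares fit of y = slope * x + intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Signed perpendicular distance from each point to the fitted line.
    norm = sqrt(1 + slope ** 2)
    dist = [(y - (slope * x + intercept)) / norm for x, y in zip(xs, ys)]

    # A perfectly straight skyline has no deviation to normalise.
    peak = max(abs(d) for d in dist)
    return dist if peak == 0 else [d / peak for d in dist]
```

Because the signal is measured relative to the fitted line, adding a constant offset to every height leaves it unchanged, and dividing by the peak removes any constant scaling of the heights.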
We maintain a database of urban landmarks; several training images are associated with each landmark. Each image is stored with a pre-computed hash-key. On receipt of a query image, a hash-key is generated and compared against keys in the database using a modified Levenshtein edit distance. Images with distances within a threshold are then examined using sparse affine invariant feature matching to identify the most relevant landmark. These features are computed a priori.
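The first, fast stage of the lookup can be sketched as follows. This is only an illustration: it uses the standard Levenshtein distance, since the authors' modification is not described here, and the function names, the dictionary layout of the database, and the threshold value are hypothetical.

```python
def levenshtein(a, b):
    """Standard Levenshtein edit distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def shortlist(query_key, database, threshold):
    """Return (distance, image_id) pairs within the threshold, nearest first.

    database: mapping from image id to its pre-computed hash-key
    (an assumed layout chosen for illustration).
    """
    hits = []
    for image_id, key in database.items():
        distance = levenshtein(query_key, key)
        if distance <= threshold:
            hits.append((distance, image_id))
    return sorted(hits)
```

Only the images returned by `shortlist` would then be passed to the slower affine invariant feature matching stage, which keeps the cost of that stage independent of the overall database size.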


Cite this paper

@inproceedings{CollomosseViewpointII, title={Viewpoint Invariant Image Retrieval for Context in Urban Environments}, author={John Philip Collomosse and Kharsim Al Mosawi} }