We demonstrate a visual recognition system operating on a hand-held device, with the help of an efficient and robust feature tracking and an object recognition mechanism that can be used for interactive mobile applications. In our recognition system, corner features  are detected from captured video frames in a multi-scale image pyramid, and are tracked between consecutive frames efficiently. In order to perform object recognition, local descriptors are calculated on the tracked features, and quantized using a vocabulary tree . For each object, a bag-of-words model is learned from multiple views. The learned objects are recognized by computing the ranking score for the set of features in a single video frame. Our feature tracking algorithm and local descriptors are different than the Lucas-Kanade algorithm in image pyramid  or the SIFT descriptor , however improving the efficiency and accuracy. For our implementation on a mobile phone, we used an iPhone 3GS with a 600MHz ARM chip CPU. The video frame is captured from a camera preview screen at a rate of 15 frames per second using the public API. The task of object recognition on a mobile phone runs at around 7 frames per second, including the feature tracking and descriptor calculation. Figure 1. Visual learning and recognition on a hand-held. The system automatically detects and tracks salient features; the user selects an object for learning; a model is constructed, stored, and later used for recognition.