6. Conclusions and Future Work

Internal object-part structural information was not used in these tests; detailed results are provided in [14]. The results above suggest that the framework presented here is suitable for visual classification when the elements of a class are strongly consistent in terms of regions (e.g., elephants). In cases where that consistency is limited to a very small subset of the conceptual definition of a class (e.g., skater), our approach is suitable only for detection: The Visual Apprentice learns from the examples it is given, so its definition of a skater is limited to skaters whose visual characteristics closely match those in the examples. In other classes, such as cocktails, there is wide variation in the color, size, and location of the regions, and our approach is not suitable in such cases.

We have presented a different approach to content-based retrieval in which distinct users can utilize an object definition hierarchy to define their own visual classes. Classes are defined in The Visual Apprentice, an implementation of our approach, which learns classifiers that can be used to automatically organize the contents of a database. Our framework for the classification of visual information incorporates multiple fuzzy classifiers and learning techniques according to the levels of the object definition hierarchy: (1) region, (2) perceptual, (3) object-part, (4) object, and (5) scene. Both generic and task-specific classifiers are possible, and our approach is flexible since each user can have his or her own visual classifiers.

Future research directions include adding a feedback loop to The Visual Apprentice so that learning is performed incrementally, allowing the user to place additional restrictions on the visual classes; this will require more sophisticated interface tools. In addition, we are working on integrating visual classification results with information provided by other media accompanying the visual content. In particular, we are using The Visual Apprentice to build classifiers for news images and video that make use of accompanying text. Motion information is also being incorporated into our model, and we plan to integrate the work of [26] to assist in labeling video objects. Our system forms part of Columbia's MPEG-7 testbed [20].
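To make the five-level hierarchy concrete, the following is a minimal sketch, assuming a tree of nodes with a fuzzy classifier attached to each node. All names here are hypothetical and not the actual interfaces of The Visual Apprentice, and min is used as a simple fuzzy AND only for illustration; this section does not specify the system's actual combination rule.

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Optional

    # Hypothetical sketch of the object definition hierarchy. Each node sits
    # at one of the five levels and may carry a fuzzy classifier returning a
    # membership score in [0, 1].
    LEVELS = ("region", "perceptual", "object-part", "object", "scene")

    @dataclass
    class Node:
        """One node of a user-defined visual class, e.g. an 'ice' region
        under an 'ice-skater' object-part."""
        name: str
        level: str                                            # one of LEVELS
        classifier: Optional[Callable[[dict], float]] = None  # fuzzy score
        children: List["Node"] = field(default_factory=list)

    def score(node: Node, features: Dict[str, dict]) -> float:
        """Combine fuzzy scores bottom-up. A node's score is its own
        classifier output AND-ed (via min) with its children's scores."""
        own = node.classifier(features.get(node.name, {})) if node.classifier else 1.0
        return min([own] + [score(c, features) for c in node.children])

    # Toy usage: a scene node over an object node over a single region.
    elephant = Node("elephant", "object", children=[
        Node("body", "region", classifier=lambda f: f.get("gray_ratio", 0.0)),
    ])
    scene = Node("savanna", "scene", children=[elephant])
    print(score(scene, {"body": {"gray_ratio": 0.8}}))  # -> 0.8

Because classifiers are attached per node, a user's class definition and the learned classifiers stay aligned level by level, which is what allows each user to keep his or her own visual classifiers.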
