Discovering objects and their location in images

  title={Discovering objects and their location in images},
  author={Josef Sivic and Bryan C. Russell and Alexei A. Efros and Andrew Zisserman and William T. Freeman},
  journal={Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1},
  pages={370-377 Vol. 1}
We seek to discover the object categories depicted in a set of unlabelled images. We achieve this using a model developed in the statistical text literature: probabilistic latent semantic analysis (pLSA). In text analysis, this is used to discover topics in a corpus using the bag-of-words document representation. Here we treat object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics. The model is applied to images by using a… 
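The pLSA model named in the abstract can be illustrated with a minimal sketch. It assumes each image has already been reduced to counts over a fixed vocabulary; the abstract is truncated before it describes the image representation, so that step is not shown. The function names `plsa` and `normalize`, the smoothing constant `eps`, and the toy count matrix are all hypothetical and chosen only for illustration. This is a plain EM loop for pLSA, not necessarily the authors' implementation.

```python
import random


def normalize(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row)
    return [value / total for value in row]


def plsa(counts, n_topics, n_iters=50, seed=0):
    """Fit pLSA by EM.

    counts[d][w] is how often word w occurs in document (image) d.
    Returns (p_w_z, p_z_d): P(w|z) for each topic, P(z|d) for each document.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    eps = 1e-12  # hypothetical smoothing constant; keeps every row normalizable

    # Random starting point; each row is a probability distribution.
    p_w_z = [normalize([rng.random() + eps for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + eps for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iters):
        acc_w_z = [[eps] * n_words for _ in range(n_topics)]
        acc_z_d = [[eps] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(w|z) P(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                # M-step accumulation: expected counts n(d,w) P(z|d,w).
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    acc_w_z[z][w] += resp
                    acc_z_d[d][z] += resp
        # M-step: renormalize expected counts into distributions.
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    return p_w_z, p_z_d


# Toy data (made up): four "images" over a four-word vocabulary.
toy_counts = [[5, 4, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]
p_w_z, p_z_d = plsa(toy_counts, n_topics=2)
for d, row in enumerate(p_z_d):
    print(d, [round(p, 3) for p in row])
```

Each row of `p_z_d` is the topic mixture for one image, which matches the abstract's description of an image as a mixture of topics. Taking the largest entry per row would be one simple way to assign an image to a single category.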
Discovering objects and their location in images with Latent Dirichlet Allocation
We seek to discover object categories and their locations in a set of unlabelled images. We achieve this using probabilistic models developed in the text understanding community to discover
Unsupervised discovery of visual object class hierarchies
This work proposes to group visual objects using a multi-layer hierarchy tree that is based on common visual elements by adapting to the visual domain the generative hierarchical latent Dirichlet allocation (hLDA) model previously used for unsupervised discovery of topic hierarchies in text.
Decomposition, discovery and detection of visual categories using topic models
  • Mario Fritz, B. Schiele
  • Computer Science
    2008 IEEE Conference on Computer Vision and Pattern Recognition
  • 2008
The proposed speed-ups make the system scale to large databases and improve the state-of-the-art in unsupervised learning as well as supervised detection on the challenging PASCAL'06 multi-class detection tasks for several categories.
Semantic Annotation of Satellite Images Using Latent Dirichlet Allocation
This annotation task combines a step of supervised classification of patches of the large image and the integration of the spatial information between these patches, using the maximum-likelihood method.
A Thousand Words in a Scene
This paper presents a novel approach for visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate
Unsupervised object category discovery via information bottleneck method
The Information Bottleneck method is a promising technique for discovering the hidden semantic of images, and is superior to the state-of-the-art unsupervised object category discovery methods.
Graph-Based Object Class Discovery from Images with Multiple Objects
A new unsupervised graph-based object discovery algorithm that treats images with multiple objects by clustering the local features without specifying the number of clusters and acquires object models as frequent subgraph structures defined by a set of co-occurring edges which describe the spatial relation between local features.
Latent Semantics Local Distribution for CRF-based Image Semantic Segmentation
This paper proposes a method that combines a region-based probabilistic graphical model, building on the recent success of Conditional Random Fields (CRFs) in semantic segmentation, with a salient-points-based bag-of-words paradigm.
Discovering objects in images and videos
This thesis presents a novel way of scene analysis in images and videos. Traditional scene analysis using object detection involves a lot of human labor for labeling the images, and also has the
Unsupervised Image Categorization and Object Localization using Topic Models and Correspondences between Images
  • David Liu, Tsuhan Chen
  • Mathematics, Computer Science
    2007 IEEE 11th International Conference on Computer Vision
  • 2007
A new approach is presented that employs correspondences between images to provide information about object configuration, which enhances the reliability of object localization and categorization, and shows improved categorization and localization performance on real and synthetic data.


Discovering object categories in image collections
Given a set of images containing multiple object categories, we seek to discover those categories and their image locations without supervision. We achieve this using generative models from the
Matching Words and Pictures
A new approach for modeling multi-modal data sets is presented, focusing on the specific case of segmented images with associated text. A number of models for the joint distribution of image regions and words are developed, including several that explicitly learn the correspondence between regions and words.
Hidden semantic concept discovery in region based image retrieval
This work addresses content based image retrieval (CBIR), focusing on developing a hidden semantic concept discovery methodology to address effective semantics-intensive image retrieval. In our
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary
This work shows how to cluster words that individually are difficult to predict into clusters that can be predicted well. For example, the distinction between train and locomotive cannot be predicted using the current set of features, but the underlying concept can.
A Bayesian hierarchical model for learning natural scene categories
  • Li Fei-Fei, P. Perona
  • Computer Science
    2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)
  • 2005
This work proposes a novel approach to learn and recognize natural scene categories by representing the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning.
Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories
The incremental algorithm is compared experimentally to an earlier batch Bayesian algorithm, as well as to one based on maximum-likelihood, which have comparable classification performance on small training sets, but incremental learning is significantly faster, making real-time learning feasible.
Video Google: a text retrieval approach to object matching in videos
We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video. The object is represented by a set of viewpoint
Object class recognition by unsupervised scale-invariant learning
  • R. Fergus, P. Perona, Andrew Zisserman
  • Mathematics, Computer Science
    2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.
  • 2003
The flexible nature of the model is demonstrated by excellent results over a range of datasets including geometrically constrained classes (e.g. faces, cars) and flexible objects (such as animals).
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
  • A. Oliva, A. Torralba
  • Mathematics, Computer Science
    International Journal of Computer Vision
  • 2004
The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Unsupervised Learning of Models for Recognition
A method to learn object class models from unlabeled and unsegmented cluttered scenes for the purpose of visual object recognition, which achieves very good classification results on human faces and rear views of cars.