• Corpus ID: 14294054

Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation

Benjamin Klein, Guy Lev, Gil Sadeh, Lior Wolf
In the traditional object recognition pipeline, descriptors are densely sampled over an image, pooled into a high dimensional non-linear representation and then passed to a classifier. In recent years, Fisher Vectors have proven empirically to be the leading representation for a large variety of applications. The Fisher Vector is typically taken as the gradients of the log-likelihood of descriptors, with respect to the parameters of a Gaussian Mixture Model (GMM). Motivated by the assumption… 
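
As a rough illustration of the sentence about gradients with respect to GMM parameters, the sketch below computes a Fisher Vector for a plain diagonal-covariance GMM. It is not the hybrid Gaussian-Laplacian model of this paper; it assumes the closed-form normalized gradients with respect to the means and standard deviations that are commonly used in the Fisher Vector literature. The function names (`posteriors`, `fisher_vector`) and the toy data are hypothetical, and the code uses only the standard library so that it stays self-contained.

```python
import math


def posteriors(x, weights, means, sigmas):
    """Soft assignment of one descriptor x to each diagonal-Gaussian component."""
    log_joint = []
    for w, mu, sg in zip(weights, means, sigmas):
        log_p = math.log(w)
        for xd, md, sd in zip(x, mu, sg):
            log_p += (-0.5 * ((xd - md) / sd) ** 2
                      - math.log(sd)
                      - 0.5 * math.log(2.0 * math.pi))
        log_joint.append(log_p)
    top = max(log_joint)  # subtract the max before exponentiating, for stability
    unnorm = [math.exp(v - top) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def fisher_vector(descriptors, weights, means, sigmas):
    """Normalized log-likelihood gradients w.r.t. GMM means and standard deviations."""
    n, k, d = len(descriptors), len(weights), len(means[0])
    grad_mu = [[0.0] * d for _ in range(k)]
    grad_sigma = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        gamma = posteriors(x, weights, means, sigmas)
        for j in range(k):
            for i in range(d):
                z = (x[i] - means[j][i]) / sigmas[j][i]
                grad_mu[j][i] += gamma[j] * z
                grad_sigma[j][i] += gamma[j] * (z * z - 1.0)
    fv = []
    # Assumed normalization: 1 / (N * sqrt(w_k)) for means,
    # 1 / (N * sqrt(2 * w_k)) for standard deviations.
    for j in range(k):
        fv += [g / (n * math.sqrt(weights[j])) for g in grad_mu[j]]
    for j in range(k):
        fv += [g / (n * math.sqrt(2.0 * weights[j])) for g in grad_sigma[j]]
    return fv


# Toy example: three 2-D descriptors and a two-component GMM (made-up values).
descriptors = [[-1.0, 0.5], [1.0, -0.5], [3.0, 2.0]]
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 2.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
fv = fisher_vector(descriptors, weights, means, sigmas)
print(len(fv))  # 2 * K * D = 8
```

The resulting vector has 2·K·D entries (K components, D descriptor dimensions), which is why Fisher Vectors are high-dimensional even for small mixtures. In practice the GMM parameters would be fitted to training descriptors first, a step this sketch leaves out.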


A New Probabilistic Representation of Color Image Pixels and Its Applications

Experimental results show that integrating the proposed pixel-wise similarity into dense image-descriptor construction yields improved peak signal-to-noise ratio performance and higher tracking accuracy in the multi-layered motion estimation problem, and that the proposed similarity measures give the best performance on all quantitative measurements in unsupervised superpixel-based image segmentation on the MSRC and BSD300 datasets.

Backpropagation Training for Fisher Vectors within Neural Networks

This work proposes a framework to jointly learn the representation of original features, FV parameters and parameters of the classifier in the style of traditional neural networks, and demonstrates that FV can be embedded into neural networks at arbitrary positions, allowing end-to-end training with back-propagation.

Probabilistic Embeddings for Cross-Modal Retrieval

It is argued that deterministic functions are not sufficiently powerful to capture one-to-many correspondences, and Probabilistic Cross-Modal Embedding (PCME) is proposed, where samples from the different modalities are represented as probabilistic distributions in the common embedding space.

Unsupervised multi-view representation learning with proximity guided representation and generalized canonical correlation analysis

The proposed MRL could be one of the first unsupervised multi-view representation learning models that works in proximity-guided dynamic routing and GCCA modes, and it exploits the maximum correlations among multiple views based on Generalized Canonical Correlation Analysis.

Neural ranking for automatic image annotation

The approach integrates learning to rank algorithms and nearest-neighbor based models, including TagProp and 2PKNN, and inherits their advantages and achieves better or comparable performance compared with the state-of-the-art methods on four challenging benchmarks.

A Neighbor-aware Approach for Image-text Matching

  • Chunxiao Liu, Zhendong Mao, W. Zang, Bin Wang
  • Computer Science
    ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019
A neighbor-aware network for image-text matching is proposed, in which an intra-attention module and a neighbor-aware ranking loss jointly distinguish data with different semantics; more importantly, semantically unrelated data within a neighborhood can be distinguished.

Learning Visual N-Grams from Web Data

This paper develops visual n-gram models that can predict arbitrary phrases that are relevant to the content of an image, and demonstrates the merits of the models in phrase prediction, phrase-based image retrieval, relating images and captions, and zero-shot transfer.

Efficient Algorithms for Estimating the Parameters of Mixed Linear Regression Models

A new algorithm is proposed that combines the alternating direction method of multipliers (ADMM) with the idea of the EM algorithm, and it outperforms the EM algorithm in statistical accuracy and computational time in the non-Gaussian noise case.

Training Visual-Semantic Embedding Network for Boosting Automatic Image Annotation

This paper trains neural networks using visual and semantic ranking loss to learn visual-semantic embedding, which can be easily applied to nearest-neighbor based models to boost their performance on image auto-annotation.

Discriminative Learning of Open-Vocabulary Object Retrieval and Localization by Negative Phrase Augmentation

This paper proposes Query-Adaptive R-CNN, a simple extension of Faster R-CNN adapted to open-vocabulary queries, and proposes negative phrase augmentation (NPA) to mine hard negative samples which are visually similar to the query and at the same time semantically mutually exclusive of the query.



Image Classification with the Fisher Vector: Theory and Practice

This work proposes to use the Fisher Kernel framework as an alternative patch encoding strategy: it describes patches by their deviation from a “universal” generative Gaussian mixture model, and reports experimental results showing that the FV framework is a state-of-the-art patch encoding technique.

Fisher Kernels on Visual Vocabularies for Image Categorization

  • F. Perronnin, C. Dance
  • Computer Science
    2007 IEEE Conference on Computer Vision and Pattern Recognition
  • 2007
This work shows that Fisher kernels can actually be understood as an extension of the popular bag-of-visterms, and proposes to apply this framework to image categorization where the input signals are images and where the underlying generative model is a visual vocabulary: a Gaussian mixture model which approximates the distribution of low-level features in images.

Heavy-tailed Distances for Gradient Based Image Descriptors

This paper advocates for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution, and shows significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost.

Improving the Fisher Kernel for Large-Scale Image Classification

In an evaluation involving hundreds of thousands of training images, it is shown that classifiers learned on Flickr groups perform surprisingly well and that they can complement classifiers learned on more carefully annotated datasets.

Discovering objects and their location in images

This work treats object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics, and applies a model developed in the statistical text literature: probabilistic latent semantic analysis (pLSA).

Deep Fisher Kernels -- End to End Learning of the Fisher Kernel GMM Parameters

A gradient descent based learning algorithm is introduced that, in contrast to other feature learning techniques, is not just derived from intuition or biological analogy, but has a theoretical justification in the framework of statistical learning theory.

Large-scale image retrieval with compressed Fisher vectors

This article shows why the Fisher representation is well-suited to the retrieval problem: it describes an image by what makes it different from other images. It also shows why Fisher vectors should be compressed to reduce their memory footprint and speed up retrieval.

PCA-SIFT: a more distinctive representation for local image descriptors

  • Yan Ke, R. Sukthankar
  • Computer Science
    Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.
  • 2004
This paper examines (and improves upon) the local image descriptor used by SIFT, and demonstrates that the PCA-based local descriptors are more distinctive, more robust to image deformations, and more compact than the standard SIFT representation.

Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora

  • R. Socher, Li Fei-Fei
  • Computer Science
    2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
  • 2010
A semi-supervised model is proposed that segments and annotates images using very few labeled images and a large unaligned text corpus to relate image regions to text labels, and it outperforms the state of the art in annotation.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.