• Corpus ID: 1597636

Learnable Pooling Regions for Image Classification

@article{MalinowskiLearnablePooling,
  title={Learnable Pooling Regions for Image Classification},
  author={Mateusz Malinowski and Mario Fritz}
}
From the biologically inspired early HMAX model to Spatial Pyramid Matching, pooling has played an important role in visual recognition pipelines. Spatial pooling, by grouping local codes, equips these methods with a degree of robustness to translation and deformation while preserving important spatial information. Despite the predominance of this approach in current recognition systems, we have seen little progress toward fully adapting the pooling strategy to the task at hand. This paper…
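As a rough sketch of the spatial pooling step the abstract describes (a hypothetical numpy fragment, not the paper's method; the fixed 2×2 grid, the `spatial_max_pool` name, and the max-pooling choice are all assumptions for illustration):

```python
import numpy as np

def spatial_max_pool(codes, positions, grid=(2, 2)):
    """Max-pool local feature codes over a regular spatial grid.

    codes:     (n, d) array of local feature codes.
    positions: (n, 2) array of (x, y) locations in [0, 1).
    grid:      number of pooling cells along each axis.
    Returns a (grid[0] * grid[1] * d) pooled descriptor.
    """
    gx, gy = grid
    n, d = codes.shape
    pooled = np.zeros((gx * gy, d))
    # Assign each local code to a grid cell, then take the cell-wise max.
    cell_x = np.minimum((positions[:, 0] * gx).astype(int), gx - 1)
    cell_y = np.minimum((positions[:, 1] * gy).astype(int), gy - 1)
    cell = cell_y * gx + cell_x
    for c in range(gx * gy):
        mask = cell == c
        if mask.any():
            pooled[c] = codes[mask].max(axis=0)
    return pooled.ravel()
```

Because each code only needs to stay inside its grid cell, small translations of the input leave the pooled descriptor unchanged, which is the robustness the abstract refers to.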

Citations

Learning Smooth Pooling Regions for Visual Recognition
A flexible parameterization of the spatial pooling step is proposed, and a smoothness regularization term is investigated that, in conjunction with an efficient learning scheme, makes learning scalable.
Beyond spatial pooling: Fine-grained representation learning in multiple domains
This paper formulates a probabilistic framework for analyzing the performance of pooling, applies multiple scales of filters coupled with different pooling granularities, and makes use of color as an additional pooling domain, thereby reducing sensitivity to spatial deformations.
A Closed-Form Learned Pooling for Deep Classification Networks
A way to enable CNNs to learn different pooling weights for each pixel location is proposed by introducing an extended definition of a pooling operator that can learn a strict superset of what can be learned by average pooling or convolutions.
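The per-location pooling weights described above can be illustrated as follows (a minimal numpy sketch assuming a single-channel feature map; `weighted_pool` is a hypothetical name, not the authors' API). Uniform weights recover plain average pooling, which is why the learned operator is at least as expressive:

```python
import numpy as np

def weighted_pool(feature_map, weights):
    """Pool a 2-D feature map with one learnable weight per location."""
    assert feature_map.shape == weights.shape
    return float(np.sum(feature_map * weights))

fmap = np.arange(4.0).reshape(2, 2)   # [[0, 1], [2, 3]]
uniform = np.full((2, 2), 1.0 / 4)    # uniform weights = average pooling
assert weighted_pool(fmap, uniform) == fmap.mean()
```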
Latent Model Ensemble with Auto-localization
A novel latent CNN framework is proposed, which treats the most discriminative region as a latent variable and outperforms state-of-the-art deep CNNs on standard benchmark datasets including CIFAR-10, CIFAR-100, MNIST and the PASCAL VOC 2007 classification dataset.
Fine-grained representation learning in convolutional autoencoders
Experimental results on two independent benchmark datasets demonstrate that the representation learning law can guide CAEs to extract better fine-grained features and perform better in the multiclass classification task.
Learning Bag-of-Features Pooling for Deep Convolutional Neural Networks
  • N. Passalis, A. Tefas, 2017 IEEE International Conference on Computer Vision (ICCV)
The proposed approach, called Convolutional BoF (CBoF), uses RBF neurons to quantize the information extracted from the convolutional layers and it is able to natively classify images of various sizes as well as to significantly reduce the number of parameters in the network.
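The RBF-based soft quantization behind CBoF can be sketched roughly as follows (an illustrative numpy fragment, not the authors' code; the `rbf_bof` name and the `gamma` bandwidth parameter are assumptions). Averaging the soft memberships yields a fixed-size histogram regardless of how many local features the image produced, which is what allows variable input sizes:

```python
import numpy as np

def rbf_bof(features, centers, gamma=1.0):
    """Soft-quantize feature vectors against RBF centers, then average.

    features: (n, d) local feature vectors extracted from an image.
    centers:  (k, d) codewords (the RBF neurons' centers).
    Returns a length-k histogram that sums to 1.
    """
    # (n, k) squared distances between every feature and every center.
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    memb = np.exp(-gamma * d2)
    memb /= memb.sum(axis=1, keepdims=True)   # normalize per feature
    return memb.mean(axis=0)                  # histogram over codewords
```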
A Biologically Inspired Deep CNN Model
By making an analogy between the proposed DCNN model and the human visual cortex, many critical design choices of the proposed model can be determined with some simple calculations.
Improving DCNN Performance with Sparse Category-Selective Objective Function
Inspired by the category-selective property of the neuron population in the IT layer of the human visual cortex, the Sparse Category-Selective Objective Function (SCSOF) is proposed to modulate the neuron outputs of the top DCNN layer to be category selective.
Online Multi-Stage Deep Architectures for Feature Extraction and Object Recognition
This dissertation constructs online learning replacements for the components within a multi-stage architecture and demonstrates that the proposed replacements can offer performance competitive with their offline batch counterparts while providing a reduced memory footprint.

References

Geometric ℓp-norm feature pooling for image classification
Modern visual classification models generally include a feature pooling step, which aggregates local features over the region of interest into a statistic through a certain spatial pooling operation.
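The ℓp-norm pooling statistic referred to here interpolates between sum-like and max pooling as p grows (a minimal numpy illustration; the unnormalized form and the `lp_pool` name are assumptions):

```python
import numpy as np

def lp_pool(features, p):
    """Aggregate non-negative local feature responses with an l_p norm.

    p = 1 recovers plain summation (average pooling up to a constant);
    p -> infinity approaches max pooling.
    """
    features = np.asarray(features, dtype=float)
    return np.power(np.sum(np.power(features, p)), 1.0 / p)

x = np.array([0.1, 0.5, 0.9])
assert np.isclose(lp_pool(x, 1), x.sum())       # sum pooling
assert np.isclose(lp_pool(x, 100), x.max(), atol=1e-2)  # near max pooling
```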
Beyond spatial pyramids: Receptive field learning for pooled image features
This paper shows that learning more adaptive receptive fields increases performance even with a significantly smaller codebook size at the coding layer, and adopts the idea of over-completeness to learn the optimal pooling parameters.
Convolutional Deep Belief Networks on CIFAR-10
Using a combination of locally-connected convolutional units and globally-connected units, as well as a few tricks to reduce the effects of overfitting, the DBN achieves state-of-the-art performance in the classification task of the CIFAR-10 subset of the tiny images dataset.
Linear spatial pyramid matching using sparse coding for image classification
An extension of the SPM method is developed, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and a linear SPM kernel based on SIFT sparse codes is proposed, leading to state-of-the-art performance on several benchmarks by using a single type of descriptors.
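The sparse-coding-plus-max-pooling pipeline summarized above can be sketched as follows (an illustrative numpy fragment, not the paper's implementation; a single soft-thresholding step stands in for a real per-descriptor lasso solver, and all names are hypothetical). The pooled vector can then be fed to a linear classifier, which is the point of the linear SPM kernel:

```python
import numpy as np

def sparse_codes(X, D, lam=0.1):
    """Stand-in sparse encoder: one soft-thresholded projection step.

    A real system solves a lasso problem per descriptor; this single
    soft-thresholding step only illustrates the pipeline's shape.
    """
    Z = X @ D.T                       # correlate descriptors with atoms
    return np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)

def encode_image(X, D):
    """Sparse-code local descriptors, then max-pool into one vector."""
    return np.abs(sparse_codes(X, D)).max(axis=0)

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 16))     # 50 local SIFT-like descriptors
D = rng.standard_normal((128, 16))    # dictionary of 128 atoms
v = encode_image(X, D)
assert v.shape == (128,)              # one pooled code per dictionary atom
```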
Building high-level features using large scale unsupervised learning
Contrary to what appears to be a widely-held intuition, the experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not.
An Analysis of Single-Layer Networks in Unsupervised Feature Learning
The results show that large numbers of hidden nodes and dense feature extraction are critical to achieving high performance—so critical, in fact, that when these parameters are pushed to their limits, they achieve state-of-the-art performance on both CIFAR-10 and NORB using only a single layer of features.
Multi-column deep neural networks for image classification
On the very competitive MNIST handwriting benchmark, this method is the first to achieve near-human performance and improves the state-of-the-art on a plethora of common image classification benchmarks.
Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition
An unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions that alleviates the over-parameterization problems that plague purely supervised learning procedures, and yields good performance with very few labeled training samples.
The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization
This work investigates the reasons for the success of sparse coding over VQ by decoupling the training and encoding phases, allowing the contributions of each to be separated in a controlled way, and shows not only that fast VQ algorithms can be used for training, but that randomly chosen exemplars from the training set work just as well.
Modeling pixel means and covariances using factorized third-order boltzmann machines
This approach provides a probabilistic framework for the widely used simple-cell complex-cell architecture, produces very realistic samples of natural images, and extracts features that yield state-of-the-art recognition accuracy on the challenging CIFAR-10 dataset.