Stride and Translation Invariance in CNNs

Coenraad Mouton, Johannes C. Myburgh, Marelie Hattingh Davel
Convolutional Neural Networks have become the standard for image classification tasks; however, these architectures are not invariant to translations of the input image. This lack of invariance is attributed to the use of stride, which ignores the sampling theorem, and to fully connected layers, which lack spatial reasoning. We show that stride can greatly benefit translation invariance provided that it is combined with sufficient similarity between neighbouring pixels, a characteristic which we refer…
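The interaction between stride and neighbouring-pixel similarity can be illustrated with a toy one-dimensional sketch. This is not the paper's experiment: the helper names (`strided_subsample`, `shift_sensitivity`), the signals, and the stride of 2 are all assumptions chosen for illustration. It compares how much a strided subsampling changes when the input is shifted by one sample, for a smooth signal and for a noisy one.

```python
import numpy as np

def strided_subsample(x, stride):
    """Keep every `stride`-th sample (hypothetical stand-in for a strided layer)."""
    return x[::stride]

def shift_sensitivity(x, stride, shift=1):
    """Mean absolute change of the subsampled output under a circular input shift."""
    original = strided_subsample(x, stride)
    shifted = strided_subsample(np.roll(x, shift), stride)
    return float(np.mean(np.abs(original - shifted)))

rng = np.random.default_rng(0)
n = 256
t = np.arange(n)

# Assumed test signals: one where neighbours are nearly equal, one where they are unrelated.
smooth = np.sin(2 * np.pi * t / n)
rough = rng.standard_normal(n)

smooth_change = shift_sensitivity(smooth, stride=2)
rough_change = shift_sensitivity(rough, stride=2)

print(f"smooth signal: {smooth_change:.4f}")
print(f"rough signal:  {rough_change:.4f}")
```

Under these assumptions the smooth signal's subsampled output barely changes after the shift, while the noisy signal's output changes substantially, which is consistent with the claim that stride helps only when neighbouring values are similar.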

Quantifying Translation-Invariance in Convolutional Neural Networks
This analysis identifies training data augmentation as the most important factor in obtaining translation-invariant representations of images using convolutional neural networks.
Why do deep convolutional networks generalize so poorly to small image transformations?
The results indicate that the problem of ensuring invariance to small image transformations in neural networks while preserving high accuracy remains unsolved.
Making Convolutional Networks Shift-Invariant Again
This work demonstrates that anti-aliasing by low-pass filtering before downsampling, a classical signal processing technique that has been undeservingly overlooked in modern deep networks, is compatible with existing architectural components such as max-pooling and strided convolution.
Understanding image representations by measuring their equivariance and equivalence
  • Karel Lenc, A. Vedaldi
  • Computer Science, Mathematics
  • 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2015
Three key mathematical properties of representations (equivariance, invariance, and equivalence) are investigated and applied to popular representations to reveal insightful aspects of their structure, including clarifying at which layers in a CNN certain geometric invariances are achieved.
Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition
The aim is to gain insight into different functions by directly comparing them on a fixed architecture for several common object recognition tasks, and empirical results show that a maximum pooling operation significantly outperforms subsampling operations.
Convolutional neural networks at constrained time cost
  • Kaiming He, Jian Sun
  • Computer Science
  • 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2015
This paper investigates the accuracy of CNNs under constrained time cost, and presents an architecture that achieves very competitive accuracy in the ImageNet dataset, yet is 20% faster than “AlexNet” [14] (16.0% top-5 error, 10-view test).
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Dynamic Routing Between Capsules
It is shown that a discriminatively trained, multi-layer capsule system achieves state-of-the-art performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits.