Corpus ID: 237353190

Efficient Visual Recognition with Deep Neural Networks: A Survey on Recent Advances and New Directions

  • Yang Wu, Dingheng Wang, Xiaotong Lu, Fan Yang, Guoqi Li, Weisheng Dong, Jianbo Shi
  • Published 30 August 2021
  • Computer Science
  • ArXiv
Visual recognition is currently one of the most important and active research areas in computer vision, pattern recognition, and artificial intelligence in general. It is of great fundamental importance and in strong industrial demand. Deep neural networks (DNNs) have greatly boosted performance on many concrete visual recognition tasks, with the help of large amounts of training data and powerful new computation resources. Though recognition accuracy is usually the first concern for new…


Efficient Processing of Deep Neural Networks: A Tutorial and Survey
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver…
From BoW to CNN: Two Decades of Texture Representation for Texture Classification
A comprehensive survey of advances in texture representation over the last two decades is presented, covering different aspects of the research, including benchmark datasets and state-of-the-art results.
Very Deep Convolutional Networks for Large-Scale Image Recognition
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
CNN Features Off-the-Shelf: An Astounding Baseline for Recognition
A series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13 suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.
Do Better ImageNet Models Transfer Better?
It is found that, when networks are used as fixed feature extractors or fine-tuned, there is a strong correlation between ImageNet accuracy and transfer accuracy, and ImageNet features are less general than previously suggested.
PolyNet: A Pursuit of Structural Diversity in Very Deep Networks
This work presents a new family of modules, namely the PolyInception, which can be flexibly inserted in isolation or in a composition as replacements of different parts of a network, and demonstrates substantial improvements over the state-of-the-art on the ILSVRC 2012 benchmark.
Rethinking the Inception Architecture for Computer Vision
This work explores ways to scale up networks that aim to use the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Coordinating Filters for Faster Deep Neural Networks
Force Regularization, which uses attractive forces to coordinate filters so that more weight information is concentrated in a lower-rank space, is proposed. It is verified mathematically and empirically that, after applying this technique, standard low-rank approximation (LRA) methods can reconstruct filters using a much lower-rank basis, resulting in faster DNNs.
Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges
It is shown that the top face-verification results from the Labeled Faces in the Wild data set were obtained with networks containing hundreds of millions of parameters, using a mix of convolutional, locally connected, and fully connected layers.