• Corpus ID: 216080778

YOLOv4: Optimal Speed and Accuracy of Object Detection

Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao
There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We… 

ViT-YOLO: Transformer-Based YOLO for Object Detection

An improved backbone, MHSA-Darknet, is designed to retain sufficient global context information and extract more differentiated features for object detection via multi-head self-attention, and a simple yet highly effective weighted bi-directional feature pyramid network (BiFPN) is presented for effective cross-scale feature fusion.

Generating robust real-time object detector with uncertainty via virtual adversarial training

A new method for predicting uncertainty is proposed that quantifies the reliability of a neural network’s predictions, validating the correctness of detection results at low computational complexity; the experimental results demonstrate the effectiveness of the proposed approach.

Data augmentation for thermal infrared object detection with cascade pyramid generative adversarial network

This paper proposes a data augmentation algorithm, the cascade pyramid generative adversarial network (CPGAN), which greatly improves the detection accuracy of classical detection algorithms.

Six-channel Image Representation for Cross-domain Object Detection

This study proposes to concatenate the original 3-channel images and their corresponding GAN-generated fake images to form 6-channel representations of the dataset, hoping to address the domain shift problem while exploiting the success of available detection models.

SFPN: Synthetic FPN for Object Detection

A new SFPN (Synthetic Fusion Pyramid Network) architecture is proposed, which creates various synthetic layers between layers of the original FPN so that lightweight CNN backbones extract objects’ visual features more accurately.

Towards Large-Scale Small Object Detection: Survey and Benchmarks

Two large-scale Small Object Detection dAtasets (SODA), SODA-D and SODA-A, which focus on the Driving and Aerial scenarios respectively, are constructed, and the performance of mainstream methods on SODA is evaluated.

LPNet: Retina Inspired Neural Network for Object Detection and Recognition

LPNet improves the detection accuracy by combining retina-like log-polar transformation and stable and sliding modes, and is a plug-and-play architecture for object detection and recognition.

CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters

  • Paul Gavrikov, J. Keuper
  • Computer Science
    2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2022
It is concluded that model pre-training can succeed on arbitrary datasets if they meet size and variance conditions and that many pre-trained models contain degenerated filters which make them less robust and less suitable for fine-tuning on target applications.

A comparison of deep saliency map generators on multispectral data in object detection

This work tries to close the gaps by investigating how the maps produced by three saliency map generators differ across spectra, and examines how the generators perform when used for object detection.

Trapped in texture bias? A large scale comparison of deep instance segmentation

YOLACT++, SOTR and SOLOv2 are significantly more robust to out-of-distribution texture than other frameworks and it is shown that deeper and dynamic architectures improve robustness whereas training schedules, data augmentation and pre-training have only a minor impact.



Improved Regularization of Convolutional Neural Networks with Cutout

This paper shows that the simple regularization technique of randomly masking out square regions of input during training, which is called cutout, can be used to improve the robustness and overall performance of convolutional neural networks.
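
The masking step described above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function name `cutout`, the `size` and `fill` parameters, and the choice to let the square be clipped at the image border are all assumptions made for the example, and the image is a plain list of rows rather than a tensor.

```python
import random


def cutout(image, size, rng=None, fill=0):
    """Return a copy of `image` with one square region overwritten by `fill`.

    `image` is a list of rows (a single-channel image). The square is
    `size` x `size`, centred on a uniformly sampled pixel; parts that fall
    outside the image are clipped (an assumption of this sketch).
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])

    # Sample the centre of the mask anywhere in the image.
    cy, cx = rng.randrange(height), rng.randrange(width)

    # Convert the centre to clipped, end-exclusive bounds.
    top0, left0 = cy - size // 2, cx - size // 2
    top, bottom = max(0, top0), min(height, top0 + size)
    left, right = max(0, left0), min(width, left0 + size)

    # Copy the rows so the input image is left untouched.
    out = [row[:] for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = fill
    return out


if __name__ == "__main__":
    img = [[1] * 8 for _ in range(8)]
    for row in cutout(img, size=4, rng=random.Random(0)):
        print(row)
```

In training, a fresh mask would be drawn for every image in every batch, so the network rarely sees the same occlusion twice.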

CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features

Patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches, and CutMix consistently outperforms state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly-supervised localization task.
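
The cut-and-paste step and the area-proportional label mixing can be illustrated as follows. This is a hedged sketch under stated assumptions, not the paper's code: the function name `cutmix` and the `box` parameter are hypothetical, the patch location is passed in by the caller rather than sampled, and images are plain lists of rows.

```python
def cutmix(image_a, label_a, image_b, label_b, box):
    """Paste the `box` region of `image_b` into a copy of `image_a`.

    `box` is (top, left, bottom, right) with end-exclusive bounds, and is
    chosen by the caller (how the box is sampled is outside this sketch).
    Labels are lists of class probabilities, e.g. one-hot vectors. The
    weight given to `label_b` is the fraction of pixels taken from
    `image_b`; `label_a` gets the remainder.
    """
    height, width = len(image_a), len(image_a[0])
    top, left, bottom, right = box

    # Copy image_a, then overwrite the patch with pixels from image_b.
    mixed = [row[:] for row in image_a]
    for y in range(top, bottom):
        mixed[y][left:right] = image_b[y][left:right]

    # Mix the labels in proportion to the pasted area.
    weight_b = (bottom - top) * (right - left) / (height * width)
    label = [(1 - weight_b) * a + weight_b * b
             for a, b in zip(label_a, label_b)]
    return mixed, label


if __name__ == "__main__":
    zeros = [[0] * 4 for _ in range(4)]
    ones = [[1] * 4 for _ in range(4)]
    mixed, label = cutmix(zeros, [1.0, 0.0], ones, [0.0, 1.0], (0, 0, 2, 2))
    print(mixed)
    print(label)
```

Because the mixed label tracks the pasted area exactly, a patch covering a quarter of the image contributes a quarter of the target.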

Receptive Field Block Net for Accurate and Fast Object Detection

Inspired by the structure of Receptive Fields (RFs) in human visual systems, a novel RF Block (RFB) module is proposed, which takes the relationship between the size and eccentricity of RFs into account, to enhance the feature discriminability and robustness.

Aggregated Residual Transformations for Deep Neural Networks

On the ImageNet-1K dataset, it is empirically shown that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy and is more effective than going deeper or wider when increasing capacity.

DetNet: Design Backbone for Object Detection

DetNet is proposed, a novel backbone network specifically designed for object detection that includes extra stages compared with traditional backbone networks for image classification, while maintaining high spatial resolution in deeper layers.

Scale-Aware Trident Networks for Object Detection

A novel Trident Network (TridentNet) aiming to generate scale-specific feature maps with a uniform representational power is proposed and a parallel multi-branch architecture in which each branch shares the same transformation parameters but with different receptive fields is constructed.

FCOS: Fully Convolutional One-Stage Object Detection

For the first time, a much simpler and flexible detection framework achieving improved detection accuracy is demonstrated, and it is hoped that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks.

R-FCN: Object Detection via Region-based Fully Convolutional Networks

This work presents region-based, fully convolutional networks for accurate and efficient object detection, and proposes position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection.

Rethinking the Inception Architecture for Computer Vision

This work explores ways to scale up networks that use the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.

Libra R-CNN: Towards Balanced Learning for Object Detection

Libra R-CNN is proposed, a simple but effective framework towards balanced learning for object detection that integrates three novel components: IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss, respectively for reducing the imbalance at sample, feature, and objective level.