You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, Ali Farhadi
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance… 
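
The regression framing described in the abstract can be illustrated with a small sketch. It assumes the output layout given in the full paper rather than in this excerpt: an S x S grid in which each cell predicts B boxes of (x, y, w, h, confidence) followed by C class probabilities. The function name `decode_grid`, the nested-list layout, and the 0.25 threshold are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# (center_x, center_y, width, height, score, class_id), all image-relative
Detection = Tuple[float, float, float, float, float, int]


def decode_grid(pred: List[List[List[float]]], S: int, B: int, C: int,
                threshold: float = 0.25) -> List[Detection]:
    """Turn an S x S grid of per-cell predictions into detections.

    Assumed layout: pred[row][col] is a flat list of length B*5 + C,
    holding B boxes of (x, y, w, h, confidence) and then C class
    probabilities. x, y are offsets inside the cell in [0, 1];
    w, h are fractions of the whole image.
    """
    detections: List[Detection] = []
    for row in range(S):
        for col in range(S):
            cell = pred[row][col]
            class_probs = cell[B * 5:]
            # One set of class probabilities is shared by all boxes in a cell.
            best_class = max(range(C), key=lambda k: class_probs[k])
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                # Class-specific score = box confidence * class probability.
                score = conf * class_probs[best_class]
                if score < threshold:
                    continue
                # Convert cell-relative offsets to image-relative centers.
                cx = (col + x) / S
                cy = (row + y) / S
                detections.append((cx, cy, w, h, score, best_class))
    return detections
```

Because every cell is decoded from a single forward pass, no separate proposal or classification stage is needed; overlapping detections would still typically be pruned afterwards.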

Comparison Network for One-Shot Conditional Object Detection

A novel one-shot conditional object detection framework, referred to as Comparison Network (ComparisonNet), is proposed; it can detect objects of both seen and unseen classes without further training, is class-agnostic and training-free for unseen classes, and avoids catastrophic forgetting.

Zero Shot Detection

This work proposes a novel zero-shot method based on training an end-to-end model that fuses semantic attribute prediction with visual features to propose object bounding boxes for seen and unseen classes and observes significant improvements on the average precision of unseen classes.

A Multi-Space Approach to Zero-Shot Object Detection

A novel multi-space approach to Zero-Shot Object Detection is proposed, in which predictions obtained in two different search spaces are combined. The problem of hubness is discussed, and the approach is shown to alleviate hubness while outperforming previously proposed methods.

Single-Shot Object Detection for Face Masks using YOLOv3

This paper aims to demonstrate object detection using YOLOv3 (a variant of the original YOLO architecture), one of the fastest real-time object detection algorithms (45 frames per second) as compared to the R-CNN family (R-CNN, Fast R-CNN, Faster R-CNN, etc.).

Dual Refinement Network for Single-Shot Object Detection

A dual refinement network (DRN) is proposed to boost the performance of the single-stage detector and a multi-deformable head is designed, in which multiple detection paths with different receptive field sizes devote themselves to detecting objects.

Towards the Success Rate of One: Real-Time Unconstrained Salient Object Detection

This work proposes an efficient and effective approach for unconstrained salient object detection in images using deep convolutional neural networks, which performs saliency map prediction without pixel-level annotations, salient object detection without object proposals, and salient object subitizing simultaneously, all in a single pass within a unified framework.

Learning to Filter Object Detections

A filtering network (FNet) is proposed, a method which replaces NMS with a differentiable neural network that allows joint reasoning and re-scoring of the generated set of hypotheses per image, and demonstrates that FNet, a feed-forward network architecture, is able to mimic NMS decisions, despite the sequential nature of NMS.
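
For context, the sequential procedure that FNet is said to mimic is conventionally implemented as greedy non-maximum suppression. The following is a minimal sketch of that baseline, assuming boxes given as (x1, y1, x2, y2) corners and an IoU threshold of 0.5; the function names are illustrative and are not taken from the cited paper.

```python
from typing import List, Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Sequence[float]], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    Each decision depends on the boxes already kept, which is the
    sequential dependency the summary above refers to.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```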

Cross-Supervised Object Detection

This work proposes a unified framework that combines a detection head trained from instance-level annotations and a recognition head learned from image-level annotations, together with a spatial correlation module that bridges the gap between detection and recognition.

Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video

The evolutionary deep intelligence framework is leveraged to evolve the YOLOv2 network architecture and produce an optimized architecture that has 2.8X fewer parameters with just a ~2% IOU drop, and a motion-adaptive inference method is introduced into the proposed Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2 based on temporal motion characteristics.

Real-time object detection by a multi-feature fully convolutional network

A new model free from region proposals is proposed for object detection; it treats the detection task as a regression problem and can predict bounding boxes and class probabilities simultaneously from a full input image.



Simultaneous Detection and Segmentation

This work builds on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN), introducing a novel architecture tailored for SDS, and uses category-specific, top-down figure-ground predictions to refine the bottom-up proposals.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.

Scalable Object Detection Using Deep Neural Networks

This work proposes a saliency-inspired neural network model for detection, which predicts a set of class-agnostic bounding boxes along with a single score for each box, corresponding to its likelihood of containing any object of interest.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.

Learning to Localize Objects with Structured Output Regression

This work proposes to treat object localization in a principled way by posing it as a problem of predicting structured data: it models the problem not as binary classification, but as the prediction of the bounding box of objects located in images.

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

This integrated framework for using Convolutional Networks for classification, localization and detection is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 and obtained very competitive results for the detection and classification tasks.

Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object's appearance, such as the presence of

Object Detection with Discriminatively Trained Part Based Models

We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in

Diagnosing Error in Object Detectors

This paper shows how to analyze the influences of object characteristics on detection performance and the frequency and impact of different types of false positives, and shows that sensitivity to size, localization error, and confusion with similar objects are the most impactful forms of error.

Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model

An object detection system is presented that relies on a multi-region deep convolutional neural network that also encodes semantic segmentation-aware features; it aims to capture a diverse set of discriminative appearance factors and exhibits the localization sensitivity that is essential for accurate object localization.