You Only Look Once: Unified, Real-Time Object Detection
TLDR
Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
YOLOv3: An Incremental Improvement
We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more...
YOLO9000: Better, Faster, Stronger
TLDR
YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories, is introduced, with improvements both novel and drawn from prior work, and a method to jointly train on object detection and classification is proposed.
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
TLDR
The Binary-Weight-Network version of AlexNet is compared with recent network binarization methods, BinaryConnect and BinaryNets, and outperforms these methods by large margins on ImageNet, by more than 16% in top-1 accuracy.
Real-time grasp detection using convolutional neural networks
TLDR
An accurate, real-time approach to robotic grasp detection based on convolutional neural networks that outperforms state-of-the-art approaches by 14 percentage points and runs at 13 frames per second on a GPU.
IQA: Visual Question Answering in Interactive Environments
TLDR
The Hierarchical Interactive Memory Network (HIMN) is proposed; it consists of a factorized set of controllers that allow the system to operate at multiple levels of temporal abstraction, and it outperforms popular single-controller-based methods on IQUAD V1.
Who Let the Dogs Out? Modeling Dog Behavior from Visual Data
TLDR
This model takes visual information as input and directly predicts the actions of the agent. The representation it learns encodes distinct information compared to representations trained on image classification, and it can generalize to other domains.
Study of Entity Detection and Identification using Deep Learning Techniques a Survey
TLDR
Various frameworks like HyperNet, novel CAD YOLO Voxnet are studied, along with various methods for object detection and identification such as region generation, scale-invariant detection, non-maximum weighted, sparse matrix distribution, background modeling, Speeded-Up Robust Features (SURF), and Single Shot Detection (SSD).
YOLO-based Adaptive Window Two-stream Convolutional Neural Network for Video Classification
[1] Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE conference...