• Publications
  • Influence
You Only Look Once: Unified, Real-Time Object Detection
We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem toExpand
  • 9,374
  • 1444
  • PDF
Target-driven visual navigation in indoor scenes using deep reinforcement learning
TLDR
We proposed a deep reinforcement learning (DRL) framework for target-driven visual navigation. Expand
  • 691
  • 69
  • PDF
YOLOv 3 : An Incremental Improvement
We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but moreExpand
  • 187
  • 16
What’s Hidden in a Randomly Weighted Neural Network?
TLDR
We empirically show that randomly weighted neural networks contain subnetworks which achieve impressive performance without ever training the weight values. Expand
  • 40
  • 8
  • PDF
Visual Semantic Planning Using Deep Successor Representations
TLDR
A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world. Expand
  • 83
  • 6
  • PDF
Actor and Observer: Joint Modeling of First and Third-Person Videos
TLDR
We introduce Charades-Ego, a large-scale dataset of paired first-person and third-person videos, involving 112 people, with 4000 paired videos. Expand
  • 25
  • 6
  • PDF
Describing objects by their attributes
TLDR
We propose to shift the goal of recognition from naming to describing. Expand
  • 122
  • 5
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
TLDR
Fine-tuning BERT multiple times while varying only random seeds leads to substantial improvements over previously published validation results with the same model and experimental setup, on four tasks from the GLUE benchmark. Expand
  • 56
  • 5
  • PDF
Watching the World Go By: Representation Learning from Unlabeled Videos
TLDR
We propose Video Noise Contrastive Estimation, a method for using unlabeled video to learn strong, transferable single image representations, across a variety of temporal and non-temporal tasks. Expand
  • 12
  • 3
  • PDF
RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
TLDR
We introduce RoboTHOR to democratize research in interactive and embodied visual AI. Expand
  • 6
  • 2
  • PDF