• Publications
YOLO9000: Better, Faster, Stronger
We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both...
  • 4,362
  • 802
  • Open Access
YOLOv3: An Incremental Improvement
We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more...
  • 3,194
  • 727
  • Open Access
Bidirectional Attention Flow for Machine Comprehension
Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query. Recently, attention mechanisms have been...
  • 1,137
  • 308
  • Open Access
Unsupervised Deep Embedding for Clustering Analysis
Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning...
  • 728
  • 205
  • Open Access
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-Networks, the filters are approximated with binary values...
  • 1,354
  • 204
  • Open Access
Describing objects by their attributes
We propose to shift the goal of recognition from naming to describing. Doing so allows us not only to name familiar objects, but also: to report unusual aspects of a familiar object (“spotty dog”, ...
  • 1,581
  • 202
  • Open Access
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Computer vision has a great potential to help our daily lives by searching for lost keys, watering flowers or reminding us to take a pill. To succeed with such tasks, computer vision methods need to...
  • 377
  • 89
  • Open Access
Recognition using visual phrases
In this paper we introduce visual phrases, complex visual composites like “a person riding a horse”. Visual phrases often display significantly reduced visual complexity compared to their component...
  • 386
  • 53
  • Open Access
Every Picture Tells a Story: Generating Sentences from Images
Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking...
  • 826
  • 48
  • Open Access
Understanding egocentric activities
We present a method to analyze daily activities, such as meal preparation, using video from an egocentric camera. Our method performs inference about activities, actions, hands, and objects. Daily...
  • 328
  • 34
  • Open Access