Publications
You Only Look Once: Unified, Real-Time Object Detection
TLDR
Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. It also outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains such as artwork.
Learning Everything about Anything: Webly-Supervised Visual Concept Learning
TLDR
A fully automated approach for learning extensive models of a wide range of variations within any concept. It leverages vast resources of online books to discover the vocabulary of variance, and intertwines the data collection and modeling steps to remove the need for explicit human supervision in training the models.
Asynchronous Temporal Fields for Action Recognition
TLDR
This work proposes a fully connected temporal CRF model for reasoning over various aspects of activities, including objects, actions, and intentions, where the potentials are predicted by a deep network.
How Important Are "Deformable Parts" in the Deformable Parts Model?
TLDR
Increasing the number of components, and switching the initialization step from the aspect-ratio and left-right flipping heuristics to appearance-based clustering, considerably improves performance. With these new components, the part deformations can be turned off while still obtaining results almost on par with the original DPM detector.
FigureSeer: Parsing Result-Figures in Research Papers
TLDR
This paper introduces FigureSeer, an end-to-end framework for parsing result-figures that enables search and retrieval of results in research papers, and formulates a novel graph-based reasoning approach using a CNN-based similarity metric.
An empirical study of context in object detection
TLDR
This paper presents an empirical evaluation of the role of context in a contemporary, challenging object detection task, PASCAL VOC 2008. Using top-performing local appearance detectors as a baseline, it evaluates several different sources of context and ways to utilize them.
PDFFigures 2.0: Mining figures from research papers
TLDR
“PDFFigures 2.0”, an algorithm that extracts figures, tables, and captions from documents. It analyzes the structure of individual pages by detecting captions, graphical elements, and chunks of body text, and then locates figures and tables by reasoning about the empty regions within that text.
Video Relationship Reasoning Using Gated Spatio-Temporal Energy Graph
TLDR
This work introduces a novel gated energy function parametrization that learns adaptive relations conditioned on visual observations and exploits the spatial and temporal statistical dependencies between relational entities.
Looking Beyond Text: Extracting Figures, Tables and Captions from Computer Science Papers
TLDR
This work introduces a new dataset of 150 computer science papers along with ground truth labels for the locations of the figures, tables and captions within them, and demonstrates a caption-to-figure matching component that is effective even when individual captions are adjacent to multiple figures.
Exemplar Driven Character Recognition in the Wild
TLDR
The essence of the exemplar approach is that, rather than seeking to establish commonality within classes, a separate classifier is learnt for each exemplar in the dataset, which is equivalent to optimizing the convex objective.