• Publications
  • Influence
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection
We address the problem of activity detection in continuous, untrimmed video streams. This is a difficult task that requires extracting meaningful spatio-temporal features to capture activities, …
  • 291
  • 39
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
We address the problem of Visual Question Answering (VQA), which requires joint image and language understanding to answer a question about a given photograph. Recent approaches have applied deep …
  • 495
  • 22
Multilevel Language and Vision Integration for Text-to-Clip Retrieval
We address the problem of text-based activity retrieval in video. Given a sentence describing an activity, our task is to retrieve matching clips from an untrimmed video. To capture the inherent …
  • 53
  • 13
A Multi-scale Multiple Instance Video Description Network
Generating natural language descriptions for in-the-wild videos is a challenging task. Most state-of-the-art methods for solving this problem borrow existing deep convolutional neural network (CNN) …
  • 47
  • 7
A New Meta-Baseline for Few-Shot Learning
Meta-learning has become a popular framework for few-shot learning in recent years, with the goal of learning a model from collections of few-shot classification tasks. While more and more novel …
  • 14
  • 6
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
Solving the visual symbol grounding problem has long been a goal of artificial intelligence. The field appears to be advancing closer to this goal with recent breakthroughs in deep learning for …
  • 110
  • 3
Poison Identification Based on Bayesian Network: A Novel Improvement on K2 Algorithm via Markov Blanket
The purpose of this paper was to provide help for poison identification via the Bayesian network according to the observed preliminary symptoms of the poisoning people. We proposed a novel …
  • 6
  • 3
Coupled Morphological–Hemodynamic Computational Analysis of Type B Aortic Dissection: A Longitudinal Study
Progressive false lumen aneurysmal degeneration in type B aortic dissection (TBAD) is a complex process with a multi-factorial etiology. Patient-specific computational fluid dynamics (CFD) …
  • 16
  • 2
Text-to-Clip Video Retrieval with Early Fusion and Re-Captioning
We propose a novel method capable of retrieving clips from untrimmed videos based on natural language queries. This cross-modal retrieval task plays a key role in visual-semantic understanding, and …
  • 11
  • 2
Joint Event Detection and Description in Continuous Video Streams
Dense video captioning involves first localizing events in a video and then generating captions for the identified events. We present the Joint Event Detection and Description Network (JEDDi-Net) for …
  • 19
  • 1