Corpus ID: 67855445

Progress Regression RNN for Online Spatial-Temporal Action Localization in Unconstrained Videos

@article{Hu2019ProgressRR,
  title={Progress Regression RNN for Online Spatial-Temporal Action Localization in Unconstrained Videos},
  author={Bo Hu and Jianfei Cai and T. Cham and Junsong Yuan},
  journal={ArXiv},
  year={2019},
  volume={abs/1903.00304}
}
Previous spatial-temporal action localization methods commonly follow the pipeline of object detection to estimate bounding boxes and labels of actions. However, the temporal relation of an action has not been fully explored. In this paper, we propose an end-to-end Progress Regression Recurrent Neural Network (PR-RNN) for online spatial-temporal action localization, which learns to infer the action by temporal progress regression. Two new action attributes, called progression and progress rate… 
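As a rough illustration of the progress-regression idea, the sketch below (plain PyTorch; the class name `ProgressRegressionHead`, the GRU choice, and all dimensions are assumptions, not the paper's implementation) shows a recurrent head over per-frame features that emits, at every time step, class scores together with two scalars interpreted as the action's progression and progress rate:

```python
import torch
import torch.nn as nn

class ProgressRegressionHead(nn.Module):
    """Illustrative head: a GRU over per-frame features that, at every step,
    predicts class scores plus two scalars interpreted as the action's
    progression (how far the action has advanced, in [0, 1]) and its
    progress rate (how fast it is advancing)."""

    def __init__(self, feat_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.cls_head = nn.Linear(hidden_dim, num_classes)   # action class logits
        self.progression_head = nn.Linear(hidden_dim, 1)     # squashed to [0, 1]
        self.rate_head = nn.Linear(hidden_dim, 1)            # progress per frame

    def forward(self, frame_feats: torch.Tensor):
        # frame_feats: (batch, time, feat_dim), e.g. pooled detector features per frame
        h, _ = self.rnn(frame_feats)                           # (batch, time, hidden_dim)
        cls_logits = self.cls_head(h)                          # (batch, time, num_classes)
        progression = torch.sigmoid(self.progression_head(h))  # (batch, time, 1)
        rate = self.rate_head(h)                               # (batch, time, 1)
        return cls_logits, progression, rate
```

Because such a head is causal and produces predictions frame by frame, it can be queried online; a progression estimate near 1 would indicate that the current action instance is about to end, which is the kind of temporal cue a bounding-box pipeline alone does not provide.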

References

SHOWING 1-10 OF 54 REFERENCES
Online Real-Time Multiple Spatiotemporal Action Localisation and Prediction
TLDR
This work presents a deep-learning framework for multiple spatio-temporal (S/T) action localisation and classification that not only performs S/T detection in real time but also makes early action predictions in an online fashion.
Action Tubelet Detector for Spatio-Temporal Action Localization
TLDR
The proposed ACtion Tubelet detector (ACT-detector) takes as input a sequence of frames and outputs tubelets, i.e., sequences of bounding boxes with associated scores, built on anchor cuboids; it outperforms the state-of-the-art methods for frame-mAP and video-mAP on the J-HMDB and UCF-101 datasets, in particular at high overlap thresholds.
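As a rough sketch of the anchor-cuboid idea (illustrative only; the function name, box parameterization, and offset decoding are assumptions rather than the ACT-detector code), a single 2D anchor shared across K frames can be turned into a tubelet by applying K sets of per-frame regression offsets:

```python
import math

def decode_tubelet(anchor, per_frame_offsets):
    """anchor: (cx, cy, w, h) of one 2D anchor, shared by all K frames of the cuboid.
    per_frame_offsets: K tuples (dx, dy, dw, dh) predicted for that cuboid.
    Returns K boxes (x1, y1, x2, y2), i.e. one tubelet."""
    ax, ay, aw, ah = anchor
    tubelet = []
    for dx, dy, dw, dh in per_frame_offsets:
        cx, cy = ax + dx * aw, ay + dy * ah           # shift relative to anchor size
        w, h = aw * math.exp(dw), ah * math.exp(dh)   # scale anchor width/height
        tubelet.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return tubelet
```

Scoring the cuboid as a whole, rather than each frame independently, is what lets the detector exploit the temporal continuity of the sequence.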
Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos
TLDR
A huge leap forward in action detection performance is achieved, with gains of 20% and 11% in mAP reported on the UCF-101 and J-HMDB-21 datasets, respectively, compared to the state of the art.
Temporal Action Localization by Structured Maximal Sums
We address the problem of temporal action localization in videos. We pose action localization as a structured prediction over arbitrary-length temporal windows, where each window is scored as the sum…
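Taking the excerpt at face value, i.e. a temporal window scored as the sum of per-frame scores, the single best window can be found with a maximum-sum subarray search. The sketch below (plain Python, illustrative names) recovers one window and is only a simplified stand-in for the paper's structured formulation:

```python
def best_temporal_window(frame_scores):
    """Return (start, end, score) of the contiguous window with the largest
    summed per-frame score (Kadane's algorithm). Frames outside the action
    are expected to carry negative scores, otherwise the whole video is
    trivially the best window."""
    best_start, best_end, best_sum = 0, 0, float("-inf")
    cur_start, cur_sum = 0, 0.0
    for t, s in enumerate(frame_scores):
        if cur_sum <= 0:            # restarting beats extending a non-positive prefix
            cur_start, cur_sum = t, s
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_start, best_end, best_sum = cur_start, t, cur_sum
    return best_start, best_end, best_sum

# e.g. best_temporal_window([-1.0, 2.0, 3.0, -0.5, 1.0, -4.0]) -> (1, 4, 5.5)
```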
Learning to Track for Spatio-Temporal Action Localization
TLDR
The approach first detects proposals at the frame level and scores them with a combination of static and motion CNN features, then tracks high-scoring proposals throughout the video using a tracking-by-detection approach; it outperforms the state of the art by margins of 15%, 7%, and 12% in mAP on the respective benchmarks.
Recurrent Tubelet Proposal and Recognition Networks for Action Detection
TLDR
This work presents a novel deep architecture called Recurrent Tubelet Proposal and Recognition (RTPR) networks to incorporate temporal context for action detection and conducts extensive experiments to demonstrate superior results over state-of-the-art methods.
An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos
TLDR
The proposed architecture is a unified deep network that is able to recognize and localize actions based on 3D convolution features and can be readily applied to the general problem of video object segmentation.
Multi-region Two-Stream R-CNN for Action Detection
TLDR
A multi-region two-stream R-CNN model for action detection in realistic videos is proposed; it links frame-level detections with the Viterbi algorithm and temporally localizes actions with the maximum subarray method.
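Linking frame-level detections into an action tube with the Viterbi algorithm is a dynamic program over per-frame box candidates. The sketch below uses the common formulation in which a link is scored by the detection score plus a weighted IoU between consecutive boxes; the function names and the `iou_weight` parameter are illustrative assumptions, not the paper's code:

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def link_detections(frames, iou_weight=1.0):
    """frames: list over time of lists of (box, score) candidates for one class.
    Returns the index of the chosen box in every frame, i.e. one action tube,
    maximizing the sum of scores plus iou_weight * IoU between consecutive boxes."""
    # dp[t][i] = best accumulated score of a tube ending at candidate i in frame t
    dp = [[score for _, score in frames[0]]]
    back = [[-1] * len(frames[0])]
    for t in range(1, len(frames)):
        dp_t, back_t = [], []
        for box, score in frames[t]:
            best_prev, best_val = 0, float("-inf")
            for j, (prev_box, _) in enumerate(frames[t - 1]):
                val = dp[t - 1][j] + iou_weight * iou(prev_box, box)
                if val > best_val:
                    best_prev, best_val = j, val
            dp_t.append(best_val + score)
            back_t.append(best_prev)
        dp.append(dp_t)
        back.append(back_t)
    # trace the best path backwards through the stored predecessors
    idx = max(range(len(dp[-1])), key=lambda i: dp[-1][i])
    path = [idx]
    for t in range(len(frames) - 1, 0, -1):
        idx = back[t][idx]
        path.append(idx)
    return path[::-1]
```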
Fast action proposals for human action detection and search
Gang Yu, Junsong Yuan. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
TLDR
Experimental results on two challenging datasets, MSRII and UCF-101, validate the superior performance of the action proposals and show competitive results on action detection and search.
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident.
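The central practice in Temporal Segment Networks is sparse temporal sampling with a segmental consensus: split the video into a few equal segments, sample one snippet per segment, score each snippet with a shared network, and fuse the snippet scores, typically by averaging. A minimal sketch of that sampling-and-consensus step (plain Python; the function names and defaults are illustrative assumptions):

```python
import random

def sample_snippet_indices(num_frames, num_segments=3, training=True):
    """Pick one frame index per equal-length segment: a random frame during
    training, the segment centre at test time (sparse temporal sampling)."""
    seg_len = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        start = int(k * seg_len)
        end = max(int((k + 1) * seg_len), start + 1)
        indices.append(random.randrange(start, end) if training
                       else (start + end - 1) // 2)
    return indices

def segmental_consensus(snippet_scores):
    """Average per-snippet class scores into one video-level prediction."""
    num_classes = len(snippet_scores[0])
    return [sum(s[c] for s in snippet_scores) / len(snippet_scores)
            for c in range(num_classes)]
```

Because only a handful of snippets are processed per video, the scheme keeps training cheap while still exposing the network to the full temporal extent of each action.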