Bypass Enhancement RGB Stream Model for Pedestrian Action Recognition of Autonomous Vehicles

Dong Cao, Lisha Xu. ACPR Workshops.
Pedestrian action recognition and intention prediction are among the core issues in autonomous driving, and action recognition is one of the key technologies in this field. Many researchers have worked to improve the accuracy of algorithms for this task. However, relatively few studies address the computational complexity of the algorithms or the real-time performance of the system. In the autonomous driving application scenario, the real-time performance…
1 Citation
Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study
The UG2+ Challenge Track 2 (UG2-2) in IEEE CVPR 2021 was launched with the goal of evaluating and advancing the robustness of action recognition (AR) models in dark environments, and it guides models to tackle the task in both fully supervised and semi-supervised settings.


MARS: Motion-Augmented RGB Stream for Action Recognition
This paper introduces two learning approaches for training a standard 3D CNN that operates on RGB frames to mimic the motion stream, which avoids flow computation at test time. The stream trained with the resulting combined loss is called the Motion-Augmented RGB Stream (MARS).
3D Convolutional Neural Networks for Human Action Recognition
A novel 3D CNN model for action recognition that extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.
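To make the idea of convolving over both space and time concrete, here is a minimal sketch of a single-channel 3D convolution in plain Python. It is not the cited model: the function name `conv3d_valid`, the nested-list clip layout `[T][H][W]`, and the toy temporal-difference kernel are all assumptions chosen for illustration. As in most deep learning libraries, the kernel is applied without flipping (cross-correlation), with no padding and stride 1.

```python
def conv3d_valid(clip, kernel):
    """Naive single-channel 3D convolution (no padding, stride 1).

    clip:   nested lists indexed [T][H][W] (frames, rows, columns)
    kernel: nested lists indexed [kT][kH][kW]
    Both layouts are illustrative assumptions, not the paper's format.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kT, kH, kW = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kT + 1):
        plane = []
        for y in range(H - kH + 1):
            row = []
            for x in range(W - kW + 1):
                acc = 0
                # Sum over a small spatio-temporal cube of the clip.
                for dt in range(kT):
                    for dy in range(kH):
                        for dx in range(kW):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip: three 2x2 frames in which pixels switch on over time.
clip = [
    [[0, 0], [0, 0]],
    [[0, 5], [0, 0]],
    [[0, 5], [0, 9]],
]
# Hypothetical kernel spanning two adjacent frames: next frame minus current frame.
temporal_kernel = [[[-1]], [[1]]]
response = conv3d_valid(clip, temporal_kernel)
```

Because the kernel extends across adjacent frames, each output value depends on how a pixel neighborhood changes over time, which a 2D convolution applied frame by frame cannot capture.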
Two-Stream Convolutional Networks for Action Recognition in Videos
This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
Action Recognition with Improved Trajectories
Heng Wang, C. Schmid. 2013 IEEE International Conference on Computer Vision, 2013.
Dense trajectories, which were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets, are improved by taking camera motion into account to correct them.
Hidden Two-Stream Convolutional Networks for Action Recognition
This paper presents a novel CNN architecture that implicitly captures motion information between adjacent frames and directly predicts action classes without explicitly computing optical flow, and significantly outperforms the previous best real-time approaches.
Action recognition by dense trajectories
This work introduces a novel descriptor based on motion boundary histograms, which is robust to camera motion and consistently outperforms other state-of-the-art descriptors, in particular in uncontrolled realistic videos.
Im2Flow: Motion Hallucination from Static Images for Action Recognition
This work devises an encoder-decoder convolutional neural network and a novel optical flow encoding that can translate a static image into an accurate flow map. It shows the power of hallucinated flow for recognition, successfully transferring the learned motion into a standard two-stream network for activity recognition.
Motion Detection Based on Frame Difference Method
A new algorithm based on frame difference is presented for detecting moving objects against a static background scene: the absolute difference between consecutive frames is calculated, and the resulting difference image is stored in the system.
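The frame-difference idea summarized above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the cited algorithm: the function name `frame_difference`, the representation of grayscale frames as lists of rows, and the threshold value of 25 are all hypothetical choices; the summary itself only mentions computing and storing the absolute difference.

```python
def frame_difference(prev_frame, next_frame, threshold=25):
    """Return (diff, mask) for two equally sized grayscale frames.

    Frames are lists of rows of integer intensities (0-255).
    `threshold` is an illustrative value, not one taken from the paper.
    """
    if len(prev_frame) != len(next_frame):
        raise ValueError("frames must have the same height")
    diff, mask = [], []
    for row_a, row_b in zip(prev_frame, next_frame):
        if len(row_a) != len(row_b):
            raise ValueError("frames must have the same width")
        # Absolute per-pixel difference between consecutive frames.
        diff_row = [abs(a - b) for a, b in zip(row_a, row_b)]
        diff.append(diff_row)
        # Pixels whose change exceeds the threshold are marked as moving.
        mask.append([1 if d > threshold else 0 for d in diff_row])
    return diff, mask


# Static background with one bright pixel that moves one column to the right.
frame_t0 = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
frame_t1 = [[10, 10, 10], [10, 10, 200], [10, 10, 10]]
diff_image, motion_mask = frame_difference(frame_t0, frame_t1)
```

In this toy pair the mask marks both the position the bright pixel left and the position it entered, which is the usual behavior of a two-frame difference against a static background.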
FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
The concept of end-to-end learning of optical flow is advanced and made to work really well, and faster variants are presented that allow optical flow computation at up to 140 fps with accuracy matching the original FlowNet.
Hallucinating Optical Flow Features for Video Classification
The proposed MoNet can effectively and efficiently hallucinate optical flow features, which, together with the appearance features, consistently improve video classification performance and can help cut almost half of the computational and data-storage burden of two-stream video classification.