Social Scene Understanding: End-to-End Multi-person Action Localization and Collective Activity Recognition

@inproceedings{Bagautdinov2017SocialSU,
  title={Social Scene Understanding: End-to-End Multi-person Action Localization and Collective Activity Recognition},
  author={Timur M. Bagautdinov and Alexandre Alahi and François Fleuret and Pascal Fua and Silvio Savarese},
  booktitle={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017},
  pages={3425--3434}
}
We present a unified framework for understanding human social behaviors in raw image sequences. Our model jointly detects multiple individuals, infers their social actions, and estimates the collective actions with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end to generate dense proposal maps that are refined via a novel inference scheme. The temporal consistency is…
