Corpus ID: 215737113

Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?

@article{Kataoka2020WouldMD,
  title={Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?},
  author={Hirokatsu Kataoka and Tenga Wakamiya and K. Hara and Y. Satoh},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.04968}
}
How can we collect and use a video dataset to further improve spatiotemporal 3D Convolutional Neural Networks (3D CNNs)? In order to positively answer this open question in video recognition, we have conducted an exploration study using a couple of large-scale video datasets and 3D CNNs. In the early era of deep neural networks, 2D CNNs were better than 3D CNNs in the context of video recognition. Recent studies revealed that 3D CNNs can outperform 2D CNNs trained on a large-scale video…
10 Citations
Metric-Based Attention Feature Learning for Video Action Recognition
ADCI-Net: an adaptive discriminative clip identification strategy for fast video action recognition
Skeleton Aware Multi-modal Sign Language Recognition
  • Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu
  • Computer Science
  • ArXiv
  • 2021
Skeleton Based Isolated Sign Language Recognition Using Whole-body Keypoints in a Universal Multi-modal Framework Fact Sheet
  • Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu
  • 2021
Skeleton Based Sign Language Recognition Using Whole-body Keypoints
  • Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu
  • 2021
Skimming and Scanning for Untrimmed Video Action Recognition
  • Yunyan Hong, Ailing Zeng, Min Li, Cewu Lu, Li Jiang, Qiang Xu
  • Computer Science
  • ArXiv
  • 2021
TCLR: Temporal Contrastive Learning for Video Representation
Right on Time: Multi-Temporal Convolutions for Human Action Recognition in Videos
