Learning to Segment Actions from Observation and Narration

  Daniel Fried, Jean-Baptiste Alayrac, Phil Blunsom, Chris Dyer, Stephen Clark, Aida Nematzadeh
  Published in ACL 2020

  Abstract: We apply a generative segmental model of task structure, guided by narration, to action segmentation in video. We focus on unsupervised and weakly-supervised settings where no action labels are known during training. Despite its simplicity, our model performs competitively with previous work on a dataset of naturalistic instructional videos. Our model allows us to vary the sources of supervision used in training, and we find that both task structure and narrative language provide large benefits…
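The abstract's "generative segmental model" assigns action labels to whole spans of frames rather than to individual frames, and decoding such a model reduces to a semi-Markov dynamic program. Below is a minimal sketch of segmental Viterbi decoding under simplifying assumptions (per-frame label scores, a label-transition matrix, a maximum segment length); all names here are illustrative, not the paper's actual model:

```python
# Illustrative sketch of segmental (semi-Markov) Viterbi decoding; a
# simplification, not the paper's model. `scores`, `trans`, and `max_len`
# are assumed inputs.
import math

def segmental_viterbi(scores, trans, max_len):
    """scores[t][k]: log-score of frame t under action label k.
    trans[j][k]: log-score of a segment labeled j followed by one labeled k.
    Returns the best-scoring list of (start, end, label) segments."""
    T, K = len(scores), len(scores[0])
    best = [[-math.inf] * K for _ in range(T + 1)]  # best[t][k]: frames [0, t), last label k
    back = [[None] * K for _ in range(T + 1)]       # (segment start, previous label)
    for t in range(1, T + 1):
        for k in range(K):
            seg = 0.0
            for length in range(1, min(max_len, t) + 1):
                s = t - length           # candidate segment covers frames [s, t)
                seg += scores[s][k]      # running emission score of the segment
                if s == 0:               # first segment: no incoming transition
                    if seg > best[t][k]:
                        best[t][k], back[t][k] = seg, (0, None)
                else:
                    for j in range(K):
                        if j == k:       # adjacent segments take distinct labels
                            continue
                        cand = best[s][j] + trans[j][k] + seg
                        if cand > best[t][k]:
                            best[t][k], back[t][k] = cand, (s, j)
    k = max(range(K), key=lambda lab: best[T][lab])  # best final label
    t, segments = T, []
    while t > 0:                                     # follow back-pointers
        s, j = back[t][k]
        segments.append((s, t, k))
        t, k = s, j
    return segments[::-1]
```

For example, with two labels where the first two frames prefer label 0 and the last two prefer label 1, the decoder recovers the two-segment split `[(0, 2, 0), (2, 4, 1)]`.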
    2 Citations


    A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos
    A Visuospatial Dataset for Naturalistic Verb Learning


    References

    • Unsupervised Learning from Narrated Instruction Videos
    • Cross-Task Weakly Supervised Learning From Instructional Videos
    • Unsupervised Learning of Action Classes With Continuous Temporal Embedding
    • Weakly supervised learning of actions from transcripts
    • Connectionist Temporal Modeling for Weakly Supervised Action Labeling
    • Towards Automatic Learning of Procedures From Web Instructional Videos
    • Temporal Action Detection with Structured Segment Networks
    • Weakly Supervised Action Learning with RNN Based Fine-to-Coarse Modeling
    • Action Sets: Weakly Supervised Action Segmentation Without Ordering Constraints
    • F. Sener, A. Yao. Unsupervised Learning and Segmentation of Complex Activities from Video. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.