FIFA: Fast Inference Approximation for Action Segmentation
@inproceedings{Souri2021FIFAFI, title={FIFA: Fast Inference Approximation for Action Segmentation}, author={Yaser Souri and Yazan Abu Farha and Fabien Despinoy and Gianpiero Francesca and Juergen Gall}, booktitle={German Conference on Pattern Recognition}, year={2021} }
We introduce FIFA, a fast approximate inference method for action segmentation and alignment. Unlike previous approaches, FIFA does not rely on expensive dynamic programming for inference. Instead, it uses an approximate differentiable energy function that can be minimized with gradient descent. FIFA is a general approach that can replace exact inference, making it more than 5 times faster while maintaining its performance. FIFA is an anytime inference algorithm that provides a better…
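The idea of swapping dynamic programming for gradient descent on a differentiable energy can be illustrated with a minimal sketch. This is not the energy function from the paper: it is a simplified stand-in that assumes frame-wise log-probabilities and an ordered transcript are given, relaxes each segment boundary to a real number, and uses sigmoids as soft frame-to-segment assignments. All names (`energy_and_grad`, `approximate_alignment`, `sharpness`, `lr`) are hypothetical.

```python
import math


def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def energy_and_grad(bounds, log_probs, transcript, sharpness=1.0):
    """Soft-assignment energy and its analytic gradient w.r.t. the boundaries.

    Assumption (not from the paper): frame t belongs to segment k with weight
    sigmoid((t - b[k-1]) / s) - sigmoid((t - b[k]) / s), and the energy is the
    negative weighted sum of the frame log-probabilities.
    """
    n_seg = len(transcript)
    energy = 0.0
    grad = [0.0] * len(bounds)
    for t, frame in enumerate(log_probs):
        # s[k]: soft indicator that frame t lies to the right of boundary k.
        s = [sigmoid((t - b) / sharpness) for b in bounds]
        for k in range(n_seg):
            left = s[k - 1] if k > 0 else 1.0
            right = s[k] if k < n_seg - 1 else 0.0
            energy -= (left - right) * frame[transcript[k]]
        for k in range(n_seg - 1):
            ds = s[k] * (1.0 - s[k]) / sharpness
            grad[k] -= ds * (frame[transcript[k]] - frame[transcript[k + 1]])
    return energy, grad


def approximate_alignment(log_probs, transcript, steps=200, lr=0.5):
    """Minimize the relaxed energy by plain gradient descent, then decode."""
    n_frames, n_seg = len(log_probs), len(transcript)
    # Start from a uniform split of the video among the transcript actions.
    bounds = [n_frames * (k + 1) / n_seg - 0.5 for k in range(n_seg - 1)]
    for _ in range(steps):
        _, grad = energy_and_grad(bounds, log_probs, transcript)
        bounds = [b - lr * g for b, g in zip(bounds, grad)]
    # Hard decoding: count how many boundaries lie to the left of each frame.
    labels = [transcript[sum(1 for b in bounds if t > b)] for t in range(n_frames)]
    return labels, bounds


# Toy input: 7 frames that favor action 0 followed by 3 that favor action 1.
hi, lo = math.log(0.9), math.log(0.1)
toy_log_probs = [[hi, lo]] * 7 + [[lo, hi]] * 3
toy_labels, toy_bounds = approximate_alignment(toy_log_probs, [0, 1])
print(toy_labels, toy_bounds)
```

Because every gradient step yields a usable set of boundaries, the loop can be stopped early and still decoded, which mirrors the anytime property the abstract mentions. The sketch omits length priors and does not prevent boundaries from crossing, so it should be read as an illustration only.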
8 Citations
Robust Action Segmentation from Timestamp Supervision
- Computer Science
- BMVC
- 2022
This work relaxes the restrictive assumption that every action instance is annotated with a timestamp, which presumes that annotators never miss an action, and takes missing annotations for some action instances into account.
Distill and Collect for Semi-Supervised Temporal Action Segmentation
- Computer Science
- ArXiv
- 2022
This paper proposes an approach for the temporal action segmentation task that can simultaneously leverage knowledge from annotated and unannotated video sequences and uses multi-stream distillation that repeatedly refines and finally combines their frame predictions.
Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos
- Computer Science
- 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- 2022
A two-stream framework is proposed, which exploits semantic and temporal hierarchies to recognize top-level tasks in instructional videos and presents a novel top-down weakly-supervised action segmentation approach, where the predicted task is used to constrain the inference of fine-grained action sequences.
Transformers in Action: Weakly Supervised Action Segmentation
- Computer Science
- 2022
This work demonstrates how transformers can be applied to improve action alignment accuracy over equivalent RNN-based models, with the attention mechanism focusing on salient action transition regions, and shows that this approach can also improve overall segmentation performance.
Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
- Computer Science
- 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
This paper presents a framework that segments streaming videos online at test time using dynamic programming and shows its advantages over a greedy sliding-window approach. It also investigates three multi-view inference techniques that generate more accurate frame-wise pseudo ground truth at no additional annotation cost.
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation
- Computer Science
- ECCV
- 2022
This paper introduces a unified framework for video action segmentation via sequence-to-sequence (seq2seq) translation that covers both fully supervised and timestamp-supervised setups; the timestamp-supervised setting is handled by a proposed constrained k-medoids algorithm that generates pseudo-segmentations.
Segmentation from Timestamp Supervision
- Computer Science
- 2022
Further details of the optimization and additional ablation studies are provided.
Temporal Action Segmentation: An Analysis of Modern Techniques
- Computer Science
- ArXiv
- 2022
This survey examines the task definition, common benchmarks, types of supervision, and prevalent evaluation measures of TAS, and systematically investigates two essential techniques of this topic, i.e., frame representation, and temporal modeling.
References
SHOWING 1-10 OF 38 REFERENCES
Fast Weakly Supervised Action Segmentation Using Mutual Consistency
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2022
This paper proposes a novel end-to-end approach for weakly supervised action segmentation based on a two-branch neural network that achieves the accuracy of state-of-the-art approaches while being 14 times faster to train and 20 times faster during inference.
On Evaluating Weakly Supervised Action Segmentation Methods
- Computer Science
- ArXiv
- 2020
This work focuses on two aspects of the use and evaluation of weakly supervised action segmentation approaches that are often overlooked: the performance variance over multiple training runs and the impact of selecting feature extractors for this task.
Segmental Spatiotemporal CNNs for Fine-Grained Action Segmentation
- Computer Science, Environmental Science
- ECCV
- 2016
This work proposes a model for action segmentation which combines low-level spatiotemporal features with a high-level segmental classifier and introduces an efficient constrained segmental inference algorithm for this model that is orders of magnitude faster than the current approach.
Weakly Supervised Energy-Based Learning for Action Segmentation
- Computer Science
- 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
This work proposes a new constrained discriminative forward loss (CDFL) for training an HMM and GRU under weak supervision, which gives superior results to the state of the art on the Breakfast Action, Hollywood Extended, and 50Salads benchmark datasets.
Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment
- Computer Science
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
A novel action modeling framework is proposed, which consists of a new temporal convolutional network, named Temporal Convolutional Feature Pyramid Network (TCFPN), for predicting frame-wise action labels, and a novel training strategy for weakly-supervised sequence modeling, named Iterative Soft Boundary Assignment (ISBA), to align action sequences and update the network in an iterative fashion.
Boundary-Aware Cascade Networks for Temporal Action Segmentation
- Computer Science
- ECCV
- 2020
A new boundary-aware cascade network is presented by introducing a new cascading paradigm, called Stage Cascade, to enable the model to have adaptive receptive fields and more confident predictions for ambiguous frames, and a general and principled smoothing operation, termed as local barrier pooling, to aggregate local predictions by leveraging semantic boundary information.
Improving Action Segmentation via Graph-Based Temporal Reasoning
- Computer Science
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
A network module called Graph-based Temporal Reasoning Module (GTRM) that can be built on top of existing action segmentation models to learn the relation of multiple action segments in various time spans is proposed.
Temporal Convolutional Networks for Action Segmentation and Detection
- Computer Science
- 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
This work introduces a class of temporal models that use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection. The models capture action compositions, segment durations, and long-range dependencies, and train over an order of magnitude faster than competing LSTM-based recurrent neural networks.
Temporal Deformable Residual Networks for Action Segmentation in Videos
- Computer Science
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
A new model, the temporal deformable residual network (TDRN), analyzes video intervals at multiple temporal scales to label video frames, and outperforms the state of the art in frame-wise segmentation accuracy, segmental edit score, and segmental overlap F1 score.
D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation
- Computer Science
- 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
The proposed Discriminative Differentiable Dynamic Time Warping (D3TW) innovatively solves sequence alignment with discriminative modeling and end-to-end training, which substantially improves the performance in weakly supervised action alignment and segmentation tasks.