FIFA: Fast Inference Approximation for Action Segmentation

@inproceedings{Souri2021FIFAFI,
  title={FIFA: Fast Inference Approximation for Action Segmentation},
  author={Yaser Souri and Yazan Abu Farha and Fabien Despinoy and Gianpiero Francesca and Juergen Gall},
  booktitle={German Conference on Pattern Recognition},
  year={2021}
}
We introduce FIFA, a fast approximate inference method for action segmentation and alignment. Unlike previous approaches, FIFA does not rely on expensive dynamic programming for inference. Instead, it uses an approximate differentiable energy function that can be minimized using gradient descent. FIFA is a general approach that can replace exact inference, improving inference speed by more than 5 times while maintaining performance. FIFA is an anytime inference algorithm that provides a better…
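The toy sketch below illustrates the general idea of replacing dynamic programming with gradient descent. It covers the alignment case, where the ordered list of actions (the transcript) is known. It is not FIFA's energy function. The following are all assumptions made for illustration:

- the sigmoid-based soft segment masks;
- the softmax parameterization of segment lengths;
- the finite-difference gradients;
- every function name (`relaxed_alignment`, `relaxed_energy`, `decode`).

```python
import math

# Hypothetical illustration only; this is not FIFA's actual formulation.

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def segment_lengths(z, num_frames):
    # Softmax keeps every length positive and makes the lengths sum to num_frames.
    m = max(z)
    w = [math.exp(v - m) for v in z]
    total = sum(w)
    return [num_frames * v / total for v in w]

def relaxed_energy(z, logp, transcript, sharpness):
    # Smooth stand-in for the alignment energy: frame t is softly assigned to
    # segment n by the difference of two sigmoids placed at the segment's start
    # and end; the energy is the negative log-probability under these masks.
    num_frames = len(logp)
    bounds = [0.0]
    for length in segment_lengths(z, num_frames):
        bounds.append(bounds[-1] + length)
    energy = 0.0
    for t in range(num_frames):
        centre = t + 0.5
        for n, action in enumerate(transcript):
            mask = (sigmoid(sharpness * (centre - bounds[n]))
                    - sigmoid(sharpness * (centre - bounds[n + 1])))
            energy -= mask * logp[t][action]
    return energy

def decode(z, transcript, num_frames):
    # Round the soft boundaries to a hard frame-wise labelling.
    labels, start, acc = [], 0, 0.0
    lengths = segment_lengths(z, num_frames)
    for n, action in enumerate(transcript):
        acc += lengths[n]
        end = num_frames if n == len(transcript) - 1 else max(start, round(acc))
        labels.extend([action] * (end - start))
        start = end
    return labels

def relaxed_alignment(logp, transcript, steps=500, lr=0.02, sharpness=2.0, eps=1e-4):
    # Plain gradient descent on the relaxed energy. Central finite differences
    # stand in for the automatic differentiation a real implementation would use.
    num_frames = len(logp)
    z = [0.0] * len(transcript)  # start from equal segment lengths
    for _ in range(steps):
        grad = []
        for k in range(len(z)):
            up, down = z[:], z[:]
            up[k] += eps
            down[k] -= eps
            diff = (relaxed_energy(up, logp, transcript, sharpness)
                    - relaxed_energy(down, logp, transcript, sharpness))
            grad.append(diff / (2 * eps))
        z = [v - lr * g for v, g in zip(z, grad)]
    return decode(z, transcript, num_frames)

if __name__ == "__main__":
    # Toy input: 12 frames whose scores favour segment lengths 3, 5 and 4.
    truth = [0] * 3 + [1] * 5 + [2] * 4
    logp = [[math.log(0.8) if c == y else math.log(0.1) for c in range(3)]
            for y in truth]
    print(relaxed_alignment(logp, [0, 1, 2]))
```

`decode` turns the current parameters into a valid segmentation at any iteration. The loop could therefore be stopped early and still return a usable result, which is the intuition behind an anytime algorithm.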

Robust Action Segmentation from Timestamp Supervision

This work relaxes the assumption that every action instance is annotated with a timestamp. That assumption is restrictive because it presumes annotators never miss an action. The work instead takes missing annotations for some action instances into account.

Distill and Collect for Semi-Supervised Temporal Action Segmentation

This paper proposes an approach for temporal action segmentation that can simultaneously leverage knowledge from annotated and unannotated video sequences. It uses multi-stream distillation, which repeatedly refines the streams' frame predictions and finally combines them.

Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos

A two-stream framework is proposed, which exploits semantic and temporal hierarchies to recognize top-level tasks in instructional videos and presents a novel top-down weakly-supervised action segmentation approach, where the predicted task is used to constrain the inference of fine-grained action sequences.

Transformers in Action: Weakly Supervised Action Segmentation

Through its architecture, this work demonstrates how transformers can improve action alignment accuracy over equivalent RNN-based models, with the attention mechanism focusing on salient action transition regions. It then shows how this approach can also improve overall segmentation performance.

Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos

This paper presents a framework that uses dynamic programming to segment streaming videos online at test time, and shows its advantages over a greedy sliding-window approach. It also investigates three multi-view inference techniques that generate more accurate frame-wise pseudo ground truth at no additional annotation cost.
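The sketch below shows the basic offline dynamic program for aligning a known, ordered action transcript to frame-wise log-probabilities. It is a generic Viterbi-style recursion under simplifying assumptions: there is no length model, and every action gets at least one frame. It is not the paper's online or multi-view method, and `dp_alignment` is a hypothetical name.

```python
import math

def dp_alignment(logp, transcript):
    """Exact monotone alignment of `transcript` to frames (hypothetical helper).

    logp[t][c] is the log-probability of class c at frame t. Every action in the
    transcript receives at least one frame, in order, and the total
    log-probability of the resulting frame labelling is maximised.
    """
    num_frames, num_segments = len(logp), len(transcript)
    if num_frames < num_segments:
        raise ValueError("need at least one frame per transcript action")
    neg_inf = -math.inf
    # score[t][n]: best total for frames 0..t with frame t inside segment n.
    score = [[neg_inf] * num_segments for _ in range(num_frames)]
    came_from = [[0] * num_segments for _ in range(num_frames)]
    score[0][0] = logp[0][transcript[0]]
    for t in range(1, num_frames):
        for n in range(num_segments):
            stay = score[t - 1][n]                                   # continue segment n
            advance = score[t - 1][n - 1] if n > 0 else neg_inf      # start segment n
            best = max(stay, advance)
            if best == neg_inf:
                continue                                             # unreachable cell
            score[t][n] = best + logp[t][transcript[n]]
            came_from[t][n] = n if stay >= advance else n - 1
    # Backtrack from the last frame, which must lie in the last segment.
    labels = [0] * num_frames
    n = num_segments - 1
    for t in range(num_frames - 1, -1, -1):
        labels[t] = transcript[n]
        n = came_from[t][n]
    return labels
```

Here the table has one cell per frame and transcript position, so the cost is O(T·N). Adding an explicit length model makes each cell consider many previous segment lengths, which makes exact inference considerably more expensive on long videos.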

Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation

This paper introduces a unified framework for video action segmentation via sequence-to-sequence (seq2seq) translation that covers both the fully supervised and the timestamp-supervised setup. For the timestamp-supervised setting, a proposed constrained k-medoids algorithm generates pseudo-segmentations.

Segmentation from Timestamp Supervision

Details of the optimization and additional ablation studies are provided.

Temporal Action Segmentation: An Analysis of Modern Techniques

This survey examines the task definition, common benchmarks, types of supervision, and prevalent evaluation measures of TAS. It then systematically investigates two essential techniques of the topic: frame representation and temporal modeling.

References


Fast Weakly Supervised Action Segmentation Using Mutual Consistency

This paper proposes a novel end-to-end approach for weakly supervised action segmentation based on a two-branch neural network that achieves the accuracy of state-of-the-art approaches while being 14 times faster to train and 20 times faster during inference.

On Evaluating Weakly Supervised Action Segmentation Methods

This work focuses on two aspects of the use and evaluation of weakly supervised action segmentation approaches that are often overlooked: the performance variance over multiple training runs and the impact of selecting feature extractors for this task.

Segmental Spatiotemporal CNNs for Fine-Grained Action Segmentation

This work proposes a model for action segmentation which combines low-level spatiotemporal features with a high-level segmental classifier and introduces an efficient constrained segmental inference algorithm for this model that is orders of magnitude faster than the current approach.

Weakly Supervised Energy-Based Learning for Action Segmentation

A new constrained discriminative forward loss (CDFL) is proposed for training an HMM and a GRU under weak supervision. It gives superior results to the state of the art on the Breakfast Action, Hollywood Extended, and 50Salads benchmark datasets.

Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment

Li Ding, Chenliang Xu. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
A novel action modeling framework is proposed, which consists of a new temporal convolutional network, named Temporal Convolutional Feature Pyramid Network (TCFPN), for predicting frame-wise action labels, and a novel training strategy for weakly-supervised sequence modeling, named Iterative Soft Boundary Assignment (ISBA), to align action sequences and update the network in an iterative fashion.

Boundary-Aware Cascade Networks for Temporal Action Segmentation

A new boundary-aware cascade network is presented. It introduces a cascading paradigm, called Stage Cascade, that gives the model adaptive receptive fields and more confident predictions for ambiguous frames. It also introduces a general, principled smoothing operation, termed local barrier pooling, that aggregates local predictions by leveraging semantic boundary information.

Improving Action Segmentation via Graph-Based Temporal Reasoning

A network module called the Graph-based Temporal Reasoning Module (GTRM) is proposed. It can be built on top of existing action segmentation models to learn the relations between multiple action segments across various time spans.

Temporal Convolutional Networks for Action Segmentation and Detection

A class of temporal models is introduced that uses a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection. These models capture action compositions, segment durations, and long-range dependencies. They are also over an order of magnitude faster to train than competing LSTM-based recurrent neural networks.
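Dilated 1-D convolutions are one common way to let a hierarchy of temporal convolutions reach long-range context, and the sketch below illustrates this. It works in plain Python on a single-channel signal and is not the paper's architecture. The helper names `temporal_conv` and `receptive_field` are hypothetical, and so is the doubling dilation schedule (1, 2, 4, ...).

```python
def temporal_conv(signal, kernel, dilation=1):
    # 'Same'-padded 1-D convolution over time (cross-correlation, as in most deep
    # learning libraries). The kernel has odd length; its taps are spaced
    # `dilation` frames apart, and out-of-range frames count as zero.
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i, w in enumerate(kernel):
            src = t + (i - half) * dilation
            if 0 <= src < len(signal):
                acc += w * signal[src]
        out.append(acc)
    return out

def receptive_field(kernel_size, num_layers):
    # With dilations 1, 2, 4, ..., 2**(num_layers - 1), the number of input frames
    # that can influence one output frame grows exponentially with depth.
    return 1 + (kernel_size - 1) * (2 ** num_layers - 1)

if __name__ == "__main__":
    # Push an impulse through three layers with dilations 1, 2 and 4.
    x = [0.0] * 15 + [1.0] + [0.0] * 15
    for d in (1, 2, 4):
        x = temporal_conv(x, [1, 1, 1], dilation=d)
    print(sum(v != 0 for v in x), receptive_field(3, 3))
```

The impulse spreads to as many frames as the receptive field predicts. Stacking a few dilated layers therefore covers long temporal spans without a recurrent network.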

Temporal Deformable Residual Networks for Action Segmentation in Videos

Peng Lei, S. Todorovic. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
A new model, the temporal deformable residual network (TDRN), analyzes video intervals at multiple temporal scales to label video frames. Experiments demonstrate that TDRN outperforms the state of the art in frame-wise segmentation accuracy, segmental edit score, and segmental overlap F1 score.
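The three evaluation measures named above are standard in action segmentation. The sketch below follows their common definitions:

- frame-wise accuracy;
- an edit score based on the normalised Levenshtein distance between segment label sequences;
- F1 at an IoU threshold with one-to-one matching of segments.

Details such as background-class handling vary between codebases, so treat this as an illustration. All function names are hypothetical.

```python
def segments(labels):
    # Collapse a frame-wise labelling into (label, start, end) runs; end is exclusive.
    segs, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segs.append((labels[start], start, t))
            start = t
    return segs

def frame_accuracy(pred, gt):
    # Percentage of frames whose predicted label equals the ground truth.
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)

def edit_score(pred, gt):
    # 100 * (1 - normalised Levenshtein distance) between segment label sequences.
    a = [s[0] for s in segments(pred)]
    b = [s[0] for s in segments(gt)]
    dist = list(range(len(b) + 1))                # single-row edit-distance table
    for i in range(1, len(a) + 1):
        prev_diag, dist[0] = dist[0], i
        for j in range(1, len(b) + 1):
            cur = dist[j]
            dist[j] = min(dist[j] + 1, dist[j - 1] + 1,
                          prev_diag + (a[i - 1] != b[j - 1]))
            prev_diag = cur
    return 100.0 * (1 - dist[-1] / max(len(a), len(b)))

def f1_at_overlap(pred, gt, threshold):
    # A predicted segment is a true positive if its best same-label ground-truth
    # segment has IoU >= threshold and has not already been matched.
    p_segs, g_segs = segments(pred), segments(gt)
    matched = [False] * len(g_segs)
    tp = 0
    for label, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(g_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            if inter / union > best_iou:
                best_iou, best_j = inter / union, j
        if best_j >= 0 and best_iou >= threshold and not matched[best_j]:
            matched[best_j] = True
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(p_segs)
    recall = tp / len(g_segs)
    return 100.0 * 2 * precision * recall / (precision + recall)
```

Frame-wise accuracy ignores over-segmentation, which is why the segmental edit and overlap scores are usually reported alongside it.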

D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation

The proposed Discriminative Differentiable Dynamic Time Warping (D3TW) innovatively solves sequence alignment with discriminative modeling and end-to-end training, which substantially improves the performance in weakly supervised action alignment and segmentation tasks.
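One standard way to make dynamic time warping differentiable is to replace the hard minimum in its recursion with a smooth soft-min, the relaxation known as soft-DTW. The sketch below shows only that relaxation next to classic DTW. It does not reproduce D3TW's discriminative loss, and whether it matches the paper's exact formulation is an assumption. `softmin` and `dtw` are hypothetical names.

```python
import math

def softmin(values, gamma):
    # Smooth minimum: -gamma * log(sum(exp(-v / gamma))). It never exceeds min()
    # and approaches it as gamma -> 0. Shifting by the minimum avoids overflow.
    m = min(values)
    if m == math.inf:
        return math.inf
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))

def dtw(cost, gamma=None):
    # cost[i][j] is the local cost of matching element i of one sequence with
    # element j of the other. gamma=None gives classic DTW (hard min); a positive
    # gamma gives the soft-DTW relaxation, which is smooth in the costs.
    n, m = len(cost), len(cost[0])
    r = [[math.inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = (r[i - 1][j - 1], r[i - 1][j], r[i][j - 1])
            best = min(prev) if gamma is None else softmin(prev, gamma)
            r[i][j] = cost[i - 1][j - 1] + best
    return r[n][m]

if __name__ == "__main__":
    x, y = [1, 2, 3], [1, 2, 2, 3]
    cost = [[abs(a - b) for b in y] for a in x]
    print(dtw(cost), dtw(cost, gamma=0.001))
```

Because soft-min is smooth, gradients of the alignment score with respect to every local cost exist. This is what makes end-to-end training through the alignment possible.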