Learning Discriminative Prototypes with Dynamic Time Warping

Xiaobin Chang, Frederick Tung, Greg Mori
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Dynamic Time Warping (DTW) is widely used for temporal data processing. However, existing methods can neither learn the discriminative prototypes of different classes nor exploit such prototypes for further analysis. We propose Discriminative Prototype DTW (DP-DTW), a novel method to learn class-specific discriminative prototypes for temporal recognition tasks. DP-DTW shows superior performance compared to conventional DTWs on time series classification benchmarks. Combined with end-to-end… 
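
For readers unfamiliar with the conventional DTW that the abstract contrasts against, the following is a minimal sketch of the classic dynamic-programming recursion, followed by a nearest-prototype lookup. It is not the DP-DTW learning method. The function names, the absolute-difference local cost, and the hand-written prototypes are assumptions made for illustration only.

```python
import math


def dtw_distance(x, y):
    """Classic DTW cost between two 1-D sequences.

    Assumption: the local cost is the absolute difference |x_i - y_j|.
    """
    n, m = len(x), len(y)
    # acc[i][j] holds the minimal accumulated cost of aligning x[:i] with y[:j].
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def nearest_prototype(query, prototypes):
    """Return the label whose prototype has the smallest DTW cost to the query.

    `prototypes` is a hypothetical dict mapping a class label to one sequence.
    """
    return min(prototypes, key=lambda label: dtw_distance(query, prototypes[label]))
```

In this sketch the prototypes are fixed by hand, for example `nearest_prototype([0, 1, 3, 4], {"rise": [0, 2, 4], "fall": [4, 2, 0]})`. The paper's contribution, by contrast, is to learn discriminative prototypes for each class.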

Video-Text Representation Learning via Differentiable Weak Temporal Alignment
This paper proposes a novel multi-modal self-supervised framework, VT-TWINS, to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW), and applies a contrastive learning scheme to learn feature representations on weakly correlated data.
Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization
It is argued that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in WSAL, helps identify coherent action instances, and alleviates the task gap between classification and localization.
Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers
This work introduces Drop-DTW, a novel algorithm that aligns the common signal between the sequences while automatically dropping the outlier elements from the matching, and shows that it is a robust similarity measure for sequence retrieval and an effective training loss on diverse applications.
Transformers in Action: Weakly Supervised Action Segmentation
This work demonstrates, through its architecture, how transformers can be applied to improve action alignment accuracy over the equivalent RNN-based models, with the attention mechanism focusing around salient action transition regions, and subsequently demonstrates how this approach can also improve the overall segmentation performance.
Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos
A two-stream framework is proposed that exploits semantic and temporal hierarchies to recognize top-level tasks in instructional videos, together with a novel top-down weakly-supervised action segmentation approach, where the predicted task is used to constrain the inference of fine-grained action sequences.
Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
This work presents a framework to segment streaming videos online at test time using Dynamic Programming, shows its advantages over a greedy sliding window approach, and improves the framework by introducing the Online-Offline Discrepancy Loss (OODL) to encourage the segmentation results to have a higher temporal consistency.
Temporal Alignment Networks for Long-term Video
A temporal alignment network is proposed that ingests long-term video sequences and associated text sentences in order to determine if a sentence is alignable with the video and, if it is alignable, to determine its alignment.
Robust Time Series Dissimilarity Measure for Outlier Detection and Periodicity Detection
A novel time series dissimilarity measure named RobustDTW is proposed, which estimates the trend and optimizes the time warp in an alternating manner by utilizing the designed temporal graph trend filtering, and is extended to periodicity detection and outlier time series detection.


D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation
The proposed Discriminative Differentiable Dynamic Time Warping (D3TW) innovatively solves sequence alignment with discriminative modeling and end-to-end training, which substantially improves the performance in weakly supervised action alignment and segmentation tasks.
DTWNet: a Dynamic Time Warping Network
For the first time, the DTW loss is theoretically analyzed, and a stochastic backpropagation scheme is proposed to improve the accuracy and efficiency of the DTW learning.
Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment
Li Ding, Chenliang Xu. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
A novel action modeling framework is proposed, which consists of a new temporal convolutional network, named Temporal Convolutional Feature Pyramid Network (TCFPN), for predicting frame-wise action labels, and a novel training strategy for weakly-supervised sequence modeling, named Iterative Soft Boundary Assignment (ISBA), to align action sequences and update the network in an iterative fashion.
Temporal Transformer Networks: Joint Learning of Invariant and Discriminative Time Warping
This paper proposes a hybrid model-based and data-driven approach to learn warping functions that not just reduce intra-class variability, but also increase inter-class separation through an interpretable differentiable module.
NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning
This work proposes a novel learning algorithm with a Viterbi-based loss that allows for online and incremental learning of weakly annotated video data and shows that explicit context and length modeling leads to huge improvements in video segmentation and labeling tasks.
Weakly Supervised Energy-Based Learning for Action Segmentation
A new constrained discriminative forward loss (CDFL) is proposed for training the HMM and GRU under weak supervision, and it gives superior results to those of the state of the art on the benchmark Breakfast Action, Hollywood Extended, and 50Salads datasets.
Weakly Supervised Action Learning with RNN Based Fine-to-Coarse Modeling
A combination of a discriminative representation of subactions, modeled by a recurrent neural network, and a coarse probabilistic model to allow for a temporal alignment and inference over long sequences of human actions is proposed.
Connectionist Temporal Modeling for Weakly Supervised Action Labeling
The Extended Connectionist Temporal Classification (ECTC) framework is introduced to efficiently evaluate all possible alignments via dynamic programming and explicitly enforce their consistency with frame-to-frame visual similarities.
Soft-DTW: a Differentiable Loss Function for Time-Series
This work takes advantage of a smoothed formulation of DTW, called soft-DTW, that computes the soft-minimum of all alignment costs, and shows that this regularization is particularly well suited to average and cluster time series under the DTW geometry.
Learning time-series shapelets
A new mathematical formalization of the task via a classification objective function is proposed, and a tailored stochastic gradient learning algorithm is applied that can learn true top-K shapelets by capturing their interaction.
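
The Soft-DTW entry above describes replacing the hard minimum in DTW with a soft-minimum over alignment costs. A minimal sketch of that forward recursion follows. The function names, the squared-difference local cost, and the default `gamma` are assumptions for illustration, and the sketch computes only the loss value, not its gradient.

```python
import math


def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    finite = [v for v in values if v != math.inf]
    if not finite:
        return math.inf
    lo = min(finite)
    # Shift by the smallest value so the exponentials cannot overflow.
    return lo - gamma * math.log(sum(math.exp(-(v - lo) / gamma) for v in finite))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value between two 1-D sequences.

    Assumption: the local cost is the squared difference (x_i - y_j) ** 2.
    """
    n, m = len(x), len(y)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Same three predecessors as classic DTW, combined with a soft minimum.
            acc[i][j] = cost + soft_min(
                (acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]), gamma
            )
    return acc[n][m]
```

As `gamma` shrinks toward zero the soft minimum approaches the ordinary minimum, so the value approaches the classic DTW cost. Larger `gamma` gives a smoother value that can fall below the hard DTW cost and can be negative.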