Corpus ID: 25188047

Sequence Summarization Using Order-constrained Kernelized Feature Subspaces

@article{Cherian2017SequenceSU,
  title={Sequence Summarization Using Order-constrained Kernelized Feature Subspaces},
  author={Anoop Cherian and Suvrit Sra and Richard I. Hartley},
  journal={ArXiv},
  year={2017},
  volume={abs/1705.08583}
}
Representations that can compactly and effectively capture the temporal evolution of semantic content are important to machine learning algorithms that operate on multivariate time-series data. We investigate such representations, motivated by the task of human action recognition. Here each data instance is encoded by a multivariate feature (such as via a deep CNN), where action dynamics are characterized by their variations in time. As these features are often non-linear, we propose a novel pooling… 
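The abstract describes pooling non-linear frame features through a kernelized feature subspace. As a rough illustration of the generic, unconstrained ingredient only (summarizing a sequence by the top principal directions of a centered kernel matrix, i.e. kernel PCA), the sketch below assumes an RBF kernel and omits the paper's order constraints entirely, which are its actual contribution:

```python
import numpy as np

def rbf_kernel(X, gamma=0.1):
    # Pairwise RBF kernel over frame features X (T x d).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_subspace_summary(X, p=3, gamma=0.1):
    """Summarize a T x d feature sequence by its top-p kernel
    principal directions. This is a generic kernelized subspace;
    the paper's order constraints are NOT modeled here."""
    K = rbf_kernel(X, gamma)
    T = K.shape[0]
    # Center the kernel matrix in feature space.
    H = np.eye(T) - np.ones((T, T)) / T
    Kc = H @ K @ H
    vals, vecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    # Keep the p eigenvectors with the largest eigenvalues.
    return vecs[:, -p:], vals[-p:]

# Toy sequence: 20 frames of slowly evolving 64-dim features.
rng = np.random.default_rng(0)
X = np.cumsum(rng.standard_normal((20, 64)), axis=0)
A, lam = kernel_subspace_summary(X, p=3)
print(A.shape)  # (20, 3)
```

The subspace basis `A` (or a distance between such subspaces, e.g. on a Grassmann manifold, as several of the references below discuss) can then serve as the sequence-level descriptor fed to a classifier.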

Citations

Fine-Grained Action Recognition by Motion Saliency and Mid-Level Patches

A fine-grained action recognition model is built using a graph structure to describe relationships between mid-level patches; it computes the appearance and motion features of the mid-level patches and the binary motion-cooperation relationships between adjacent patches in the graph.

References

Showing 1-10 of 42 references

Kernel analysis on Grassmann manifolds for action recognition

Ordered Pooling of Optical Flow Sequences for Action Recognition

This paper introduces a novel ordered representation of consecutive optical-flow frames as an alternative to RGB frames, argues that it captures action dynamics more efficiently, and provides intuitions on why such a representation is better suited to action recognition.

Learning End-to-end Video Classification with Rank-Pooling

A new model for representation learning and classification of video sequences is introduced, based on a convolutional neural network coupled with a novel temporal pooling layer; it can use any existing convolutional neural network architecture without modification or the introduction of additional parameters.
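The temporal pooling layer above builds on rank pooling, which summarizes a video by the parameters of a linear function trained to rank frames by time. A minimal sketch, assuming a ridge-regression surrogate for the ranking objective and running-mean features (both are simplifications; the original formulation uses a pairwise ranking loss):

```python
import numpy as np

def rank_pool(X, reg=1e-3):
    """Minimal rank-pooling sketch: fit u so that the score u . m_t
    increases with time t, where m_t is the running mean of frame
    features up to t. Ridge regression onto the time index is used
    here as a stand-in for the original pairwise ranking objective."""
    T, d = X.shape
    M = np.cumsum(X, axis=0) / np.arange(1, T + 1)[:, None]  # running means m_t
    t = np.arange(1, T + 1, dtype=float)
    # Ridge solution: u = (M^T M + reg * I)^{-1} M^T t
    u = np.linalg.solve(M.T @ M + reg * np.eye(d), M.T @ t)
    return u  # d-dimensional video descriptor

# Toy sequence: 30 frames of 16-dim features with an upward trend.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 16)) + np.linspace(0, 2, 30)[:, None]
u = rank_pool(X)
print(u.shape)  # (16,)
```

The learned parameter vector `u` is the fixed-length video representation; the end-to-end model in the cited paper differentiates through this pooling step so the CNN features and the pooled descriptor are trained jointly.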

Action recognition with trajectory-pooled deep-convolutional descriptors

This paper presents a new video representation, called the trajectory-pooled deep-convolutional descriptor (TDD), which shares the merits of both hand-crafted and deep-learned features and achieves performance superior to the state of the art on the evaluated datasets.

Expanding the Family of Grassmannian Kernels: An Embedding Perspective

This work introduces several positive definite Grassmannian kernels, including universal ones, and demonstrates their superiority over previously-known kernels in various tasks, such as classification, clustering, sparse coding and hashing.

Human action recognition based on graph-embedded spatio-temporal subspace

Tensor Representations via Kernel Linearization for Action Recognition from 3D Skeletons

This paper presents two different kernels for action recognition: a sequence compatibility kernel that captures the spatio-temporal compatibility of joints in one sequence against those in the other, and a dynamics compatibility kernel that explicitly models the action dynamics of a sequence.

Long-term recurrent convolutional networks for visual recognition and description

A novel recurrent convolutional architecture suitable for large-scale visual learning is proposed; it is end-to-end trainable, and such models are shown to have distinct advantages over state-of-the-art recognition or generation models whose components are separately defined and/or optimized.

Dynamic Image Networks for Action Recognition

A new approximate rank-pooling CNN layer allows existing CNN models to be applied directly to video data with fine-tuning, generalizes dynamic images to dynamic feature maps, and demonstrates the power of the new representations on standard action recognition benchmarks, achieving state-of-the-art performance.
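Dynamic images are computed by approximate rank pooling, which reduces to a fixed weighted sum of frames. One closed form often quoted for the time-averaged-feature variant uses the weights alpha_t = 2t - T - 1; the sketch below assumes that form and should be read as an illustration, not as the exact CNN layer:

```python
import numpy as np

def dynamic_image(frames):
    """Approximate-rank-pooled 'dynamic image': a fixed weighted sum
    of frames. Uses the closed-form weights alpha_t = 2t - T - 1
    reported for the time-averaged-feature variant; an illustrative
    simplification, not the exact formulation of the cited paper."""
    T = len(frames)
    alphas = 2 * np.arange(1, T + 1) - T - 1  # e.g. T=4 -> [-3, -1, 1, 3]
    return np.tensordot(alphas.astype(float), np.stack(frames), axes=1)

# Toy video: four 2x2 frames that brighten over time (values 1..4).
frames = [np.full((2, 2), float(t)) for t in range(1, 5)]
di = dynamic_image(frames)
print(di)  # every pixel: -3*1 - 1*2 + 1*3 + 3*4 = 10
```

Because the result has the same shape as a single frame, it can be fed to any image CNN unchanged, which is what makes fine-tuning existing models on video practical in this approach.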

Two-Stream Convolutional Networks for Action Recognition in Videos

This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
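The simplest scheme for combining the two streams is late fusion of their class scores. A minimal sketch with hypothetical logits, averaging the per-stream softmax outputs with a tunable weight:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def two_stream_fusion(spatial_logits, temporal_logits, w=0.5):
    """Late fusion of the spatial (RGB) and temporal (optical-flow)
    streams by a weighted average of their class probabilities."""
    return w * softmax(spatial_logits) + (1 - w) * softmax(temporal_logits)

# Hypothetical 3-class logits from each stream for one clip.
rgb = np.array([2.0, 0.5, 0.1])    # spatial stream favors class 0
flow = np.array([0.2, 2.5, 0.1])   # temporal stream favors class 1
p = two_stream_fusion(rgb, flow)
print(p.argmax())  # 1: the more confident flow stream wins here
```

Here the temporal stream's stronger confidence tips the fused prediction, illustrating why motion cues help when the spatial stream alone is ambiguous.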