Structure-Aware Human-Action Generation

@article{Yu2020StructureAwareHG,
  title={Structure-Aware Human-Action Generation},
  author={Ping Yu and Yang Zhao and Chunyuan Li and Junsong Yuan and Changyou Chen},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.01971}
}
Generating long-range skeleton-based human actions has been a challenging problem, since small deviations in one frame can cause a malformed action sequence. Most existing methods borrow ideas from video generation and naively treat skeleton nodes/joints as pixels of images, without considering the rich inter-frame and intra-frame structure information, leading to potentially distorted actions. Graph convolutional networks (GCNs) are a promising way to leverage structure information to learn…
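
To make the structure the abstract refers to concrete, here is a minimal sketch (not the paper's model; the 5-joint skeleton, its edges, and all sizes are illustrative assumptions) of treating one skeleton frame as a graph and applying a single degree-normalized graph-convolution step in NumPy:

import numpy as np

# Hypothetical 5-joint skeleton: joint 1 ("neck") connects to the rest
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
num_joints = 5

# Adjacency with self-loops (A + I), Kipf-style
A = np.eye(num_joints)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Row-normalize: D^-1 (A + I)
A_norm = A / A.sum(axis=1, keepdims=True)

X = np.random.randn(num_joints, 3)   # one frame: xyz per joint
W = np.random.randn(3, 8)            # learnable weights in a real model

# Each joint aggregates features from its skeletal neighbours
X_out = A_norm @ X @ W               # shape (5, 8)
print(X_out.shape)
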
GlocalNet: Class-aware Long-term Human Motion Synthesis
TLDR
A two-stage activity-generation method that achieves long-term human motion synthesis by first learning to synthesize a sparse motion trajectory; superiority over SOTA methods is demonstrated with various quantitative evaluation metrics on publicly available datasets.
Generative Adversarial Graph Convolutional Networks for Human Action Synthesis
TLDR
Kinetic-GAN notably surpasses the state-of-the-art methods in terms of distribution quality metrics, while being able to synthesise more than an order of magnitude more distinct actions.
MUGL: Large Scale Multi Person Conditional Action Generation with Locomotion
TLDR
This work introduces MUGL, a novel deep neural model for large-scale, diverse generation of single- and multi-person pose-based action sequences with locomotion, and incorporates duration-aware feature representations to enable variable-length sequence generation.
A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder
We present a unified and flexible framework to address the generalized problem of 3D motion synthesis that covers the tasks of motion prediction, completion, interpolation, and spatial-temporal…
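
As background for the conditional-VAE backbone such a unified framework builds on, here is a minimal sketch (illustrative dimensions and names only, not the paper's architecture) of conditioning a VAE on a context vector such as observed poses:

import torch
import torch.nn as nn

class TinyCVAE(nn.Module):
    """Minimal conditional VAE: encode (x, c), decode (z, c)."""
    def __init__(self, x_dim=75, c_dim=75, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim + c_dim, 2 * z_dim)  # -> mu, logvar
        self.dec = nn.Linear(z_dim + c_dim, x_dim)

    def forward(self, x, c):
        mu, logvar = self.enc(torch.cat([x, c], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(torch.cat([z, c], -1)), mu, logvar

x = torch.randn(4, 75)  # target pose (25 joints x 3, an assumed layout)
c = torch.randn(4, 75)  # condition, e.g. the last observed pose
recon, mu, logvar = TinyCVAE()(x, c)
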
DanceIt: Music-Inspired Dancing Video Synthesis
TLDR
The proposed approach generates promising dancing videos from input music and develops a temporal alignment algorithm to align the rhythms of music and dance.
GAN-based Reactive Motion Synthesis with Class-aware Discriminators for Human-human Interaction
TLDR
This paper proposes a semi-supervised GAN system that synthesizes the reactive motion of a character given the active motion from another character, and introduces a discriminator that not only judges whether the generated movement is realistic, but also predicts the class label of the interaction.
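
The class-aware discriminator can be sketched as a shared trunk with two heads, one adversarial (real/fake) and one predicting the interaction class; all sizes and names below are illustrative assumptions, not the paper's implementation:

import torch
import torch.nn as nn

class ClassAwareD(nn.Module):
    """Sketch: shared trunk, adversarial head plus class head."""
    def __init__(self, in_dim=150, n_classes=8):  # e.g. a flattened two-person pose
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.adv_head = nn.Linear(128, 1)           # realistic or not
        self.cls_head = nn.Linear(128, n_classes)   # which interaction class

    def forward(self, motion):
        h = self.trunk(motion)
        return self.adv_head(h), self.cls_head(h)

adv_score, class_logits = ClassAwareD()(torch.randn(4, 150))
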
Human Action Recognition from Various Data Modalities: A Review
TLDR
This paper reviews both the hand-crafted feature-based and deep learning-based methods for single data modalities and also the methods based on multiple modalities, including the fusion-based frameworks and the co-learning-based approaches for HAR.

References

Showing 1-10 of 54 references
Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions
TLDR
This paper focuses on skeleton-based action generation and proposes to model smooth and diverse transitions in a latent space of action sequences with much lower dimensionality, learned with a bi-directional generative-adversarial-net framework.
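
The smooth-latent-transition idea can be illustrated by decoding a slow walk through a low-dimensional latent space; the decoder below is a hypothetical stand-in for a trained network, and all sizes are assumptions:

import numpy as np

z_start, z_end = np.random.randn(16), np.random.randn(16)
T = 30  # frames to generate

def decode(z):
    # Stand-in for a trained decoder mapping a latent code to a 25x3 pose
    return np.tanh(z @ np.random.RandomState(0).randn(16, 75)).reshape(25, 3)

# Small steps in latent space yield small changes between consecutive poses
poses = [decode((1 - t / (T - 1)) * z_start + (t / (T - 1)) * z_end)
         for t in range(T)]
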
Interpretable 3D Human Action Analysis with Temporal Convolutional Networks
  • Tae Soo Kim, A. Reiter
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2017
TLDR
This work proposes to use a new class of models known as Temporal Convolutional Neural Networks (TCN) for 3D human action recognition, and aims to take a step towards a spatio-temporal model that is easier to understand, explain and interpret.
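
A minimal example of the temporal-convolution ingredient, assuming flattened joint coordinates as channels (sizes are illustrative, not the paper's):

import torch
import torch.nn as nn

# 64 frames, each flattened to 75 dims (25 joints x 3): (batch, channels, time)
x = torch.randn(8, 75, 64)

# One TCN-style block; stacking such blocks widens the temporal receptive field
tcn_block = nn.Sequential(
    nn.Conv1d(75, 128, kernel_size=9, padding=4),
    nn.ReLU(),
)
print(tcn_block(x).shape)  # torch.Size([8, 128, 64])
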
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
TLDR
A novel model of dynamic skeletons called Spatial-Temporal Graph Convolutional Networks (ST-GCN), which moves beyond the limitations of previous methods by automatically learning both the spatial and temporal patterns from data.
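
A rough sketch of one ST-GCN-style unit, factorized into a spatial graph convolution followed by a temporal convolution; the identity adjacency and channel sizes are placeholders, not the published model:

import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    """Sketch of one unit: spatial graph conv, then temporal conv."""
    def __init__(self, A, c_in, c_out):
        super().__init__()
        self.register_buffer("A", A)  # (V, V) normalized skeleton adjacency
        self.spatial = nn.Conv2d(c_in, c_out, kernel_size=1)
        self.temporal = nn.Conv2d(c_out, c_out, kernel_size=(9, 1), padding=(4, 0))

    def forward(self, x):  # x: (N, C, T, V)
        x = torch.einsum("nctv,vw->nctw", self.spatial(x), self.A)
        return self.temporal(x)

V = 25                    # joints, e.g. an NTU RGB+D-style skeleton
A = torch.eye(V)          # placeholder; use the real skeleton graph in practice
out = STGCNBlock(A, 3, 64)(torch.randn(2, 3, 64, V))
print(out.shape)          # torch.Size([2, 64, 64, 25])
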
Human Action Generation with Generative Adversarial Networks
TLDR
A framework of an autoencoder and a generative adversarial network to produce multiple and consecutive human actions conditioned on the initial state and the given class label is proposed.
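
The conditioning scheme (initial state plus class label) can be sketched as follows; this tiny generator uses assumed sizes and omits the autoencoder stage the summary mentions:

import torch
import torch.nn as nn

n_classes, pose_dim, z_dim = 10, 75, 32

G = nn.Sequential(nn.Linear(z_dim + pose_dim + n_classes, 256), nn.ReLU(),
                  nn.Linear(256, pose_dim * 16))  # 16 future frames

z = torch.randn(4, z_dim)                         # noise
first_pose = torch.randn(4, pose_dim)             # initial state
label = nn.functional.one_hot(torch.tensor([3, 1, 0, 7]), n_classes).float()
frames = G(torch.cat([z, first_pose, label], -1)).view(4, 16, pose_dim)
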
Adversarial Geometry-Aware Human Motion Prediction
TLDR
This work proposes a novel frame-wise geodesic loss as a geometrically meaningful, more precise distance measurement and presents a new learning procedure to simultaneously validate the sequence-level plausibility of the prediction and its coherence with the input sequence by introducing two global recurrent discriminators.
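
For reference, one standard frame-wise geodesic distance on SO(3), the rotation angle of R1^T R2, can be computed as below; this is a common formulation, not necessarily the paper's exact loss:

import numpy as np

def geodesic_distance(R1, R2):
    """Rotation angle of R1^T R2, the geodesic distance on SO(3)."""
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Identity vs. a 90-degree rotation about z: distance is pi/2
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(geodesic_distance(np.eye(3), Rz))  # ~1.5708
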
Deep Video Generation, Prediction and Completion of Human Action Sequences
TLDR
This paper proposes a general, two-stage deep framework to generate human action videos with no constraints or an arbitrary number of constraints, which uniformly addresses three problems: video generation given no input frames, video prediction given the first few frames, and video completion given the first and last frames.
A New Representation of Skeleton Sequences for 3D Action Recognition
TLDR
Deep convolutional neural networks are used to learn long-term temporal information of the skeleton sequence from the frames of the generated clips, and a Multi-Task Learning Network (MTLN) is proposed to jointly process all frames of the clips in parallel to incorporate spatial structural information for action recognition.
Hierarchical recurrent neural network for skeleton based action recognition
  • Yong Du, Wei Wang, Liang Wang
  • Computer Science
    2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2015
TLDR
This paper proposes an end-to-end hierarchical RNN for skeleton-based action recognition, and demonstrates that the model achieves state-of-the-art performance with high computational efficiency.
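
The hierarchical idea, separate recurrent subnetworks per body part fused by a higher-level recurrent layer, can be sketched as follows (the part split and sizes are assumptions, with GRUs standing in for the paper's recurrent units):

import torch
import torch.nn as nn

# Hypothetical split of 25 joints into 5 body parts (indices illustrative)
parts = [list(range(i * 5, (i + 1) * 5)) for i in range(5)]

class HierRNN(nn.Module):
    """Sketch: one RNN per body part, fused by a higher-level RNN."""
    def __init__(self):
        super().__init__()
        self.part_rnns = nn.ModuleList(nn.GRU(15, 32, batch_first=True) for _ in parts)
        self.fuse = nn.GRU(32 * len(parts), 64, batch_first=True)

    def forward(self, x):  # x: (N, T, 25, 3)
        feats = [rnn(x[:, :, p, :].flatten(2))[0]
                 for rnn, p in zip(self.part_rnns, parts)]
        return self.fuse(torch.cat(feats, dim=-1))[0]

out = HierRNN()(torch.randn(2, 64, 25, 3))
print(out.shape)  # torch.Size([2, 64, 64])
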
Hierarchical Long-term Video Prediction without Supervision
TLDR
This work develops a novel training method that jointly trains the encoder, the predictor, and the decoder together without high-level supervision, and improves upon this by using an adversarial loss in the feature space to train the predictor.
Skeleton-Aided Articulated Motion Generation
This work makes the first attempt to generate an articulated human motion sequence from a single image. On one hand, we utilize paired inputs including human skeleton information as motion embedding and…