# DLow: Diversifying Latent Flows for Diverse Human Motion Prediction

@inproceedings{Yuan2020DLowDL,
  title     = {DLow: Diversifying Latent Flows for Diverse Human Motion Prediction},
  author    = {Ye Yuan and Kris M. Kitani},
  booktitle = {ECCV},
  year      = {2020}
}

Deep generative models are often used for human motion prediction as they are able to model multi-modal data distributions and characterize diverse human behavior. While much care has been taken in designing and learning deep generative models, how to efficiently produce diverse samples from a deep generative model after it has been trained is still an under-explored problem. To obtain samples from a pretrained generative model, most existing generative human motion prediction methods draw a…
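The core idea behind DLow, as described in the paper, is to map a single shared Gaussian draw through a set of learned affine transformations to obtain K correlated latent codes, then encourage the decoded futures to spread apart. A minimal numpy sketch of that mapping follows; the affine parameters here are random stand-ins for what the paper's mapping network would produce conditioned on the observed past motion, and the mean-pairwise-distance energy is a simplified proxy for the paper's diversity loss.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 5, 8  # number of future samples, latent dimension

# Hypothetical stand-ins for the mapping network's outputs: in DLow the
# affine parameters (A_k, b_k) are produced by a network conditioned on
# the observed past motion.
A = [np.eye(d) + 0.1 * rng.standard_normal((d, d)) for _ in range(K)]
b = [rng.standard_normal(d) for _ in range(K)]

# One shared Gaussian draw is transformed into K correlated latent codes.
eps = rng.standard_normal(d)
latents = [A[k] @ eps + b[k] for k in range(K)]

# A diversity-promoting energy on the set of codes: mean pairwise distance
# (a simplified proxy for the paper's diversity objective).
pairwise = [np.linalg.norm(latents[i] - latents[j])
            for i in range(K) for j in range(i + 1, K)]
diversity = float(np.mean(pairwise))
```

Because all K codes share one underlying draw of `eps`, sampling stays as cheap as a single forward pass while the affine offsets keep the set spread out.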


#### 22 Citations

Behavior-Driven Synthesis of Human Dynamics

- Computer Science
- CVPR
- 2021

This work proposes a conditional variational framework which explicitly disentangles posture from behavior and is able to change the behavior of a person depicted in an arbitrary posture, or to even directly transfer behavior observed in a given video sequence.

Learning to Predict Diverse Human Motions from a Single Image via Mixture Density Networks

- Computer Science
- 2021

Human motion prediction, which plays a key role in computer vision, generally requires a past motion sequence as input. However, in real applications, a complete and correct past motion sequence can…

LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving

- Computer Science
- ArXiv
- 2021

LookOut is an approach that jointly perceives the environment and predicts a diverse set of futures from sensor data, estimates their probability, and optimizes a contingency plan over these diverse future realizations; it learns a diverse joint distribution over multi-agent future trajectories in a traffic scene that covers a wide range of future modes with high sample efficiency.

We are More than Our Joints: Predicting how 3D Bodies Move

- Computer Science
- CVPR
- 2021

MOJO, a novel variational autoencoder, generates motions from latent frequencies and preserves the full temporal resolution of the input motion; sampling from the latent frequencies explicitly introduces high-frequency components into the generated motion.

Contextually Plausible and Diverse 3D Human Motion Prediction

- Computer Science
- 2020

A new variational framework is developed that accounts for both the diversity and the context of the generated future motion; in contrast to existing approaches, it conditions the sampling of the latent variable that acts as the source of diversity on the representation of the past observation, thus encouraging it to carry relevant information.

AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting

- Computer Science
- ArXiv
- 2021

A stochastic multi-agent trajectory prediction model that can attend to features of any agent at any previous timestep when inferring an agent’s future position is proposed, and it significantly improves the state of the art on well-established pedestrian and autonomous driving datasets.

Conditional Temporal Variational AutoEncoder for Action Video Prediction

- Computer Science
- ArXiv
- 2021

This paper proposes an Action Conditional Temporal Variational AutoEncoder (ACT-VAE) to improve motion prediction accuracy and capture movement diversity, surpassing state-of-the-art approaches.

Action-Conditioned 3D Human Motion Synthesis with Transformer VAE

- Computer Science
- ArXiv
- 2021

This work designs a Transformer-based architecture, ACTOR, for encoding and decoding a sequence of parametric SMPL human body models estimated from action recognition datasets and evaluates the approach on the NTU RGB+D, HumanAct12 and UESTC datasets and shows improvements over the state of the art.

Deep Time Series Forecasting with Shape and Temporal Criteria

- Mathematics, Computer Science
- ArXiv
- 2021

This paper addresses the problem of multi-step time series forecasting for non-stationary signals that can present sudden changes by introducing STRIPE++ (Shape and Time diveRsIty in Probabilistic forEcasting), a framework for providing a set of sharp and diverse forecasts, where the structured shape and time diversity is enforced with a determinantal point process (DPP) diversity loss.

Generating Smooth Pose Sequences for Diverse Human Motion Prediction

- Computer Science
- ArXiv
- 2021

Recent progress in stochastic motion prediction, i.e., predicting multiple possible future human motions given a single past pose sequence, has led to producing truly diverse future motions and even…

#### References

Showing 1–10 of 81 references.

Diverse Trajectory Forecasting with Determinantal Point Processes

- Computer Science
- ICLR
- 2020

This work proposes to learn a diversity sampling function (DSF) that generates a diverse and likely set of future trajectories and demonstrates the diversity of the trajectories produced by the approach on both low-dimensional 2D trajectory data and high-dimensional human motion data.
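The DPP machinery behind this reference (and the STRIPE++ citation above) scores a set of samples by the determinant of a similarity kernel over them: near-duplicate samples make the kernel matrix close to singular, so diverse sets score higher. A minimal sketch, assuming an RBF kernel over 2D points (the hypothetical `bandwidth` parameter and toy point sets are illustrative, not from the paper):

```python
import numpy as np

def dpp_log_det(points, bandwidth=1.0):
    """Log-determinant of an RBF similarity kernel over a sample set.
    Under a determinantal point process, a larger log-determinant
    corresponds to a more diverse (less mutually similar) set."""
    pts = np.asarray(points, dtype=float)
    sq = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
    L = np.exp(-sq / (2.0 * bandwidth ** 2))
    sign, logdet = np.linalg.slogdet(L)
    return logdet

clumped = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]  # nearly identical samples
spread  = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]  # well-separated samples
# The spread set has a larger DPP log-determinant than the clumped one.
```

Negating this log-determinant gives a differentiable diversity loss that can be minimized alongside a likelihood term.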

A Stochastic Conditioning Scheme for Diverse Human Motion Prediction

- Computer Science
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020

This paper proposes to stochastically combine the root of variations with previous pose information, so as to force the model to take the noise into account, and exploits this idea for motion prediction by incorporating it into a recurrent encoder-decoder network with a conditional variational autoencoder block that learns to exploit the perturbations.

PacGAN: The Power of Two Samples in Generative Adversarial Networks

- Computer Science, Mathematics
- IEEE Journal on Selected Areas in Information Theory
- 2020

It is shown that packing naturally penalizes generators with mode collapse, thereby favoring generator distributions with less mode collapse during the training process, and numerical experiments suggest that packing provides significant improvements in practice as well.
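The "packing" this abstract refers to is mechanically simple: the discriminator receives m samples concatenated into one input, so a mode-collapsed generator betrays itself by producing suspiciously similar samples within a pack. A minimal sketch of the data-side transformation, assuming flat feature vectors (the `pack` helper is illustrative, not PacGAN's actual code):

```python
import numpy as np

def pack(samples, m):
    """PacGAN-style packing: group m independent samples into one
    discriminator input by concatenating along the feature axis.
    Drops any trailing samples that do not fill a complete pack."""
    samples = np.asarray(samples)
    n, d = samples.shape
    n_packs = n // m
    return samples[: n_packs * m].reshape(n_packs, m * d)

batch = np.random.default_rng(1).standard_normal((8, 4))
packed = pack(batch, m=2)  # 8 samples of dim 4 -> 4 packs of dim 8
```

Both real and generated batches are packed the same way, so the only architectural change is the discriminator's input width.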

Lagging Inference Networks and Posterior Collapse in Variational Autoencoders

- Computer Science, Mathematics
- ICLR
- 2019

This paper investigates posterior collapse from the perspective of training dynamics and proposes an extremely simple modification to VAE training to reduce inference lag: depending on the model's current mutual information between latent variable and observation, the inference network is optimized before performing each model update.

Accurate and Diverse Sampling of Sequences Based on a "Best of Many" Sample Objective

- Computer Science, Mathematics
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018

This work addresses challenges in a Gaussian Latent Variable model for sequence prediction with a "Best of Many" sample objective that leads to more accurate and more diverse predictions that better capture the true variations in real-world sequence data.
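The "Best of Many" objective is easy to state: instead of averaging the reconstruction error over K drawn samples, only the closest sample to the ground truth is penalized, which frees the remaining samples to cover other plausible modes. A minimal sketch with mean-squared error as the per-sample loss (the toy target and samples are illustrative):

```python
import numpy as np

def best_of_many_loss(target, samples):
    """'Best of Many' sample objective: penalize only the closest of the
    K drawn samples to the ground truth, rather than the average error,
    so the other samples are free to cover alternative modes."""
    errors = [float(np.mean((np.asarray(s) - np.asarray(target)) ** 2))
              for s in samples]
    return min(errors)

target = np.array([1.0, 1.0])
samples = [np.array([0.0, 0.0]),   # far from the target
           np.array([1.0, 1.2]),   # closest sample
           np.array([-2.0, 3.0])]  # a different mode
loss = best_of_many_loss(target, samples)  # error of the closest sample only
```

During training the minimum is taken per ground-truth sequence, so gradients flow only through the best sample of each batch element.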

MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics

- Computer Science, Mathematics
- ECCV
- 2018

This work presents a novel Motion Transformation Variational Auto-Encoder (MT-VAE) for learning motion sequence generation that jointly learns a feature embedding for motion modes and a feature transformation that represents the transition of one motion mode to the next motion mode.

Semi-Amortized Variational Autoencoders

- Computer Science, Mathematics
- ICML
- 2018

This work proposes a hybrid approach, to use AVI to initialize the variational parameters and run stochastic variational inference (SVI) to refine them, which enables the use of rich generative models without experiencing the posterior-collapse phenomenon common in training VAEs for problems like text generation.

Wasserstein Auto-Encoders

- Computer Science, Mathematics
- ICLR
- 2018

The Wasserstein Auto-Encoder (WAE) is proposed: a new algorithm for building a generative model of the data distribution that shares many of the properties of VAEs (stable training, encoder-decoder architecture, nice latent manifold structure) while generating samples of better quality, as measured by the FID score.
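One common instantiation of the WAE objective replaces the VAE's per-sample KL term with a distribution-level penalty that pulls the aggregate encoded distribution toward the prior; the MMD variant measures that mismatch with a kernel two-sample statistic. A minimal sketch of a (biased) squared-MMD estimator with an RBF kernel, assuming Gaussian toy data (the `bandwidth` value and sample sizes are illustrative):

```python
import numpy as np

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared maximum mean discrepancy with an RBF
    kernel: the distribution-matching penalty used by the MMD variant
    of WAE to pull encoded codes toward the prior."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    def k(a, b):
        sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq / (2.0 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(0)
prior = rng.standard_normal((200, 2))
close = rng.standard_normal((200, 2))      # matches the prior distribution
far = rng.standard_normal((200, 2)) + 5.0  # shifted away from the prior
```

Because MMD compares whole batches of codes against prior samples, no density evaluation of the encoder is needed, which is what lets WAE use deterministic encoders.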

Improved Training of Wasserstein GANs

- Computer Science, Mathematics
- NIPS
- 2017

This work proposes an alternative to clipping weights: penalize the norm of the gradient of the critic with respect to its input, which performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning.
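The gradient penalty this abstract describes adds a term of the form lambda * (||grad_x f(x)|| - 1)^2 to the critic loss. A minimal sketch using a linear critic f(x) = w . x, whose input gradient is simply w everywhere, so the penalty has a closed form without automatic differentiation (in practice the gradient is obtained by autodiff at points interpolated between real and generated samples; the example weight vectors are illustrative):

```python
import numpy as np

def gradient_penalty(w, lam=10.0):
    """Gradient penalty for a linear critic f(x) = w . x.
    The input gradient of a linear critic is w itself, so the penalty
    lam * (||grad|| - 1)^2 reduces to a function of ||w||."""
    grad_norm = float(np.linalg.norm(w))
    return lam * (grad_norm - 1.0) ** 2

w_unit = np.array([0.6, 0.8])   # ||w|| = 1 -> penalty is (numerically) zero
w_steep = np.array([3.0, 4.0])  # ||w|| = 5 -> penalized toward unit norm
```

Penalizing deviations from unit gradient norm (rather than clipping weights) is what keeps the critic approximately 1-Lipschitz without crippling its capacity.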

InfoVAE: Information Maximizing Variational Autoencoders

- Computer Science, Mathematics
- ArXiv
- 2017

It is shown that this model can significantly improve the quality of the variational posterior and can make effective use of the latent features regardless of the flexibility of the decoding distribution, and it is demonstrated that the models outperform competing approaches on multiple performance metrics.