Learning Sequential Latent Variable Models from Multimodal Time Series Data

  title={Learning Sequential Latent Variable Models from Multimodal Time Series Data},
  author={Oliver Limoyo and Trevor Ablett and Jonathan Kelly},
. Sequential modelling of high-dimensional data is an important problem that appears in many domains including model-based reinforcement learning and dynamics identification for control. Latent variable models applied to sequential data (i.e., latent dynamics models) have been shown to be a particularly effective probabilistic approach to solve this problem, especially when dealing with images. However, in many application areas (e.g., robotics), information from multiple sensing modalities is… 

Figures and Tables from this paper



Heteroscedastic Uncertainty for Robust Generative Latent Dynamics

This letter presents a method to jointly learn a latent state representation, and the associated dynamics that is amenable for long-term planning, and closed-loop control under perceptually difficult conditions, and demonstrates that it produces significantly more accurate predictions, and exhibits improved control performance, compared to a model that assumes homoscedastic uncertainty only, in the presence of varying degrees of input degradation.

Learning Latent Dynamics for Planning from Pixels

The Deep Planning Network (PlaNet) is proposed, a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space using a latent dynamics model with both deterministic and stochastic transition components.

Offline Reinforcement Learning from Images with Latent Space Models

This work proposes to learn a latent-state dynamics model, and represent the uncertainty in the latent space of the model predictions, and significantly outperforms previous offline model-free RL methods as well as state-of-the-art online visual model-based RL methods.

Multimodal Generative Models for Scalable Weakly-Supervised Learning

A multimodal variational autoencoder that uses a product-of-experts inference network and a sub-sampled training paradigm to solve the multi-modal inference problem and shares parameters to efficiently learn under any combination of missing modalities, thereby enabling weakly-supervised learning.

A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning

The Kalman variational auto-encoder is introduced, a framework for unsupervised learning of sequential data that disentangles two latent representations: an object's representation, coming from a recognition model, and a latent state describing its dynamics.

From Pixels to Torques: Policy Learning with Deep Dynamical Models

This paper introduces a data-efficient, model-based reinforcement learning algorithm that learns a closed-loop control policy from pixel information only, and facilitates fully autonomous learning from pixels to torques.

Structured Inference Networks for Nonlinear State Space Models

A unified algorithm is introduced to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks.

SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

This paper presents a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations, in that these representations are optimized for inferring simple dynamics and cost models given data from the current policy.

Joint Multimodal Learning with Deep Generative Models

The proposed joint multimodal variational autoencoder (JMVAE), in which all modalities are independently conditioned on joint representation, models a joint distribution of modalities and can generate and reconstruct them more properly than conventional VAEs.

Robot Motion Planning in Learned Latent Spaces

L-SBMP is presented, a methodology toward computing motion plans for complex robotic systems by learning a plannable latent representation through an autoencoding network, a dynamics network, and a collision checking network, which mirror the three main algorithmic primitives of SBMP.