Representing Spatial Trajectories as Distributions

  title={Representing Spatial Trajectories as Distributions},
  author={Dídac Surís and Carl Vondrick},
We introduce a representation learning framework for spatial trajectories. We represent partial observations of trajectories as probability distributions in a learned latent space, which characterize the uncertainty about unobserved parts of the trajectory. Our framework allows us to obtain samples from a trajectory for any continuous point in time—both interpolating and extrapolating. Our flexible approach supports directly modifying specific attributes of a trajectory, such as its pace, as well… 
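To make the idea concrete, here is a toy sketch (not the paper's model; the encoder, the linear motion latent, and the fixed uncertainty are all illustrative assumptions): a partial trajectory observation is mapped to a Gaussian in a latent space, and decoding a latent sample at any continuous time t yields a position, so the same representation supports interpolation and extrapolation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(observed_times, observed_points):
    """Hypothetical encoder: fit a linear motion model and return a
    Gaussian (mean, diagonal std) over its parameters."""
    A = np.stack([np.ones_like(observed_times), observed_times], axis=1)
    params, residuals, *_ = np.linalg.lstsq(A, observed_points, rcond=None)
    mu = params.ravel()            # latent mean: [offset, velocity] per dim
    sigma = np.full_like(mu, 0.1)  # fixed uncertainty, for illustration only
    return mu, sigma

def decode(latent, t):
    """Decode a latent sample into a position at continuous time t."""
    offset, velocity = latent.reshape(2, -1)
    return offset + velocity * t

times = np.array([0.0, 0.5, 1.0])
points = np.array([[0.0], [0.55], [1.0]])  # noisy 1D trajectory observations
mu, sigma = encode(times, points)

# Sampling the latent distribution yields diverse plausible continuations,
# reflecting uncertainty about the unobserved part of the trajectory (t=2
# is outside the observed window, i.e. extrapolation).
samples = [decode(mu + sigma * rng.standard_normal(mu.shape), t=2.0)
           for _ in range(3)]
```

The key property mirrored here is that the representation is a distribution queried at continuous time, not a fixed-length sequence of poses.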

OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields

OpenPose is released: the first open-source realtime system for multi-person 2D pose detection, covering body, foot, hand, and facial keypoints, and the first combined body-and-foot keypoint detector, trained on an internally annotated foot dataset.

A Recurrent Latent Variable Model for Sequential Data

It is argued that through the use of high-level latent random variables, the variational RNN (VRNN) can model the kind of variability observed in highly structured sequential data such as natural speech.

FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding

FineGym is a new dataset built on top of gymnastic videos that provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy; a systematic evaluation of different methods on this dataset yields a number of interesting findings.

RESOUND: Towards Action Recognition Without Representation Bias

Experimental evaluation confirms the effectiveness of RESOUND to reduce the static biases of current datasets.

Revealing Occlusions with 4D Neural Fields

A framework is introduced for learning to estimate 4D visual representations from monocular RGB-D video that persists objects even after they become occluded; it encodes point clouds into a continuous representation, which permits the model to attend across the spatiotemporal context to resolve occlusions.

Probabilistic Representations for Video Contrastive Learning

A self-supervised representation learning method that bridges contrastive learning with probabilistic representations, proposing a stochastic contrastive loss to learn proper video distributions and handle the uncertainty inherent in raw video.
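The core mechanism can be sketched in a few lines (all names here are hypothetical, and this is a generic stochastic InfoNCE, not the paper's exact loss): each clip is embedded as a Gaussian rather than a point, and the contrastive loss is averaged over reparameterized samples from those distributions, so embedding uncertainty enters the objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_embedding(mu, log_sigma, n_samples=4):
    """Reparameterized samples z = mu + sigma * eps from a diagonal Gaussian."""
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + np.exp(log_sigma) * eps

def stochastic_info_nce(z_anchor, z_pos, z_negs, tau=0.1):
    """InfoNCE loss averaged over stochastic embedding samples."""
    def sim(a, b):  # cosine similarity
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    losses = []
    for za, zp in zip(z_anchor, z_pos):
        logits = np.array([sim(za, zp) / tau]
                          + [sim(za, zn) / tau for zn in z_negs])
        losses.append(-logits[0] + np.log(np.exp(logits).sum()))
    return float(np.mean(losses))

mu_a, mu_p = np.ones(8), np.ones(8) * 0.9   # two views of the same video
z_a = sample_embedding(mu_a, np.full(8, -2.0))
z_p = sample_embedding(mu_p, np.full(8, -2.0))
z_n = rng.standard_normal((4, 8))           # embeddings of unrelated clips
loss = stochastic_info_nce(z_a, z_p, list(z_n))
```

With a learned variance, the model can widen the distribution for ambiguous clips instead of being forced to commit to one point estimate.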

Box Embeddings: An open-source library for representation learning using geometric structures

Box Embeddings is introduced: a Python library, compatible with both PyTorch and TensorFlow, that enables researchers to easily apply and extend probabilistic box embeddings, allowing existing neural network layers to be replaced with or transformed into boxes.
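The geometric idea behind box embeddings can be shown in plain NumPy (a conceptual sketch, not the Box Embeddings library's API): an entity is an axis-aligned box, and containment probabilities come from ratios of intersection volumes.

```python
import numpy as np

def volume(lo, hi):
    """Volume of the axis-aligned box [lo, hi]; empty boxes have volume 0."""
    return float(np.prod(np.maximum(hi - lo, 0.0)))

def intersection(lo1, hi1, lo2, hi2):
    """Corners of the intersection of two axis-aligned boxes."""
    return np.maximum(lo1, lo2), np.minimum(hi1, hi2)

# Toy 2D space where the "animal" box contains the "dog" box.
animal_lo, animal_hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])
dog_lo, dog_hi = np.array([0.2, 0.2]), np.array([0.5, 0.5])

ilo, ihi = intersection(animal_lo, animal_hi, dog_lo, dog_hi)
# P(animal | dog) ~ vol(animal ∩ dog) / vol(dog) = 1.0 for a contained box
p = volume(ilo, ihi) / volume(dog_lo, dog_hi)
```

In the probabilistic variants the library supports, hard min/max corners are smoothed so these volumes stay differentiable for training.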

GRAND: Graph Neural Diffusion

We present Graph Neural Diffusion (GRAND), which approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE.
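The diffusion view is easy to illustrate with plain heat diffusion on a graph (a minimal sketch, not GRAND's learned attention diffusivity): node features evolve by dx/dt = (Â − I)x, and an explicit Euler step of size τ recovers a GCN-like update x ← x + τ(Âx − x).

```python
import numpy as np

def euler_diffusion(x, adj, tau=0.2, steps=10):
    """Explicit Euler discretisation of graph heat diffusion."""
    deg = adj.sum(axis=1, keepdims=True)
    a_hat = adj / np.maximum(deg, 1e-12)  # row-normalized adjacency
    for _ in range(steps):
        x = x + tau * (a_hat @ x - x)     # Euler step of dx/dt = (A_hat - I) x
    return x

# Path graph on 3 nodes: features smooth toward a local consensus.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
x0 = np.array([[1.0], [0.0], [-1.0]])
xT = euler_diffusion(x0, adj)
```

Swapping the explicit Euler step for implicit or multi-step schemes is what lets the PDE framing trade depth for numerical-integration accuracy.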

Gromov–Wasserstein distances between Gaussian distributions

It is shown that when the optimal plan is restricted to Gaussian distributions, the problem has a very simple linear solution, which is also a solution of the linear Gromov–Monge problem.