Creatures great and SMAL: Recovering the shape and motion of animals from video

@inproceedings{Biggs2018CreaturesGA,
  title={Creatures great and SMAL: Recovering the shape and motion of animals from video},
  author={Benjamin Biggs and Thomas Roddick and Andrew W. Fitzgibbon and Roberto Cipolla},
  booktitle={ACCV},
  year={2018}
}
We present a system to recover the 3D shape and motion of a wide variety of quadrupeds from video. The system comprises a machine learning front-end which predicts candidate 2D joint positions, a discrete optimization which finds kinematically plausible joint correspondences, and an energy minimization stage which fits a detailed 3D model to the image. In order to overcome the limited availability of motion capture training data from animals, and the difficulty of generating realistic synthetic… 
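
The three-stage pipeline described above can be sketched end to end. Everything below is an illustrative stand-in (random candidate joints, a greedy kinematic filter, a toy 2D template fit), not the authors' implementation:

```python
import numpy as np

def predict_joint_candidates(n_joints, n_candidates, rng):
    """Stand-in for the ML front-end: candidate 2D positions with
    confidence scores for each joint (random here)."""
    xy = rng.uniform(0, 100, size=(n_joints, n_candidates, 2))
    conf = rng.uniform(0, 1, size=(n_joints, n_candidates))
    return xy, conf

def select_plausible_joints(xy, conf, max_bone_len=60.0):
    """Toy discrete optimization: greedily take the highest-confidence
    candidate per joint whose distance to the previously chosen joint
    stays within a kinematic bound."""
    chosen = []
    for j in range(xy.shape[0]):
        order = np.argsort(-conf[j])
        pick = xy[j, order[0]]
        if chosen:
            for idx in order:
                if np.linalg.norm(xy[j, idx] - chosen[-1]) <= max_bone_len:
                    pick = xy[j, idx]
                    break
        chosen.append(pick)
    return np.array(chosen)

def fit_model(joints_2d, n_iters=300, lr=0.1):
    """Toy energy minimization: fit translation + scale of a straight-line
    template skeleton to the selected joints by gradient descent."""
    n = joints_2d.shape[0]
    template = np.linspace(0.0, 1.0, n)[:, None] * np.ones((1, 2))
    t, s = np.zeros(2), 1.0
    for _ in range(n_iters):
        resid = s * template + t - joints_2d   # residual of current fit
        t -= lr * 2 * resid.mean(axis=0)       # gradient step on translation
        s -= lr * 2 * (resid * template).mean()  # gradient step on scale
    return t, s

rng = np.random.default_rng(0)
xy, conf = predict_joint_candidates(n_joints=5, n_candidates=4, rng=rng)
joints = select_plausible_joints(xy, conf)
t, s = fit_model(joints)
print(joints.shape)
```

The real system replaces each stage with a learned joint detector, an optimal discrete assignment, and a full 3D articulated-model energy; the sketch only preserves the stage structure.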

LASR: Learning Articulated Shape Reconstruction from a Monocular Video

TLDR
This work introduces a template-free approach to learn 3D shapes from a single video with an analysis-by-synthesis strategy that forward-renders object silhouette, optical flow, and pixel values to compare with video observations, which generates gradients to adjust the camera, shape and motion parameters.
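
The analysis-by-synthesis loop can be illustrated on a toy problem: forward-render a silhouette from a single shape parameter, compare it with the observed silhouette, and descend the resulting gradient. The soft circle renderer and the finite-difference gradient below are crude stand-ins for LASR's differentiable renderer and autodiff:

```python
import numpy as np

def render_silhouette(radius, size=64):
    """Render a filled circle of the given radius as a soft silhouette."""
    yy, xx = np.mgrid[:size, :size]
    d = np.sqrt((xx - size / 2) ** 2 + (yy - size / 2) ** 2)
    return 1.0 / (1.0 + np.exp(d - radius))  # soft inside/outside mask

def silhouette_loss(radius, observed):
    return np.mean((render_silhouette(radius) - observed) ** 2)

observed = render_silhouette(radius=20.0)  # "video observation"

# Gradient descent with a central finite-difference gradient.
radius, lr, eps = 10.0, 200.0, 1e-3
for _ in range(100):
    g = (silhouette_loss(radius + eps, observed) -
         silhouette_loss(radius - eps, observed)) / (2 * eps)
    radius -= lr * g
print(radius)  # approaches the observed radius of 20
```

LASR optimizes camera, shape, and articulation parameters jointly against silhouette, optical flow, and pixel losses; the single-parameter circle only shows the render-compare-adjust loop.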

Artemis: Articulated Neural Pets with Appearance and Motion synthesis

TLDR
The core of ARTEMIS is a neural-generated (NGI) animal engine, which adopts an efficient octree-based representation for animal animation and fur rendering, and introduces an effective optimization scheme to reconstruct the skeletal motion of real animals captured by a multi-view RGB and Vicon camera array.

hSMAL: Detailed Horse Shape and Pose Reconstruction for Motion Pattern Recognition

TLDR
The hSMAL model is applied to the problem of lameness detection from video: the model is fit to images to recover 3D pose, and an ST-GCN network is trained on the pose data.

Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture From Images “In the Wild”

TLDR
This method, SMALST (SMAL with learned Shape and Texture) goes beyond previous work, which assumed manual keypoints and/or segmentation, to regress directly from pixels to 3D animal shape, pose and texture.

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction

TLDR
ViSER is introduced, a method for recovering articulated 3D shapes and dense 3D trajectories from monocular videos, making use of only 2D object masks and two-frame optical flow as inputs to establish dense long-range correspondences across pixels.
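
The surface-embedding idea can be sketched as nearest-neighbor matching in embedding space: each mesh vertex and each image pixel carries an embedding, and dense correspondences are each pixel's closest vertex. The random embeddings below are stand-ins for the per-video embeddings ViSER learns:

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices, n_pixels, dim = 100, 500, 16

# Unit-norm vertex embeddings (learned in the real method, random here).
vertex_emb = rng.normal(size=(n_vertices, dim))
vertex_emb /= np.linalg.norm(vertex_emb, axis=1, keepdims=True)

# Simulate pixel embeddings as noisy copies of the vertices they observe.
true_match = rng.integers(0, n_vertices, size=n_pixels)
pixel_emb = vertex_emb[true_match] + 0.05 * rng.normal(size=(n_pixels, dim))
pixel_emb /= np.linalg.norm(pixel_emb, axis=1, keepdims=True)

# Dense correspondence: argmax cosine similarity, pixel -> vertex.
sim = pixel_emb @ vertex_emb.T            # (n_pixels, n_vertices)
matched = sim.argmax(axis=1)
accuracy = (matched == true_match).mean()
print(accuracy)
```

In ViSER the embeddings are optimized so that matches are consistent with the 2D masks and two-frame optical flow, which is what makes the correspondences long-range and video-specific.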

Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop

TLDR
An automatic, end-to-end method for recovering the 3D pose and shape of dogs from monocular internet images is introduced and a new parameterized model (including limb scaling) SMBLD is generated which is released alongside the new annotation dataset StanfordExtra to the research community.

Coarse-to-fine Animal Pose and Shape Estimation

TLDR
This work observes that the global image feature used by existing animal mesh reconstruction methods cannot capture the detailed shape information needed for mesh refinement, and therefore designs a mesh refinement GCN (MRGCN) with an encoder-decoder structure and hierarchical feature representations to overcome the limited receptive field of traditional GCNs.

BARC: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information

TLDR
This work shows that a priori information about genetic similarity can help to compensate for the lack of 3D training data, and significantly improves shape accuracy over a baseline without it.

Multi-animal pose estimation and tracking with DeepLabCut

TLDR
This work builds on DeepLabCut, a popular open source pose estimation toolbox, and provides high-performance animal assembly and tracking, features required for robust multi-animal scenarios; it also integrates the ability to predict an animal's identity directly, to assist tracking in the case of occlusions.

DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension

TLDR
DensePose 3D is contributed, a method that can learn monocular 3D reconstructions in a weakly supervised fashion from 2D image annotations only, in stark contrast with previous deformable reconstruction methods that use parametric models such as SMPL pre-trained on a large dataset of 3D object scans.

References

Showing 1-10 of 48 references

Animal gaits from video

TLDR
This method saves user time and effort, since there is no longer a need for manual selection within the video followed by trial and error in the choice of key images and 3D pose examples; it also proposes a simple algorithm based on PCA images to resolve 3D pose prediction ambiguities.

Lions and Tigers and Bears: Capturing Non-rigid, 3D, Articulated Shape from Images

TLDR
This work proposes a method to capture the detailed 3D shape of animals from images alone and is able to model new species, including the shape of an extinct animal, using only a few video frames.

End-to-End Recovery of Human Shape and Pose

TLDR
This work introduces an adversary trained to tell whether human body shape and pose parameters are real or not using a large database of 3D human meshes, and produces a richer and more useful mesh representation that is parameterized by shape and 3D joint angles.

Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

TLDR
The first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image is described, showing superior pose accuracy with respect to the state of the art.

Monocular 3D pose estimation and tracking by detection

TLDR
A three-stage process building on a number of recent advances to recover 3D human pose from monocular image sequences and demonstrates state-of-the-art performance on the HumanEva II benchmark, and shows the applicability of the approach to articulated 3D tracking in realistic street conditions.

Learning an efficient model of hand shape variation from depth images

TLDR
This work shows a substantial improvement in the representational power of the model while maintaining the efficiency of a linear shape basis, and shows that hand shape variation can be represented using only a small number of basis components.
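
The linear-shape-basis part of this idea is a PCA-style model: every shape is the mean plus a combination of a few principal components. A minimal sketch on synthetic stand-in data (the real model is learned from registered depth scans):

```python
import numpy as np

rng = np.random.default_rng(0)
n_shapes, n_coords, n_basis = 50, 300, 5

# Synthetic "shapes": flattened meshes generated from a hidden
# 5-dimensional latent, plus a little noise.
latent = rng.normal(size=(n_shapes, n_basis))
directions = rng.normal(size=(n_basis, n_coords))
shapes = latent @ directions + 0.01 * rng.normal(size=(n_shapes, n_coords))

# PCA via SVD of the centered data.
mean = shapes.mean(axis=0)
U, S, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
basis = Vt[:n_basis]                      # principal shape directions

# Reconstruct a shape from only n_basis coefficients.
coeffs = (shapes[0] - mean) @ basis.T
recon = mean + coeffs @ basis
rel_err = np.linalg.norm(recon - shapes[0]) / np.linalg.norm(shapes[0])
print(rel_err)  # small: a few components capture the variation
```

Evaluating a shape is just one matrix-vector product plus the mean, which is the efficiency the TLDR refers to.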

Real-time human pose recognition in parts from single depth images

TLDR
This work takes an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem, and generates confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes.
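
The parts-to-joints step can be sketched directly: given per-pixel part labels, propose one joint per part as a local density mode (a few mean-shift iterations). The pre-classified pixels below stand in for the output of the trained per-pixel classifier:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate pixels already classified into 3 body parts, scattered
# around the true joint locations.
true_joints = np.array([[10.0, 10.0], [40.0, 15.0], [25.0, 45.0]])
labels = np.repeat(np.arange(3), 200)
pixels = true_joints[labels] + rng.normal(scale=2.0, size=(600, 2))

def mean_shift_mode(points, bandwidth=3.0, iters=10):
    """Find a density mode with a Gaussian-kernel mean shift."""
    x = points.mean(axis=0)
    for _ in range(iters):
        w = np.exp(-np.sum((points - x) ** 2, axis=1) / (2 * bandwidth ** 2))
        x = (w[:, None] * points).sum(axis=0) / w.sum()
    return x

# One joint proposal per part, from that part's pixels.
proposals = np.array([mean_shift_mode(pixels[labels == p]) for p in range(3)])
err = np.linalg.norm(proposals - true_joints, axis=1)
print(err.max())
```

The real system additionally reprojects classified depth pixels into 3D and weights them by classification confidence before finding modes; the 2D sketch keeps only the per-part mode-finding step.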

Lucid Data Dreaming for Object Tracking

TLDR
This work generates in-domain training data by using the provided annotation on the first frame of each video to synthesize ("lucid dream") plausible future video frames; the resulting training strategy achieves state-of-the-art results across three evaluation datasets while using 20x-100x less annotated data than competing methods.
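
The synthesis step can be sketched as re-compositing: cut the annotated object out of the first frame and paste a transformed copy over the background to produce plausible training frames. The version below only translates the object and leaves a hole where it was; the full method uses much richer transformations (e.g. background inpainting and appearance changes):

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 32
frame = rng.uniform(size=(H, W, 3))   # stand-in first frame
mask = np.zeros((H, W), bool)
mask[8:20, 8:20] = True               # provided object annotation

def dream_frame(frame, mask, rng):
    """Shift the masked object by a random offset over the background."""
    dy, dx = rng.integers(-4, 5, size=2)
    new = frame * ~mask[..., None]    # crude background (object removed)
    new_mask = np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    fg = np.roll(np.roll(frame, dy, axis=0), dx, axis=1)
    new[new_mask] = fg[new_mask]      # paste the shifted object
    return new, new_mask

# Synthesize several (frame, mask) training pairs from one annotation.
pairs = [dream_frame(frame, mask, rng) for _ in range(5)]
print(len(pairs), pairs[0][0].shape)
```

Each synthesized pair comes with a free ground-truth mask, which is what lets the method train with far fewer human annotations.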

2D Human Pose Estimation: New Benchmark and State of the Art Analysis

TLDR
A novel benchmark "MPII Human Pose" is introduced that makes a significant advance in terms of diversity and difficulty, a contribution that is required for future developments in human body models.

Inferring 3D Shapes and Deformations from Single Views

TLDR
A novel probabilistic inference algorithm for 3D shape estimation is proposed, based on maximum-likelihood estimates of the GPLVM latent variables and the camera parameters that best fit the generated 3D shapes to given silhouettes.