• Publications
Opt
Many graphics and vision problems can be expressed as non-linear least squares optimizations of objective functions over visual data, such as images and meshes.
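The non-linear least squares formulation above can be illustrated with a small Gauss-Newton solver. This is a generic sketch, not the Opt DSL itself; the exponential-fit residual and all parameter names are purely illustrative.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=20):
    """Minimize ||residual(x)||^2 by Gauss-Newton, starting from x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)   # residual vector r(x)
        J = jacobian(x)   # Jacobian of the residuals w.r.t. x
        # Normal equations: (J^T J) dx = -J^T r
        dx = np.linalg.solve(J.T @ J, -J.T @ r)
        x = x + dx
    return x

# Toy objective: fit y = a * exp(b * t) to samples (noise-free here).
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(1.5 * t)

def residual(p):
    a, b = p
    return a * np.exp(b * t) - y

def jacobian(p):
    a, b = p
    e = np.exp(b * t)
    # Columns: d(residual)/da and d(residual)/db
    return np.stack([e, a * t * e], axis=1)

p = gauss_newton(residual, jacobian, [1.0, 1.0])  # converges to (2.0, 1.5)
```

Systems like the one described compile such per-residual objectives into efficient GPU solvers; the point here is only the shape of the problem they target.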
Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision
We propose a CNN-based approach for 3D human body pose estimation from single RGB images that addresses the limited generalizability of models trained solely on the scarce publicly available training data.
Face2Face: real-time face capture and reenactment of RGB videos
TLDR: Face2Face addresses the under-constrained problem of facial identity recovery from monocular video via non-rigid model-based bundling, and convincingly re-renders the synthesized target face on top of the corresponding video stream so that it seamlessly blends with the real-world illumination.
BundleFusion: real-time globally consistent 3D reconstruction using on-the-fly surface re-integration
TLDR: This work systematically addresses these issues with a novel, real-time, end-to-end reconstruction framework, which outperforms state-of-the-art online systems with quality on par with offline methods, but with unprecedented speed and scan completeness.
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
TLDR: This work presents the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera, and shows that the approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low-quality commodity RGB cameras.
GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB
TLDR: This work proposes a novel approach for the synthetic generation of training data based on a geometrically consistent image-to-image translation network, using a neural network that translates synthetic images to "real" images so that the generated images follow the same statistical distribution as real-world hand images.
MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
TLDR: A novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image, and can be trained end-to-end in an unsupervised manner, which renders training on very large real-world data feasible.
Performance capture from sparse multi-view video
TLDR: A new marker-less approach to capturing human performances from multi-view video is proposed that can jointly reconstruct spatio-temporally coherent geometry, motion, and textural surface appearance of actors performing complex and rapid moves.
A Noise-aware Filter for Real-time Depth Upsampling
TLDR: This work presents an adaptive multi-lateral upsampling filter that takes into account the inherently noisy nature of real-time depth data; it can greatly improve reconstruction quality, boost the resolution of the data to that of the video sensor, and prevent unwanted artifacts such as texture copy into geometry.
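The core idea behind such depth upsampling can be sketched as a plain joint bilateral pass: each high-resolution output pixel averages low-resolution depth samples, weighted by spatial distance and by similarity in a high-resolution guide image. This is an assumed simplification (the paper's filter is multi-lateral and noise-adaptive), and all parameter names are hypothetical.

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, guide_hi, sigma_s=1.0, sigma_r=0.1, radius=2):
    """Upsample depth_lo to the resolution of the single-channel guide_hi.

    Weights combine a spatial Gaussian over the low-res neighborhood with a
    range Gaussian over guide-intensity differences (joint bilateral weights).
    """
    scale = guide_hi.shape[0] // depth_lo.shape[0]
    H, W = guide_hi.shape
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp neighbor coordinates in the low-res depth map
                    ly = min(max(y // scale + dy, 0), depth_lo.shape[0] - 1)
                    lx = min(max(x // scale + dx, 0), depth_lo.shape[1] - 1)
                    # Corresponding location in the high-res guide
                    gy, gx = ly * scale, lx * scale
                    ws = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
                    wr = np.exp(-((guide_hi[y, x] - guide_hi[gy, gx]) ** 2)
                                / (2.0 * sigma_r ** 2))
                    w = ws * wr
                    num += w * depth_lo[ly, lx]
                    den += w
            out[y, x] = num / den
    return out

depth_lo = np.full((4, 4), 2.0)                 # flat low-res depth
guide = np.arange(64.0).reshape(8, 8) / 64.0    # hypothetical high-res guide
depth_hi = joint_bilateral_upsample(depth_lo, guide)
```

A constant depth input stays constant in the output regardless of the guide, which is a quick sanity check for the weighting; the noise-aware variant additionally adapts the range term to the depth sensor's noise characteristics.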
Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
TLDR: This hybrid approach combines, in a voting scheme, a discriminative part-based pose retrieval method with a generative pose estimation method based on local optimization, achieving state-of-the-art accuracy on challenging sequences and near-real-time performance on a desktop computer.