Pose2Room: Understanding 3D Scenes from Human Activities
Yinyu Nie, Angela Dai, Xiaoguang Han, and Matthias Nießner. In: European Conference on Computer Vision.

With wearable IMU sensors, one can estimate human poses without requiring visual input [65]. In this work, we pose the question: can we reason about object structure in real-world environments solely from human trajectory information? Crucially, we observe that human motion and interactions tend to give strong information about the objects in a scene; for instance, a person sitting indicates the likely presence of a chair or sofa. To this end, we propose P2R-Net to learn…

MIME: Human-Aware 3D Scene Generation

This work proposes MIME (Mining Interaction and Movement to infer 3D Environments), a generative model of indoor scenes that produces furniture layouts consistent with human movement, yielding more diverse and plausible 3D scenes than a recent generative scene method that is unaware of human movement.

Scene Synthesis from Human Motion

Experimental results demonstrate that SUMMON synthesizes feasible, plausible, and diverse scenes and has the potential to generate extensive human-scene interaction data for the community.

COUCH: Towards Controllable Human-Chair Interactions

A novel synthesis framework, COUCH, is proposed that plans motion ahead by predicting contact-aware control signals for the hands, which are then used to synthesize contact-conditioned interactions; it shows significant quantitative and qualitative improvements over existing methods for human-object interaction.

Learning 3D Scene Priors with 2D Supervision

This work proposes a new method to learn 3D scene priors over layout and shape without requiring any 3D ground truth, and achieves state-of-the-art results in scene synthesis against baselines that require 3D supervision.

Learning Object Arrangements in 3D Scenes using Human Context

This work considers the problem of learning object arrangements in a 3D scene, modeling the distribution of human poses with a variant of the Dirichlet process mixture model that shares density-function parameters across objects of the same type.

Resolving 3D Human Pose Ambiguities With 3D Scene Constraints

This work represents human pose using the 3D human body model SMPL-X and extends SMPLify-X to estimate body pose under scene constraints, showing quantitatively that introducing scene constraints significantly reduces 3D joint error and vertex error.

Scene Semantics from Long-Term Observation of People

This paper constructs a functional object description with the aim of recognizing objects by the way people interact with them, describing scene objects (sofas, tables, chairs) by their associated human poses and object appearance.

SceneGrok: inferring action maps in 3D environments

This paper uses RGB-D sensors to capture dense 3D reconstructions of real-world scenes and trains a classifier that transfers interaction knowledge to unobserved 3D scenes, demonstrating prediction of action maps in both 3D scans and virtual scenes.

Hallucinated Humans as the Hidden Context for Labeling 3D Scenes

This paper presents the Infinite Factored Topic Model (IFTM), in which a scene is generated from two types of topics, human configurations and human-object relationships, and shows that the algorithm can recover these human-object relationships.

Long-term Human Motion Prediction with Scene Context

This work proposes a novel three-stage framework that exploits scene context to tackle the task of predicting human motion and shows consistent quantitative and qualitative improvements over existing methods.

Stochastic Scene-Aware Motion Prediction

This work presents a novel data-driven, stochastic motion synthesis method that models different styles of performing a given action with a target object and generalizes to target objects of various geometries while enabling the character to navigate in cluttered scenes.

Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time

A novel deep neural network is demonstrated that reconstructs full-body human pose in real time from 6 Inertial Measurement Units (IMUs) worn on the user's body, using a bi-directional RNN architecture.
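The bi-directional RNN idea can be illustrated with a minimal sketch (our own toy illustration of the general architecture class, not DIP's actual network or weights): a sequence of IMU feature vectors is processed once forward and once backward in time, and the two hidden states are concatenated per step, so each output depends on both past and future measurements.

```python
import numpy as np

def birnn(x, Wf, Uf, Wb, Ub):
    """Toy bi-directional RNN layer (illustrative only).
    x: (T, d_in) input sequence; Wf/Wb: (d_h, d_in); Uf/Ub: (d_h, d_h).
    Returns (T, 2*d_h): forward and backward hidden states concatenated."""
    T, d_h = x.shape[0], Wf.shape[0]
    hf = np.zeros((T, d_h))
    hb = np.zeros((T, d_h))
    h = np.zeros(d_h)
    for t in range(T):              # forward pass over time
        h = np.tanh(Wf @ x[t] + Uf @ h)
        hf[t] = h
    h = np.zeros(d_h)
    for t in reversed(range(T)):    # backward pass over time
        h = np.tanh(Wb @ x[t] + Ub @ h)
        hb[t] = h
    return np.concatenate([hf, hb], axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 6))    # e.g. 10 time steps of 6 IMU features
out = birnn(x,
            rng.standard_normal((4, 6)), rng.standard_normal((4, 4)),
            rng.standard_normal((4, 6)), rng.standard_normal((4, 4)))
print(out.shape)  # (10, 8)
```

Because the backward pass sees future frames, such a layer cannot run strictly online; real-time systems in this vein typically operate on a sliding window of recent measurements.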

Geometric Pose Affordance: 3D Human Pose with Scene Constraints

A novel, view-based representation of scene geometry is introduced: a multi-layer depth map, which employs multi-hit ray tracing to concisely encode multiple surface entry and exit points along each camera view ray.
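The multi-layer depth map concept can be sketched for a single camera ray (a toy illustration of ours, assuming hypothetical slab-shaped geometry, not the paper's implementation): instead of storing only the first surface hit as an ordinary depth map does, every entry and exit depth along the ray is recorded.

```python
def multi_hit_depths(slabs):
    """Toy multi-hit depth computation along one camera ray (illustrative only).
    slabs: list of (near, far) depth intervals where the ray passes through solid
    geometry. Returns all surface entry/exit depths sorted along the ray."""
    hits = []
    for near, far in slabs:
        hits.append(near)   # surface entry point
        hits.append(far)    # surface exit point
    return sorted(hits)

# Example: a ray piercing a table top (1.0-1.05 m) and then the floor (2.0-2.1 m)
layers = multi_hit_depths([(1.0, 1.05), (2.0, 2.1)])
print(layers)  # [1.0, 1.05, 2.0, 2.1]
```

A per-pixel list of this kind tells a pose estimator not just where the nearest surface is, but which depth intervals are occupied, so a predicted body joint can be constrained to lie in free space.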

Human-Centric Scene Understanding from Single View 360 Video

A deep convolutional encoder-decoder network, trained on a synthetic dataset, reconstructs regions of affordance from captured human activity and composes a reconstruction of the complete 3D scene by integrating the affordance segmentation into 3D space.