Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects

@inproceedings{Ehsani2020UseTF,
  title={Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects},
  author={Kiana Ehsani and Shubham Tulsiani and Saurabh Gupta and Ali Farhadi and Abhinav Gupta},
  booktitle={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={221-230}
}
When we humans look at a video of human-object interaction, we can not only infer what is happening but also extract actionable information and imitate those interactions. Current recognition and geometric approaches, on the other hand, lack the physicality of action representation. In this paper, we take a step towards a more physical understanding of actions. We address the problem of inferring contact points and the physical forces from videos of humans interacting with objects. One of…
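Below is a minimal sketch of the training idea the abstract describes, not the authors' implementation: a network predicts a force from video features, a simple differentiable physics rollout simulates its effect, and the loss compares the simulated object trajectory to the observed one. The module names, feature dimensions, and point-mass dynamics are all illustrative assumptions.

```python
# Hypothetical sketch: supervise force prediction by simulating its effects.
import torch
import torch.nn as nn

class ForcePredictor(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, video_feats):          # (B, feat_dim) -> (B, 3) force vector
        return self.net(video_feats)

def simulate(pos, vel, force, mass=1.0, dt=1.0 / 30, steps=10):
    """Differentiable point-mass rollout under a constant applied force."""
    traj = []
    for _ in range(steps):
        vel = vel + (force / mass) * dt
        pos = pos + vel * dt
        traj.append(pos)
    return torch.stack(traj, dim=1)          # (B, steps, 3)

model = ForcePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
feats = torch.randn(8, 128)                  # stand-in for per-clip video features
pos0, vel0 = torch.zeros(8, 3), torch.zeros(8, 3)
observed = torch.randn(8, 10, 3)             # stand-in for tracked object keypoints

for _ in range(100):
    force = model(feats)
    simulated = simulate(pos0, vel0, force)
    loss = ((simulated - observed) ** 2).mean()   # effect-matching loss
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the rollout is differentiable, the trajectory error supervises the force prediction directly, without requiring ground-truth force labels.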
Dynamic Modeling of Hand-Object Interactions via Tactile Sensing
TLDR
This work proposes a framework that predicts the 3D locations of both the hand and the object purely from touch data by combining a predictive model with a contrastive learning module, taking a step toward dynamics modeling of hand-object interactions from dense tactile sensing.
BEHAVE: Dataset and Method for Tracking Human Object Interactions
TLDR
The key insight is to predict correspondences from both the human and the object to a statistical body model, obtaining human-object contacts during interactions; this enables learning a model that jointly tracks humans and objects in natural environments with an easy-to-use portable multi-camera setup.
gradSim: Differentiable simulation for system identification and visuomotor control
We consider the problem of estimating an object’s physical properties such as mass, friction, and elasticity directly from video sequences. Such a system identification problem is fundamentally…
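In the spirit of that setup, here is a minimal sketch (our own illustration, not gradSim's API) of system identification by backpropagating a trajectory-matching loss through a hand-rolled differentiable simulator; the 1-D sliding-block dynamics and all parameter values are assumptions.

```python
# Hypothetical sketch: recover mass and friction by gradient descent
# through a differentiable physics rollout.
import torch

def rollout(mass, friction, push=5.0, dt=0.01, steps=200):
    """1-D block given an impulse, then decelerating under Coulomb friction."""
    g = 9.81
    vel = push / mass                        # impulse -> initial velocity
    pos, traj = torch.tensor(0.0), []
    for _ in range(steps):
        vel = torch.clamp(vel - friction * g * dt, min=0.0)
        pos = pos + vel * dt
        traj.append(pos)
    return torch.stack(traj)

observed = rollout(torch.tensor(2.0), torch.tensor(0.3))   # stand-in "video" track
mass = torch.tensor(1.0, requires_grad=True)
friction = torch.tensor(0.1, requires_grad=True)
opt = torch.optim.Adam([mass, friction], lr=0.05)

for _ in range(500):
    loss = ((rollout(mass, friction) - observed) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(mass.item(), friction.item())   # should move toward 2.0 and 0.3
```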
CHORE: Contact, Human and Object REconstruction from a single RGB image
TLDR
This work introduces CHORE, a novel method that learns to jointly reconstruct the human and the object from a single image and significantly outperforms the SOTA; it also proposes a simple yet effective depth-aware scaling that allows more efficient shape learning on real data.
Learning Cooperative Dynamic Manipulation Skills from Human Demonstration Videos
TLDR
The objective is to extend the concept of learning from demonstration (LfD) to dynamic scenarios, benefiting from widely available or easily producible videos.
D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions
TLDR
This work proposes a novel method that frames the dynamic grasp synthesis task in the reinforcement learning framework and leverages a physics simulation, both to learn and to evaluate such dynamic interactions.
Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction
TLDR
This study introduces a video-based method for predicting contact between a hand and an object, and proposes a semi-supervised framework consisting of automatic collection of training data with motion-based pseudo-labels and guided progressive label correction (gPLC), which corrects noisy pseudo-labels with a small amount of trusted data.
PressureVision: Estimating Hand Pressure from a Single RGB Image
TLDR
The central insight is that the application of pressure by a hand results in informative appearance changes, and that the appearance of a previously unobserved human hand can be used to accurately infer applied pressure.
Interpretability in Contact-Rich Manipulation via Kinodynamic Images
TLDR
This work addresses the interpretability of NN-based models by introducing kinodynamic images: a methodology that encodes the kinematic and dynamic data of contact-rich manipulation tasks as images, which are then used as the state representation.

References

Showing 1-10 of 35 references
Inferring Interaction Force from Visual Information without Using Physical Force Sensors
TLDR
A recurrent neural network-based deep model with fully-connected layers is formulated to capture complex temporal dynamics from visual representations; the forces predicted by the proposed method are very similar to those measured by physical force sensors.
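A minimal sketch of this kind of recurrent force estimator, with hypothetical architecture details rather than the paper's exact model: an LSTM consumes per-frame visual features and regresses a per-frame force, trained against force-sensor measurements.

```python
# Hypothetical sketch: per-frame force regression from visual features.
import torch
import torch.nn as nn

class VisualForceRNN(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, frame_feats):           # (B, T, feat_dim)
        h, _ = self.lstm(frame_feats)
        return self.head(h).squeeze(-1)       # (B, T) per-frame force estimates

model = VisualForceRNN()
feats = torch.randn(4, 30, 256)               # stand-in for CNN features of 30 frames
forces = torch.randn(4, 30)                   # stand-in for force-sensor ground truth
loss = nn.functional.mse_loss(model(feats), forces)
loss.backward()                               # supervised by sensor measurements
```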
"What Happens If..." Learning to Predict the Effect of Forces in Images
TLDR
A deep neural network model is designed that learns long-term sequential dependencies of object movements while taking into account the geometry and appearance of the scene by combining Convolutional and Recurrent Neural Networks.
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
TLDR
This study points towards an account of human vision with generative physical knowledge at its core, and various recognition models as helpers leading to efficient inference.
Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images
TLDR
This paper defines intermediate physical abstractions called Newtonian scenarios and introduces Newtonian Neural Network (N3) that learns to map a single image to a state in a Newtonian scenario.
Towards force sensing from vision: Observing hand-object interactions to infer manipulation forces
TLDR
A novel, non-intrusive approach estimates contact forces during hand-object interactions relying solely on visual input from a single RGB-D camera, showing that force sensing from vision (FSV) is indeed feasible.
Hand-Object Contact Force Estimation from Markerless Visual Tracking
TLDR
This work establishes that interaction forces can be estimated in a cost-effective, reliable, non-intrusive way using vision, by learning a mapping, with recurrent neural networks (RNNs), between high-level kinematic features derived from the equations of motion and the underlying manipulation forces.
Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video
TLDR
A method is introduced to automatically reconstruct the 3D motion of a person interacting with an object from a single RGB video, including automatically recognizing from the input video the position and timing of contacts between the person and the object or the ground.
Learning a Generative Model for Multi‐Step Human‐Object Interactions from Videos
TLDR
A generative model based on a Recurrent Neural Network learns the causal dependencies and constraints between individual actions and can be used to generate novel and diverse multi-step human-object interactions.
Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces
TLDR
This work introduces an approach to model surface properties governing bounces in everyday scenes and shows that the proposed model outperforms baselines, including trajectory fitting with Newtonian physics, in predicting post-bounce trajectories and inferring physical properties of a scene.
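For reference, the Newtonian trajectory-fitting baseline mentioned in that summary can be sketched as follows (our own illustration, not the paper's code): reflect and damp the normal velocity component with a restitution coefficient, then predict ballistic flight.

```python
# Hypothetical sketch: Newtonian post-bounce trajectory prediction.
import numpy as np

def post_bounce_trajectory(v_in, restitution=0.7, g=9.81, dt=1/60, steps=60):
    """v_in: (vx, vy) velocity at impact with a horizontal surface (vy < 0)."""
    vx, vy = v_in
    vy = -restitution * vy                 # normal component reflected and damped
    t = np.arange(steps) * dt
    x = vx * t                             # tangential component kept (no friction)
    y = vy * t - 0.5 * g * t**2            # ballistic flight after the bounce
    return np.stack([x, y], axis=1)        # (steps, 2) positions from impact point

traj = post_bounce_trajectory((1.5, -3.0))
print(traj[:3])
```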
Detecting and Recognizing Human-Object Interactions
TLDR
A novel model is proposed that learns to predict an action-specific density over target object locations based on the appearance of a detected person and efficiently infers interaction triplets in a clean, jointly trained end-to-end system the authors call InteractNet.