Corpus ID: 231582760

Learning Intuitive Physics with Multimodal Generative Models

@inproceedings{RezaeiShoshtari2021LearningIP,
  title={Learning Intuitive Physics with Multimodal Generative Models},
  author={S. Rezaei-Shoshtari and Francois Robert Hogan and Michael R. M. Jenkin and David Meger and Gregory Dudek},
  booktitle={AAAI},
  year={2021}
}
Predicting the future interaction of objects when they come into contact with their environment is key for autonomous agents to take intelligent and anticipatory actions. This paper presents a perception framework that fuses visual and tactile feedback to make predictions about the expected motion of objects in dynamic scenes. Visual information captures object properties such as 3D shape and location, while tactile information provides critical cues about interaction forces and resulting… 
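As a rough illustration of the kind of visuotactile fusion the abstract describes, the sketch below combines a visual encoder and a tactile encoder into a shared feature vector and predicts a change in object pose. All module names, dimensions, and the concatenation-based fusion are assumptions made for illustration; the paper itself uses a multimodal generative model rather than this exact architecture.

```python
# Illustrative sketch only: a minimal visuotactile network that predicts object
# motion from an image and a tactile reading. Names, sizes, and the simple
# concatenation-based fusion are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class VisuoTactileDynamics(nn.Module):
    def __init__(self, tactile_dim=64, latent_dim=128, pose_dim=6):
        super().__init__()
        # Visual encoder: maps an RGB image to a feature vector.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent_dim),
        )
        # Tactile encoder: maps a flattened tactile signal to the same space.
        self.touch = nn.Sequential(
            nn.Linear(tactile_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim),
        )
        # Fusion + prediction head: outputs a change in object pose.
        self.head = nn.Sequential(
            nn.Linear(2 * latent_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, pose_dim),
        )

    def forward(self, image, tactile):
        z = torch.cat([self.vision(image), self.touch(tactile)], dim=-1)
        return self.head(z)  # predicted pose delta (x, y, z, roll, pitch, yaw)

# Example usage with random tensors standing in for sensor data.
model = VisuoTactileDynamics()
image = torch.randn(8, 3, 64, 64)   # batch of RGB crops
tactile = torch.randn(8, 64)        # batch of tactile readings
delta_pose = model(image, tactile)  # shape: (8, 6)
```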

Citations

Learning Sequential Latent Variable Models from Multimodal Time Series Data
TLDR
This work presents a self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data and the respective dynamics, and demonstrates that this method is nearly as effective as an existing supervised approach that relies on ground truth labels.
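A minimal sketch of the general idea behind such a sequential latent variable model, assuming a simple encoder, GRU-based latent dynamics, and a reconstruction decoder; the names and sizes are hypothetical and this is not the cited method's implementation.

```python
# Illustrative sketch only: encode each multimodal observation into a latent
# state, roll the state forward with a learned dynamics model, and reconstruct
# the observations. All names and sizes are assumptions, not the cited method.
import torch
import torch.nn as nn

class SequentialLatentModel(nn.Module):
    def __init__(self, obs_dim=32, latent_dim=16):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)        # observation -> latent
        self.dynamics = nn.GRUCell(latent_dim, latent_dim)   # latent transition
        self.decoder = nn.Linear(latent_dim, obs_dim)        # latent -> reconstruction

    def forward(self, obs_seq):
        # obs_seq: (time, batch, obs_dim); returns reconstructions per step.
        h = torch.zeros(obs_seq.shape[1], self.dynamics.hidden_size)
        recons = []
        for obs_t in obs_seq:
            h = self.dynamics(self.encoder(obs_t), h)
            recons.append(self.decoder(h))
        return torch.stack(recons)

seq = torch.randn(10, 4, 32)               # 10 time steps, batch of 4
recon = SequentialLatentModel()(seq)       # (10, 4, 32)
loss = nn.functional.mse_loss(recon, seq)  # self-supervised reconstruction loss
```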

References

Showing 1-10 of 55 references
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
TLDR
This study points towards an account of human vision with generative physical knowledge at its core and various recognition models serving as helpers that enable efficient inference.
Connecting Touch and Vision via Cross-Modal Prediction
TLDR
This work investigates the cross-modal connection between vision and touch with a new conditional adversarial model that incorporates the scale and location information of the touch and demonstrates that the model can produce realistic visual images from tactile data and vice versa.
Multimodal dynamics modeling for off-road autonomous vehicles
TLDR
This study designs a model capable of long-horizon motion prediction that leverages vision, lidar, and proprioception, is robust to arbitrarily missing modalities at test time, and demonstrates the importance of using multiple sensors when modeling dynamics in outdoor conditions.
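One common way to obtain robustness to arbitrarily missing modalities is to randomly drop whole modalities during training. The sketch below shows that idea in isolation; the masking scheme is an assumption for illustration, not the cited paper's exact mechanism.

```python
# Illustrative sketch only: randomly zero out entire modalities per sample
# during training so the downstream model learns to cope with missing sensors.
import torch

def drop_modalities(features, p_drop=0.3, training=True):
    """features: dict of modality name -> (batch, dim) tensor."""
    if not training:
        return features
    out = {}
    for name, feat in features.items():
        # Zero out the whole modality for randomly selected batch elements.
        keep = (torch.rand(feat.shape[0], 1) > p_drop).float()
        out[name] = feat * keep
    return out

feats = {"vision": torch.randn(8, 128),
         "lidar": torch.randn(8, 64),
         "proprioception": torch.randn(8, 16)}
masked = drop_modalities(feats)  # some modalities zeroed per sample
```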
More Than a Feeling: Learning to Grasp and Regrasp Using Vision and Touch
TLDR
An end-to-end action-conditional model learns regrasping policies from raw visuo-tactile data and outperforms a variety of baselines at estimating grasp-adjustment outcomes, selecting efficient grasp adjustments for quick grasping, and reducing the force applied at the fingers, all while maintaining competitive performance.
Seeing Through your Skin: Recognizing Objects with a Novel Visuotactile Sensor
TLDR
The ability of the See-Through-your-Skin sensor to classify household objects, recognize fine textures, and infer their physical properties is validated through both numerical simulations and experiments with a smart countertop prototype.
“Touching to See” and “Seeing to Feel”: Robotic Cross-modal Sensory Data Generation for Visual-Tactile Perception
TLDR
A novel framework is proposed for cross-modal sensory data generation for visual and tactile perception, applying conditional generative adversarial networks to generate pseudo-visual images or tactile outputs from data of the other modality.
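A bare-bones sketch of conditional adversarial cross-modal generation in this spirit: a generator maps features of one modality plus noise to the other modality, and a discriminator scores (condition, sample) pairs. The MLP architecture and dimensions are assumptions for illustration, not the cited framework.

```python
# Illustrative sketch only: conditional GAN pairing for cross-modal generation
# (tactile -> visual here). Shapes and architectures are assumptions.
import torch
import torch.nn as nn

tactile_dim, visual_dim, noise_dim = 64, 256, 32

generator = nn.Sequential(       # tactile features + noise -> visual features
    nn.Linear(tactile_dim + noise_dim, 128), nn.ReLU(),
    nn.Linear(128, visual_dim),
)
discriminator = nn.Sequential(   # (tactile, visual) pair -> real/fake score
    nn.Linear(tactile_dim + visual_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1),
)

tactile = torch.randn(8, tactile_dim)
noise = torch.randn(8, noise_dim)
fake_visual = generator(torch.cat([tactile, noise], dim=-1))
score = discriminator(torch.cat([tactile, fake_visual], dim=-1))
adv_loss = nn.functional.binary_cross_entropy_with_logits(
    score, torch.ones_like(score))  # generator's non-saturating loss term
```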
Connecting Look and Feel: Associating the Visual and Tactile Properties of Physical Materials
TLDR
This work captures color and depth images of draped fabrics along with tactile data from a high-resolution touch sensor and seeks to associate the information from vision and touch by jointly training CNNs across the three modalities.
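A minimal sketch of joint training across modalities via a shared embedding space, assuming simple linear stand-ins for the CNN encoders and a pairwise-distance objective; the cited paper's actual training objective may differ.

```python
# Illustrative sketch only: per-modality encoders mapped into one shared
# embedding space, pulled together for matching samples. Names and the loss
# are assumptions for illustration.
import torch
import torch.nn as nn

embed_dim = 32
encoders = nn.ModuleDict({
    "color": nn.Linear(128, embed_dim),  # stand-ins for three CNN encoders
    "depth": nn.Linear(128, embed_dim),
    "touch": nn.Linear(64, embed_dim),
})

batch = {"color": torch.randn(8, 128),
         "depth": torch.randn(8, 128),
         "touch": torch.randn(8, 64)}
emb = {k: nn.functional.normalize(encoders[k](v), dim=-1) for k, v in batch.items()}

# Pull embeddings of the same physical sample together across modality pairs.
names = list(emb)
loss = sum(nn.functional.mse_loss(emb[a], emb[b])
           for i, a in enumerate(names) for b in names[i + 1:])
```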
3D Shape Perception from Monocular Vision, Touch, and Shape Priors
TLDR
This paper uses vision first, applying neural networks with learned shape priors to predict an object's 3D shape from a single-view color image, and then uses tactile sensing to refine the shape; the robot actively touches the object regions where the visual prediction has high uncertainty.
3D Shape Reconstruction from Vision and Touch
TLDR
This paper introduces a dataset of simulated touch and vision signals from the interaction between a robotic hand and a large array of 3D objects and presents an effective chart-based approach to fusing vision and touch, which leverages advances in graph convolutional networks.
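A single graph-convolution step of the kind such mesh-based fusion relies on, shown below as a minimal sketch; the normalized-adjacency formulation and sizes are assumptions, not the cited chart-based method.

```python
# Illustrative sketch only: one graph-convolution step over mesh vertex
# features, using a row-normalized adjacency with self-loops.
import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (num_vertices, in_dim); adj: (num_vertices, num_vertices) 0/1 matrix.
        adj_hat = adj + torch.eye(adj.shape[0])           # add self-loops
        deg_inv = 1.0 / adj_hat.sum(dim=1, keepdim=True)  # row-normalize
        return torch.relu(self.linear(deg_inv * (adj_hat @ x)))

x = torch.randn(5, 8)                   # features for 5 mesh vertices
adj = (torch.rand(5, 5) > 0.5).float()  # random binary adjacency, for illustration
out = SimpleGraphConv(8, 16)(x, adj)    # (5, 16) updated vertex features
```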
Active learning with query paths for tactile object shape exploration
TLDR
An active learning framework based on optimal query paths is presented to efficiently address the problem of tactile object shape exploration, and two strategies are developed to solve the proposed active path-querying learning problem.
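A generic sketch of uncertainty-driven query selection for tactile exploration: touch the candidate point where an ensemble of shape hypotheses disagrees most. The uncertainty measure and all names are assumptions, not the cited query-path formulation.

```python
# Illustrative sketch only: pick the next contact point where an ensemble of
# shape estimates disagrees the most (highest variance).
import numpy as np

def select_next_touch(candidates, ensemble_predictions):
    """candidates: (N, 3) points; ensemble_predictions: (M, N) occupancy guesses."""
    uncertainty = ensemble_predictions.var(axis=0)  # disagreement per candidate
    return candidates[np.argmax(uncertainty)]       # most informative point

candidates = np.random.rand(100, 3)  # candidate contact points
ensemble = np.random.rand(5, 100)    # 5 model hypotheses over the candidates
next_point = select_next_touch(candidates, ensemble)
```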
...