Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks
@article{Lee2020MakingSO,
  title   = {Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks},
  author  = {Michelle A. Lee and Yuke Zhu and Peter A. Zachares and Matthew Tan and Krishnan Srinivasan and Silvio Savarese and Li Fei-Fei and Animesh Garg and Jeannette Bohg},
  journal = {IEEE Transactions on Robotics},
  year    = {2020},
  volume  = {36},
  pages   = {582--596}
}
Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is nontrivial to manually design a robot controller that combines these modalities, which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to train directly on real robots due to sample complexity. In this article, we use self-supervision to learn…
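To make the approach described in the abstract concrete, here is a minimal sketch of the core idea: encode camera frames and force/torque readings with separate branches, fuse them into a single latent, and train that latent with a self-supervised signal. This is a sketch under assumptions, not the authors' released implementation: the PyTorch setup, the `MultimodalEncoder` name, all layer sizes, and the choice of contact prediction as the self-supervised objective are illustrative.

```python
# Hypothetical sketch (not the authors' code): fuse vision and force/torque
# into one latent vector and train it with a self-supervised head.
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        # Vision branch: encode a 128x128 RGB frame to a feature vector.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
        # Haptic branch: encode a window of 32 six-axis force/torque readings.
        self.haptics = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 6, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )
        # Fuse the two modality embeddings into one shared latent.
        self.fuse = nn.Sequential(nn.Linear(2 * latent_dim, latent_dim), nn.ReLU())
        # Self-supervised head: predict upcoming contact, a label the robot
        # can generate for itself without human annotation.
        self.contact_head = nn.Linear(latent_dim, 1)

    def forward(self, rgb, ft):
        z = self.fuse(torch.cat([self.vision(rgb), self.haptics(ft)], dim=-1))
        return z, self.contact_head(z)

# Usage: the latent z would feed a downstream RL policy; the contact logit
# is trained with binary cross-entropy against self-generated labels.
encoder = MultimodalEncoder()
rgb = torch.randn(8, 3, 128, 128)   # batch of camera frames
ft = torch.randn(8, 32, 6)          # batch of force/torque windows
z, contact_logit = encoder(rgb, ft)
loss = nn.functional.binary_cross_entropy_with_logits(
    contact_logit, torch.randint(0, 2, (8, 1)).float())
```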
13 Citations
- Robot Perception enables Complex Navigation Behavior via Self-Supervised Learning. arXiv, 2020.
- Bayesian and Neural Inference on LSTM-Based Object Recognition From Tactile and Kinesthetic Information. IEEE Robotics and Automation Letters, 2021.
- In-Hand Object Pose Tracking via Contact Feedback and GPU-Accelerated Robotic Simulation. IEEE International Conference on Robotics and Automation (ICRA), 2020.
- Human-robot collaboration in sensorless assembly task learning enhanced by uncertainties adaptation via Bayesian Optimization. Robotics and Autonomous Systems, 2021.
- Towards Learning Controllable Representations of Physical Systems. arXiv, 2020.
- Learning Precise 3D Manipulation from Multiple Uncalibrated Cameras. IEEE International Conference on Robotics and Automation (ICRA), 2020.
- Object Detection Recognition and Robot Grasping Based on Machine Learning: A Survey. IEEE Access, 2020.
- Cross-modal Non-linear Guided Attention and Temporal Coherence in Multi-modal Deep Video Models. ACM Multimedia, 2020.