Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

  • M. Lee, Yuke Zhu, Peter A. Zachares, Matthew Tan, K. Srinivasan, S. Savarese, Feifei Li, Animesh Garg, Jeannette Bohg
  • Published 2020
  • Computer Science
  • IEEE Transactions on Robotics
  • Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is nontrivial to manually design a robot controller that combines these modalities, which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to train directly on real robots due to sample complexity. In this article, we use self-supervision to learn…
    13 Citations
    Robot Perception enables Complex Navigation Behavior via Self-Supervised Learning
    Bayesian and Neural Inference on LSTM-Based Object Recognition From Tactile and Kinesthetic Information
    Tactile-Driven Grasp Stability and Slip Prediction
    • 4
    In-Hand Object Pose Tracking via Contact Feedback and GPU-Accelerated Robotic Simulation
    • 5
    Towards Learning Controllable Representations of Physical Systems
    Learning Precise 3D Manipulation from Multiple Uncalibrated Cameras
    • 3
    Cross-modal Non-linear Guided Attention and Temporal Coherence in Multi-modal Deep Video Models


    Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
    • 95
    Stable reinforcement learning with autoencoders for tactile and visual data
    • 87
    • Highly Influential
    Learning to represent haptic feedback for partially-observable tasks
    • 24
    More Than a Feeling: Learning to Grasp and Regrasp Using Vision and Touch
    • 84
    Deep visual foresight for planning robot motion
    • Chelsea Finn, S. Levine
    • Computer Science
    • 2017 IEEE International Conference on Robotics and Automation (ICRA)
    • 2017
    • 396
    Manipulation by Feel: Touch-Based Control with Deep Predictive Models
    • 39
    Learning dexterous in-hand manipulation
    • 552
    See, feel, act: Hierarchical learning for complex manipulation skills with multisensory fusion
    • 22
    Learning robot in-hand manipulation with tactile features
    • 97
    Deep learning for tactile understanding from visual and haptic data
    • 138