Detect, Reject, Correct: Crossmodal Compensation of Corrupted Sensors

Michelle A. Lee, Matthew Tan, Yuke Zhu, Jeannette Bohg
2021 IEEE International Conference on Robotics and Automation (ICRA)
Using sensor data from multiple modalities presents an opportunity to encode redundant and complementary features that can be useful when one modality is corrupted or noisy. Humans do this every day, relying on touch and proprioceptive feedback in visually challenging environments. However, robots might not always know when their sensors are corrupted, as even broken sensors can return valid values. In this work, we introduce the Crossmodal Compensation Model (CCM), which can detect corrupted…


Multiscale Sensor Fusion and Continuous Control with Neural CDEs
InFuser is presented: a unified architecture that trains continuous-time policies with Neural Controlled Differential Equations (CDEs). It evolves a single latent state representation over time, enabling policies that react to multi-frequency, multi-sensory feedback for truly end-to-end visuomotor control without discrete-time assumptions.
Multi-Modal Fusion in Contact-Rich Precise Tasks via Hierarchical Policy Learning
Combined visual and force feedback play an essential role in contact-rich robotic manipulation tasks. Current methods focus on developing the feedback control around a single modality while…
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
This work releases MultiBench, a systematic and unified large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. It also provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation.
Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions
A novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm is introduced, which efficiently estimates uncertainty in a principled way for the adaptive integration of different modalities and produces a trustworthy regression result.
Dynamic Multimodal Fusion
DynMM is proposed: a new approach that adaptively fuses multimodal data and generates data-dependent forward paths during inference. It opens a novel direction towards dynamic multimodal network design, with applications to a wide range of multimodal tasks.
Inertial Hallucinations -- When Wearable Inertial Devices Start Seeing Things
This work proposes a novel approach to multimodal sensor fusion for Ambient Assisted Living (AAL) which takes advantage of learning using privileged information (LUPI), and fuses the concept of modality hallucination with triplet learning to train a model with different modalities to handle missing sensors at inference time.
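The Inertial Hallucinations summary mentions triplet learning. As a hedged illustration, the standard triplet margin loss is sketched below. How the paper pairs anchors, positives, and negatives is an assumption here (for example, a hallucinated embedding from inertial data as the anchor, a same-activity embedding as the positive, and a different-activity embedding as the negative), and the function names and `margin` default are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # hypothetical default
) -> float:
    """Hinge loss that is zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# A well-separated triplet incurs no loss; a swapped one is penalized.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0
print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))  # 5.0
```

Minimizing this loss pulls matching embeddings together and pushes mismatched ones apart, which is what would let one modality's embedding stand in for another's when a sensor is missing.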
Multi-modal self-adaptation during object recognition in an artificial cognitive system
This work creates a multimodal learning transfer mechanism capable of both detecting sudden and permanent anomalies in the visual channel and maintaining visual object recognition performance by retraining the visual mode for a few minutes using haptic information.
Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation
This work proposes Mobile Manipulation RoboTurk (MoMaRT), a novel teleoperation framework that allows simultaneous navigation and manipulation with mobile manipulators, together with a learned error detection system that addresses covariate shift by detecting when an agent is in a potential failure state.
Out-of-Distribution Detection for Automotive Perception
This paper presents a method for determining whether inputs are out-of-distribution (OOD) that requires no OOD data during training and does not increase the computational cost of inference. This is especially important in automotive applications with limited computational resources and real-time constraints.


Safe Visual Navigation via Deep Learning and Novelty Detection
This work uses an autoencoder to recognize when a query is novel and revert to a safe prior behavior, making it possible to deploy an autonomous deep learning system in arbitrary environments without concern for whether it has received the appropriate training.
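The Safe Visual Navigation summary describes autoencoder-based novelty detection with a fallback to a safe behavior. The sketch below shows that control flow under simplifying assumptions: a linear autoencoder with a one-dimensional latent code stands in for the deep autoencoder (the optimal linear autoencoder under squared error spans the top principal subspace, so power iteration replaces gradient training), and the threshold rule (`margin` times the largest training reconstruction error) and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def top_direction(centered: List[Vector], iters: int = 100, seed: int = 0) -> Vector:
    """Power iteration for the leading eigenvector of X^T X (top principal direction)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(len(centered[0]))]
    for _ in range(iters):
        proj = [sum(xi * wi for xi, wi in zip(x, w)) for x in centered]  # X w
        w = [sum(p * x[j] for p, x in zip(proj, centered)) for j in range(len(w))]  # X^T (X w)
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


class LinearNoveltyDetector:
    """Linear autoencoder with a 1-D latent code, standing in for a deep autoencoder."""

    def __init__(self, train: List[Vector], margin: float = 1.5) -> None:
        n = len(train)
        self.mean = [sum(col) / n for col in zip(*train)]
        centered = [[x - m for x, m in zip(row, self.mean)] for row in train]
        self.w = top_direction(centered)
        # Hypothetical threshold: a margin above the worst training reconstruction error.
        self.threshold = margin * max(self.reconstruction_error(row) for row in train)

    def reconstruction_error(self, x: Sequence[float]) -> float:
        c = [xi - mi for xi, mi in zip(x, self.mean)]
        z = sum(ci * wi for ci, wi in zip(c, self.w))  # encode to the latent code
        recon = [z * wi for wi in self.w]  # decode back to input space
        return math.sqrt(sum((ci - ri) ** 2 for ci, ri in zip(c, recon)))

    def is_novel(self, x: Sequence[float]) -> bool:
        return self.reconstruction_error(x) > self.threshold


def choose_action(detector, observation, learned_policy, safe_policy):
    """Fall back to the safe prior behavior whenever the observation looks novel."""
    if detector.is_novel(observation):
        return safe_policy(observation)
    return learned_policy(observation)


# Training data lies near the line y = 2x; a point far off that line is flagged.
train = [[i / 10, 2 * i / 10 + 0.05 * (-1) ** i] for i in range(-10, 11)]
detector = LinearNoveltyDetector(train)
print(choose_action(detector, [0.5, 1.0], lambda o: "learned", lambda o: "safe"))   # learned
print(choose_action(detector, [0.5, -1.0], lambda o: "learned", lambda o: "safe"))  # safe
```

The key design point is that the detector needs no novel examples during training: anything the autoencoder reconstructs poorly is treated as unfamiliar.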
Learning End-to-end Multimodal Sensor Policies for Autonomous Navigation
This work proposes a novel stochastic regularization technique, called Sensor Dropout, to make multimodal sensor policy learning more robust, and shows that the learned policies are conditioned on the same latent input distribution despite having multiple sensory observation spaces, a hallmark of true sensor fusion.
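Sensor Dropout is summarized above only at a high level. A minimal sketch of the idea, assuming that each modality's feature block is dropped independently with probability `p_drop`, that at least one modality is always kept, and that surviving features are rescaled by (number of modalities / number kept) in the spirit of inverted dropout. The paper's exact sampling and scaling scheme may differ, and the function name is hypothetical.

```python
import random
from typing import Dict, List


def sensor_dropout(
    features: Dict[str, List[float]],
    p_drop: float,
    rng: random.Random,
) -> Dict[str, List[float]]:
    """Zero out whole modality blocks at random during training (hypothetical helper)."""
    if not features:
        raise ValueError("features must contain at least one modality")
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    names = list(features)
    # Resample until at least one modality survives (assumption).
    while True:
        kept = {name for name in names if rng.random() >= p_drop}
        if kept:
            break
    # Rescale survivors so the overall feature magnitude stays comparable (assumption).
    scale = len(names) / len(kept)
    return {
        name: [scale * v for v in features[name]] if name in kept else [0.0] * len(features[name])
        for name in names
    }


rng = random.Random(0)
print(sensor_dropout({"camera": [1.0, 2.0], "lidar": [3.0], "imu": [4.0, 5.0]}, 0.5, rng))
```

Dropping entire modalities, rather than individual units, forces the policy to act from any surviving subset of sensors, which is what makes it tolerant to a missing or unreliable input at test time.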
Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks
Self-supervision is used to learn a compact, multimodal representation of the sensory inputs, which can then be used to improve the sample efficiency of policy learning.
Cross-modal interpretation of multi-modal sensor streams in interactive perception based on coupled recursion
This work presents an online system to perceive kinematic properties of articulated objects from multi-modal sensor streams that is sufficiently fast to provide feedback during manipulation actions and sufficiently comprehensive to allow the generation of new manipulation actions.
Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts?
This work proposes an epistemic uncertainty-aware planning method, called robust imitative planning (RIP), that can detect and recover from some distribution shifts, reducing overconfident and catastrophic extrapolations in OOD scenes.
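Robust imitative planning scores plans with an ensemble of imitation models. The sketch below assumes a finite set of candidate plans with precomputed per-model log-likelihoods (rather than optimizing trajectories directly) and contrasts worst-case aggregation (minimum over ensemble members) with model averaging. The ensemble-variance helper is one simple epistemic-uncertainty signal for flagging distribution shift; all names are hypothetical.

```python
from statistics import mean, pvariance
from typing import List


def rip_select(log_likelihoods: List[List[float]], aggregation: str = "worst_case") -> int:
    """Return the index of the candidate plan with the best aggregated score.

    log_likelihoods[k][j] is the log-likelihood that ensemble member k
    assigns to candidate plan j (assumed precomputed).
    """
    if aggregation not in ("worst_case", "average"):
        raise ValueError(f"unknown aggregation: {aggregation}")
    num_plans = len(log_likelihoods[0])

    def score(j: int) -> float:
        values = [member[j] for member in log_likelihoods]
        # Worst case: a plan is only as good as its least supportive model.
        return min(values) if aggregation == "worst_case" else mean(values)

    return max(range(num_plans), key=score)


def ensemble_disagreement(log_likelihoods: List[List[float]], j: int) -> float:
    """Population variance of plan j's log-likelihood across ensemble members."""
    return pvariance([member[j] for member in log_likelihoods])


scores = [
    [-1.0, -0.2],
    [-1.2, -2.0],
    [-0.9, -0.1],
]
print(rip_select(scores, "worst_case"))  # 0
print(rip_select(scores, "average"))     # 1
```

In this toy example, plan 1 looks better on average but one ensemble member strongly disagrees with it; the worst-case rule avoids it, which mirrors the goal of reducing overconfident extrapolation.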
Deep Visuo-Tactile Learning: Estimation of Tactile Properties from Images
A model is proposed to estimate the degree of tactile properties (e.g., the level of slipperiness or roughness) from visual perception alone. It extends an encoder-decoder network whose latent variables are visual and tactile features.
Connecting Touch and Vision via Cross-Modal Prediction
This work investigates the cross-modal connection between vision and touch with a new conditional adversarial model that incorporates the scale and location information of the touch and demonstrates that the model can produce realistic visual images from tactile data and vice versa.
Cross-modal noise compensation in audiovisual words
Noise compensation for spoken and printed words is investigated in two experiments. Accuracy was modulated by reaction time (RT), bias, and sensitivity, but noise compensation could nevertheless be explained via accuracy differences when controlling for RT, biases, and sensitivity.
Building Kinematic and Dynamic Models of Articulated Objects with Multi-Modal Interactive Perception
This work proposes an interactive perception system to build kinematic and dynamic models of articulated objects from multi-modal sensor streams, and shows experimentally that the system integrates and balances different modalities according to their uncertainty to overcome the limitations of unimodal perception.
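Balancing modalities according to their uncertainty can be illustrated with inverse-variance weighting, which is the static special case of a Kalman-filter measurement update. This is a simplified stand-in, not the paper's recursive estimator: each modality is assumed to report a scalar estimate with a known variance, and the function name is hypothetical.

```python
from typing import List, Tuple


def fuse(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse (mean, variance) estimates from several modalities.

    Each estimate is weighted by its inverse variance, so a noisier
    modality contributes less; the fused variance is never larger than
    the smallest input variance.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(var <= 0.0 for _, var in estimates):
        raise ValueError("variances must be positive")
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused_mean = sum(w * m for w, (m, _) in zip(weights, estimates)) / total
    return fused_mean, 1.0 / total


print(fuse([(1.0, 1.0), (3.0, 1.0)]))  # (2.0, 0.5)
```

With equally reliable inputs the result is the plain average; with `fuse([(0.0, 1.0), (10.0, 9.0)])` the fused mean moves only to about 1.0, because the second modality is nine times noisier.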