Leveraging Post Hoc Context for Faster Learning in Bandit Settings with Applications in Robot-Assisted Feeding

Ethan K. Gordon, Sumegh Roychowdhury, Tapomayukh Bhattacharjee, Kevin G. Jamieson, Siddhartha S. Srinivasa
2021 IEEE International Conference on Robotics and Automation (ICRA)

Autonomous robot-assisted feeding requires the ability to acquire a wide variety of food items. However, it is impossible for such a system to be trained on all types of food in existence. Therefore, a key challenge is choosing a manipulation strategy for a previously unseen food item. Previous work showed that the problem can be represented as a linear bandit with visual context. However, food has a wide variety of multi-modal properties relevant to manipulation that can be hard to distinguish…
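The abstract frames acquisition as a linear bandit with visual context. Below is a minimal sketch of disjoint LinUCB, a standard algorithm for that setting and one of the two algorithms named in the Adaptive Robot-Assisted Feeding entry. It assumes a fixed-length feature vector per food item (standing in for the visual context) and a binary success reward per action. The class name, the `alpha` default, and the simulated success model are hypothetical, and this is not the paper's post hoc context method.

```python
import math
import random


class LinUCB:
    """Disjoint LinUCB: an independent ridge-regression model per arm (action).

    Each arm keeps A_inv = (I + sum x x^T)^{-1} and b = sum r x,
    so its weight estimate is theta = A_inv @ b.
    """

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # width of the confidence bonus (hypothetical default)
        self.dim = dim
        identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.A_inv = [[row[:] for row in identity] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    @staticmethod
    def _dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    def scores(self, x):
        """Optimistic score theta^T x + alpha * sqrt(x^T A_inv x) for every arm."""
        out = []
        for A_inv, b in zip(self.A_inv, self.b):
            theta = self._matvec(A_inv, b)
            width = math.sqrt(max(0.0, self._dot(x, self._matvec(A_inv, x))))
            out.append(self._dot(theta, x) + self.alpha * width)
        return out

    def select(self, x):
        """Return the index of the arm with the highest optimistic score."""
        s = self.scores(x)
        return max(range(len(s)), key=s.__getitem__)

    def update(self, arm, x, reward):
        """Sherman-Morrison rank-one update of A_inv, then b += reward * x."""
        A_inv = self.A_inv[arm]
        u = self._matvec(A_inv, x)  # A_inv @ x (A_inv is symmetric)
        denom = 1.0 + self._dot(x, u)
        for i in range(self.dim):
            for j in range(self.dim):
                A_inv[i][j] -= u[i] * u[j] / denom
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]


if __name__ == "__main__":
    # Hypothetical simulation: 3 actions, 4-dim context in [0, 1]^4;
    # action k succeeds with probability true_w[k] . x (always <= 1 here).
    rng = random.Random(0)
    true_w = [[0.8, 0.1, 0.0, 0.0], [0.1, 0.8, 0.0, 0.0], [0.3, 0.3, 0.2, 0.0]]
    bandit = LinUCB(n_arms=3, dim=4, alpha=0.5)
    successes = 0.0
    for _ in range(2000):
        x = [rng.random() for _ in range(4)]
        arm = bandit.select(x)
        reward = 1.0 if rng.random() < LinUCB._dot(true_w[arm], x) else 0.0
        bandit.update(arm, x, reward)
        successes += reward
    print("success rate:", successes / 2000)
```

Maintaining `A_inv` directly with the Sherman-Morrison identity avoids a matrix inversion per step; setting `alpha=0` reduces the policy to greedy selection on the ridge estimates.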


Adaptive Robot-Assisted Feeding: An Online Learning Framework for Acquiring Previously Unseen Food Items
This work demonstrates empirically on a robot-assisted feeding system that, even starting with a model trained on thousands of skewering attempts on dissimilar, previously seen food items, ε-greedy and LinUCB algorithms can quickly converge to the most successful manipulation strategy.
Robot-Assisted Feeding: Generalizing Skewering Strategies across Food Items on a Realistic Plate
A bite acquisition framework that takes the image of a full plate as an input, uses RetinaNet to create bounding boxes around food items in the image, and applies the skewering-position-action network (SPANet) to choose a target food item and a corresponding action so that the bite acquisition success rate is maximized.
Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
This work uses self-supervision to learn a compact and multimodal representation of sensory inputs, which can then be used to improve the sample efficiency of the policy learning of deep reinforcement learning algorithms.
A Contextual Bandit Bake-off
This work uses the availability of large numbers of supervised learning datasets to compare and empirically optimize contextual bandit algorithms, focusing on practical methods that learn by relying on optimization oracles from supervised learning.
Towards Robotic Feeding: Role of Haptics in Fork-Based Food Manipulation
A set of classifiers for compliance-based food categorization from haptic and motion signals is proposed and compared with fixed position-control policies via a robot to highlight the importance of adapting the policy to the compliance of a food item.
Online Learning of Robot Soccer Free Kick Plans Using a Bandit Approach
An online learning approach for teams of autonomous soccer robots to select free kick plans using the Upper Confidence Bound algorithm. Results from a physics-based simulation reveal that the robots can adapt to a range of realistic opponents to maximize their expected reward during free kicks.
Learning haptic representation for manipulating deformable food objects
This work designs actions involving use of tools such as forks and knives that obtain haptic data containing information about the physical properties of the object, and presents a method to compactly represent the robot's beliefs about the object's properties using a generative model.
Is More Autonomy Always Better?: Exploring Preferences of Users with Mobility Impairments in Robot-assisted Feeding
The study finds that more autonomy is not always better: participants did not prefer a robot with partial autonomy over a robot with low autonomy, and their user interface preference shifted from voice control during individual dining to a web-based interface during social dining.
Offline policy evaluation across representations with applications to educational games
A data-driven methodology for comparing and validating policies offline, which focuses on the ability of each policy to generalize to new data and applies to a partially-observable, high-dimensional concept sequencing problem in an educational game.
Transfer Depends on Acquisition: Analyzing Manipulation Strategies for Robotic Feeding
The results show that an intelligent, food-item-dependent skewering strategy improves the bite acquisition success rate and that the choice of skewering location and fork orientation significantly affects the ease of bite transfer.
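The Adaptive Robot-Assisted Feeding entry above reports that ε-greedy exploration converges quickly to the most successful manipulation strategy. Below is a minimal, context-free sketch of ε-greedy, assuming a binary success reward per action. The class name, the `epsilon` default, and the simulated success rates are hypothetical. The cited work operates with visual context, and this simplification drops it.

```python
import random


class EpsilonGreedy:
    """Context-free epsilon-greedy over a fixed set of actions."""

    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon  # exploration probability (hypothetical default)
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.rng = rng if rng is not None else random.Random()

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.means))  # explore uniformly at random
        best = max(self.means)
        # exploit: pick uniformly among actions tied for the best estimated mean
        return self.rng.choice([a for a, m in enumerate(self.means) if m == best])

    def update(self, arm, reward):
        """Incremental sample mean: m_n = m_{n-1} + (r - m_{n-1}) / n."""
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


if __name__ == "__main__":
    # Hypothetical Bernoulli success rates for three skewering actions.
    rng = random.Random(0)
    p_success = [0.2, 0.5, 0.8]
    agent = EpsilonGreedy(n_arms=3, epsilon=0.1, rng=rng)
    for _ in range(3000):
        arm = agent.select()
        agent.update(arm, 1.0 if rng.random() < p_success[arm] else 0.0)
    print("pulls per action:", agent.counts)
```

Unlike LinUCB, which directs exploration toward actions whose estimates are still uncertain, ε-greedy spends a fixed fraction of trials on uniformly random actions regardless of how much it already knows.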