Watch-Bot: Unsupervised learning for reminding humans of forgotten actions
@inproceedings{Wu2016WatchBotUL,
  title     = {Watch-Bot: Unsupervised learning for reminding humans of forgotten actions},
  author    = {Chenxia Wu and Jiemi Zhang and Bart Selman and Silvio Savarese and Ashutosh Saxena},
  booktitle = {2016 IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2016},
  pages     = {2479--2486}
}
We present a robotic system that watches a human using a Kinect v2 RGB-D sensor, detects what the person forgot to do while performing an activity, and, if necessary, reminds them by pointing out the related object with a laser pointer. Our simple setup can be easily deployed on any assistive robot. Our approach is based on a learning algorithm trained in a purely unsupervised setting, which does not require any human annotations. This makes our approach scalable and applicable to various scenarios…
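The paper's own model is not reproduced here; as a rough illustrative sketch only, one way an unsupervised co-occurrence model could flag a "forgotten" action is to count, from unlabeled activity sequences, which actions usually appear together and then flag strongly co-occurring actions that are missing from an observed sequence. All function names, the toy data, and the threshold below are hypothetical, not the paper's algorithm:

```python
from collections import Counter
from itertools import combinations

def learn_cooccurrence(sequences):
    """Count, over unlabeled training sequences, how often each
    pair of actions appears together in the same activity."""
    pair_counts = Counter()
    action_counts = Counter()
    for seq in sequences:
        actions = set(seq)
        action_counts.update(actions)
        for a, b in combinations(sorted(actions), 2):
            pair_counts[(a, b)] += 1
    return pair_counts, action_counts

def forgotten_actions(observed, pair_counts, action_counts, threshold=0.8):
    """Flag actions that strongly co-occur with the observed
    actions but are missing from the observed sequence."""
    observed_set = set(observed)
    suspects = {}
    for (a, b), n in pair_counts.items():
        for present, missing in ((a, b), (b, a)):
            if present in observed_set and missing not in observed_set:
                p = n / action_counts[present]  # conditional frequency of `missing` given `present`
                if p >= threshold:
                    suspects[missing] = max(suspects.get(missing, 0.0), p)
    return suspects

# Toy training data: fridge activities where the fridge is always closed afterwards.
train = [
    ["open-fridge", "take-milk", "close-fridge"],
    ["open-fridge", "take-milk", "close-fridge"],
    ["open-fridge", "close-fridge", "pour-milk"],
]
pairs, counts = learn_cooccurrence(train)
print(forgotten_actions(["open-fridge", "take-milk"], pairs, counts))
```

Here "close-fridge" is flagged because it co-occurred with "open-fridge" in every training sequence yet is absent from the observed one; a robot could then localize and point at the related object.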
13 Citations
Watch-n-Patch: Unsupervised Learning of Actions and Relations
- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2018
This work proposes a new probabilistic model that allows for the long-range action relations that commonly exist in composite activities, which previous works struggled to capture.
A New Bayesian Modeling for 3D Human-Object Action Recognition
- Computer Science · 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
- 2019
This paper proposes a new Bayesian framework that recognizes actions in RGB-D videos from two observations, the human pose and the objects in its vicinity, and designs a model for each action that integrates these observations with a probabilistic sequencing of the actions performed during activities.
Anticipating Human Activities from Surveillance Videos
- Computer Science
- 2017
This work studies action recognition and classification followed by action anticipation, using the UT-Interaction dataset of interactive videos with six types of activities to develop a framework for anticipating actions.
Late Fusion of Bayesian and Convolutional Models for Action Recognition
- Computer Science · 2020 25th International Conference on Pattern Recognition (ICPR)
- 2021
A hybrid approach is proposed that fuses a deep neural network with a Bayesian model of human-object interactions and transitions between actions.
Learning Video Models from Text: Zero-Shot Anticipation for Procedural Actions
- Computer Science · arXiv
- 2021
A hierarchical model that generalizes instructional knowledge from large-scale text corpora and transfers it to video, recognizing and predicting coherent and plausible actions multiple steps into the future, all in rich natural language.
Embodied Visual Perception Models For Human Behavior Understanding
- Computer Science
- 2019
This PhD thesis proposes the concept of action-objects, the objects that capture a person's conscious visual or tactile interactions, and introduces two models, EgoNet and Visual-Spatial Network (VSN), which detect action-objects in supervised and unsupervised settings respectively.
Egocentric Basketball Motion Planning from a Single First-Person Image
- Computer Science · 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
We present a model that uses a single first-person image to generate an egocentric basketball motion sequence in the form of a 12D camera configuration trajectory, which encodes a player's 3D…
Structured prediction with short/long-range dependencies for human activity recognition from depth skeleton data
- Computer Science · 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2017
This paper models activity recognition as a sequence-labeling problem and proposes a new probabilistic graphical model (PGM) that recognizes both short- and long-range activities, by introducing a hierarchical classification model and adding extra links and loopy conditions to the PGM.
[Title garbled in extraction; the recovered text is diagram labels from the entry's figure: (a) a VAE (input mask → encoder → z → decoder → reconstructed mask) and (b) an Action CNN predicting action probabilities from an input mask]
- Computer Science
- 2019
A novel action conditioned image synthesis task and a method to solve it in the context of a basketball activity, which generates realistic images that are associated with specific action categories, and it outperforms standard baselines by a large margin.
Zero-Shot Anticipation for Instructional Activities
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
A hierarchical model is presented that generalizes instructional knowledge from large-scale text-corpora and transfers the knowledge to the visual domain and predicts coherent and plausible actions multiple steps into the future, all in rich natural language.
References
Showing 1–10 of 39 references
Watch-n-patch: Unsupervised understanding of actions and relations
- Computer Science · 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
The model learns the high-level action co-occurrence and temporal relations between the actions in the activity video and is applied to unsupervised action segmentation and recognition, and also to a novel application that detects forgotten actions, which is called action patching.
Learning human activities and object affordances from RGB-D videos
- Computer Science · Int. J. Robotics Res.
- 2013
This work considers the problem of extracting a descriptive labeling of the sequence of sub-activities performed by a human and, more importantly, of their interactions with objects in the form of associated affordances, and formulates the learning problem using a structural support vector machine (SSVM).
Anticipating Human Activities Using Object Affordances for Reactive Robotic Response
- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2016
This work represents each possible future using an anticipatory temporal conditional random field (ATCRF) that models rich spatio-temporal relations through object affordances, treats each ATCRF as a particle, and represents the distribution over potential futures with a set of particles.
Robot Learning Manipulation Action Plans by "Watching" Unconstrained Videos from the World Wide Web
- Computer Science · AAAI
- 2015
A system that learns manipulation action plans by processing unconstrained videos from the World Wide Web, robustly generating the sequence of atomic actions within longer observed actions so that robots can acquire knowledge from video.
Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation
- Computer Science · ICML
- 2013
This paper proposes a graph structure that improves the state-of-the-art significantly for detecting past activities as well as for anticipating future activities, on a dataset of 120 activity videos collected from four subjects.
Unstructured human activity detection from RGBD images
- Computer Science · 2012 IEEE International Conference on Robotics and Automation
- 2012
This paper uses an RGB-D sensor as input and computes a set of features based on human pose and motion, as well as on image and point-cloud information, within a hierarchical maximum-entropy Markov model (MEMM).
Action recognition using ensemble weighted multi-instance learning
- Computer Science · 2014 IEEE International Conference on Robotics and Automation (ICRA)
- 2014
A novel 3.5D representation of a depth video for action recognition, which considers class imbalance and intra-class variations and integrates an ensemble learning method into a weighted multi-instance learning framework.
Hierarchical Semantic Labeling for Task-Relevant RGB-D Perception
- Computer Science · Robotics: Science and Systems
- 2014
This work presents an algorithm that produces hierarchical labelings of a scene, following is-part-of and is-type-of relationships, based on a Conditional Random Field that relates pixel-wise and pair-wise observations to labels.
Activity Recognition for Natural Human Robot Interaction
- Computer Science · ICSR
- 2014
This work presents a simple yet effective approach that models pose trajectories using the directions traversed by human joints over the duration of an activity and represents the action as a histogram of direction vectors.
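The histogram-of-direction-vectors idea in the entry above can be sketched concretely: quantize each frame-to-frame joint displacement into an angular bin and normalize the bin counts. This 2D toy version (one joint, 8 bins) is an illustrative simplification, not the cited paper's implementation:

```python
import math

def direction_bin(dx, dy, n_bins=8):
    """Quantize a 2D displacement into one of n_bins angular bins
    (bin 0 is east, bins proceed counter-clockwise)."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int(angle / (2 * math.pi / n_bins)) % n_bins

def trajectory_histogram(points, n_bins=8):
    """Histogram of quantized directions traversed by one joint."""
    hist = [0] * n_bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) != (x1, y1):  # skip stationary frames
            hist[direction_bin(x1 - x0, y1 - y0, n_bins)] += 1
    total = sum(hist) or 1  # normalize so clips of different lengths compare
    return [h / total for h in hist]

# A hand joint moving right twice, then up twice: bins 0 (east) and 2 (north) fire.
traj = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(trajectory_histogram(traj))  # → [0.5, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Normalizing by the number of moves makes histograms from clips of different lengths directly comparable, which is what lets such a descriptor feed a standard classifier.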
Human Activity Recognition for Domestic Robots
- Computer Science · FSR
- 2013
A human activity recognition technique that uses 3D skeleton features produced by a depth camera and weights the skeleton's 3D joints by their importance for the activity being performed, ignoring confusing or irrelevant features while relying on informative ones.
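The per-joint importance weighting in the entry above can be illustrated with a minimal sketch: give each activity its own weight vector over joint features and classify by the nearest exemplar under that weighting. The feature layout, weights, and exemplars below are hypothetical toy values, not the cited method:

```python
def weighted_distance(feat_a, feat_b, weights):
    """Per-feature squared distance with importance weights, so
    uninformative joints contribute little to the comparison."""
    return sum(w * (a - b) ** 2
               for w, a, b in zip(weights, feat_a, feat_b))

def classify(query, exemplars, weights_per_activity):
    """Nearest exemplar, each compared under its own activity's weighting."""
    best = None
    for label, feat in exemplars:
        d = weighted_distance(query, feat, weights_per_activity[label])
        if best is None or d < best[1]:
            best = (label, d)
    return best[0]

# Toy setup: feature vector = (hand height, foot height).
# "drinking" is informed by the hand, "kicking" by the foot.
weights = {"drinking": [1.0, 0.0], "kicking": [0.0, 1.0]}
exemplars = [("drinking", (0.9, 0.1)), ("kicking", (0.2, 0.8))]
print(classify((0.85, 0.5), exemplars, weights))  # → drinking
```

Because the foot coordinate carries zero weight for "drinking", the noisy foot value in the query does not confuse the comparison, which is the point of the weighting scheme.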