Watch-Bot: Unsupervised learning for reminding humans of forgotten actions

@article{Wu2016WatchBotUL,
  title={Watch-Bot: Unsupervised learning for reminding humans of forgotten actions},
  author={Chenxia Wu and Jiemi Zhang and Bart Selman and Silvio Savarese and Ashutosh Saxena},
  journal={2016 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2016},
  pages={2479-2486}
}
We present a robotic system that watches a human with a Kinect v2 RGB-D sensor, detects what they forgot to do while performing an activity, and if necessary reminds them by using a laser pointer to point out the related object. Our simple setup can be easily deployed on any assistive robot. Our approach is based on a learning algorithm trained in a purely unsupervised setting, which requires no human annotations. This makes our approach scalable and applicable to varied scenarios…
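As a rough sketch of the watch-detect-remind loop the abstract describes (all class, method, and field names below are hypothetical placeholders, not the authors' implementation):

```python
from dataclasses import dataclass

# Hypothetical sketch of the watch -> detect -> remind loop from the
# abstract. sensor, model, laser, and Forgotten are placeholders, not the
# authors' actual API.

@dataclass
class Forgotten:
    action: str
    object_location: tuple  # 3D location of the related object
    prob: float             # confidence that the action was forgotten

def watch_bot_loop(sensor, model, laser, threshold=0.5):
    frames = sensor.record_activity()             # RGB-D frames of one activity
    segments = model.segment_actions(frames)      # unsupervised action segments
    forgotten = model.detect_forgotten(segments)  # Forgotten or None
    if forgotten is not None and forgotten.prob > threshold:
        laser.point_at(forgotten.object_location)  # remind via laser pointer
```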
Watch-n-Patch: Unsupervised Learning of Actions and Relations
TLDR
This work proposes a new probabilistic model that allows for the long-range action relations that commonly exist in composite activities, which previous works find challenging to model.
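For intuition only, a toy long-range co-occurrence check (the action pairs and probabilities below are invented, and the paper's model is probabilistic, not a lookup table):

```python
# Toy long-range relation check: if action A usually appears somewhere in
# the same activity as action B (not necessarily adjacent to it), an
# activity containing A but not B suggests a forgotten action.

cooccur = {("fetch-milk", "put-back-milk"): 0.9,  # invented probabilities
           ("pour", "drink"): 0.6}

def missing_actions(observed, threshold=0.8):
    observed = set(observed)
    return [b for (a, b), p in cooccur.items()
            if a in observed and b not in observed and p >= threshold]

print(missing_actions(["fetch-milk", "pour"]))  # -> ['put-back-milk']
```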
A New Bayesian Modeling for 3D Human-Object Action Recognition
TLDR
This paper proposes a new Bayesian framework that recognizes actions in RGB-D videos from two different observations, the human pose and the objects in its vicinity, and designs a model for each action that integrates these observations with a probabilistic sequencing of the actions performed during activities.
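A toy numeric sketch of this kind of Bayesian combination, assuming the posterior factorizes into a pose likelihood, an object-context likelihood, and an action-transition prior (all numbers are invented; the paper's actual model differs):

```python
import numpy as np

# Toy Bayesian action scoring: combine a pose likelihood, an object-context
# likelihood, and a transition prior from the previous action, then normalize.

def posterior(pose_lik, obj_lik, trans, prev_action):
    """pose_lik, obj_lik: arrays of P(obs | action); trans: P(a | prev)."""
    unnorm = pose_lik * obj_lik * trans[prev_action]
    return unnorm / unnorm.sum()

pose_lik = np.array([0.6, 0.3, 0.1])   # e.g. actions: reach, pour, drink
obj_lik  = np.array([0.2, 0.7, 0.1])   # object context favors "pour"
trans    = np.array([[0.2, 0.7, 0.1],  # row: previous action
                     [0.1, 0.3, 0.6],
                     [0.5, 0.2, 0.3]])
print(posterior(pose_lik, obj_lik, trans, prev_action=0))
```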
Late Fusion of Bayesian and Convolutional Models for Action Recognition
TLDR
A hybrid approach is proposed, resulting from the fusion of a deep learning neural network with a Bayesian approach that models human-object interactions and transitions between actions.
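A minimal numeric sketch of such late fusion, assuming a simple convex combination of the two models' class probabilities (the weight alpha and the probabilities are invented):

```python
import numpy as np

# Hedged late-fusion sketch: blend per-class probabilities from a CNN and a
# Bayesian model with a convex weight alpha; the papers' exact rule may differ.

def late_fusion(p_cnn, p_bayes, alpha=0.5):
    fused = alpha * p_cnn + (1.0 - alpha) * p_bayes
    return fused / fused.sum()  # renormalize for safety

p_cnn = np.array([0.7, 0.2, 0.1])
p_bayes = np.array([0.4, 0.5, 0.1])
print(late_fusion(p_cnn, p_bayes, alpha=0.6))  # fused class probabilities
```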
Learning Video Models from Text: Zero-Shot Anticipation for Procedural Actions
TLDR
A hierarchical model that generalizes instructional knowledge from large-scale text corpora and transfers that knowledge to video; it recognizes and predicts coherent and plausible actions multiple steps into the future, all in rich natural language.
Embodied Visual Perception Models For Human Behavior Understanding
TLDR
This PhD thesis proposes the concept of action-objects, the objects that capture a person's conscious visual or tactile interactions, and introduces two models, EgoNet and Visual-Spatial Network (VSN), which detect action-objects in supervised and unsupervised settings, respectively.
Egocentric Basketball Motion Planning from a Single First-Person Image
We present a model that uses a single first-person image to generate an egocentric basketball motion sequence in the form of a 12D camera configuration trajectory, which encodes a player's 3D…
Structured prediction with short/long-range dependencies for human activity recognition from depth skeleton data
TLDR
This paper models activity recognition as a sequence-labeling problem and proposes a new probabilistic graphical model (PGM) that can recognize both short- and long-range activities by introducing a hierarchical classification model and including extra links and loopy conditions in the authors' PGM.
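A minimal sketch of scoring a label sequence with both adjacent and long-range (skip) pairwise terms; the potentials, skip distance, and random values are invented, and the paper's actual PGM and inference are more involved:

```python
import numpy as np

# Toy sequence scoring with short-range (adjacent) and long-range (skip)
# pairwise terms, the flavor of dependency structure the paper targets.

def sequence_score(labels, unary, pair_adj, pair_skip, skip=3):
    s = sum(unary[t][l] for t, l in enumerate(labels))            # unary terms
    s += sum(pair_adj[a][b] for a, b in zip(labels, labels[1:]))  # adjacent
    s += sum(pair_skip[a][b] for a, b in zip(labels, labels[skip:]))  # skip
    return s

T, L = 5, 2
rng = np.random.default_rng(0)
unary = rng.random((T, L))
pair_adj = rng.random((L, L))
pair_skip = rng.random((L, L))
print(sequence_score([0, 1, 1, 0, 1], unary, pair_adj, pair_skip))
```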
[Entry title garbled by extraction; surviving figure text: a) VAE with input mask, encoder, latent z, decoder, and reconstructed mask; b) action CNN taking an input mask and outputting a predicted action probability for a fixed action] (Computer Science, 2019)
TLDR
A novel action-conditioned image synthesis task and a method to solve it in the context of a basketball activity; the method generates realistic images associated with specific action categories and outperforms standard baselines by a large margin.
Zero-Shot Anticipation for Instructional Activities
TLDR
A hierarchical model is presented that generalizes instructional knowledge from large-scale text corpora and transfers the knowledge to the visual domain, predicting coherent and plausible actions multiple steps into the future, all in rich natural language.
Fusion of Bayesian and Convolutional Models for Action Recognition [in French: « Fusion de modèles bayésiens et de convolution pour la reconnaissance d'actions »]
TLDR
A hybrid approach that fuses a deep learning network with a Bayesian model of human-object interactions and transitions between actions, combining the two approaches in the final prediction.

References

Showing 1–10 of 39 references.
Watch-n-patch: Unsupervised understanding of actions and relations
TLDR
The model learns the high-level action co-occurrence and temporal relations between the actions in an activity video; it is applied to unsupervised action segmentation and recognition, and also to a novel application, called action patching, that detects forgotten actions.
Learning human activities and object affordances from RGB-D videos
TLDR
This work considers the problem of extracting a descriptive labeling of the sequence of sub-activities being performed by a human and, more importantly, of their interactions with objects in the form of associated affordances, and formulates the learning problem using a structural support vector machine (SSVM) approach.
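For intuition, a toy structured hinge loss in the SSVM spirit, enumerating a two-label space instead of running loss-augmented inference (the feature map, loss, and weights below are invented):

```python
import numpy as np

# Toy structured hinge loss: max over labels of (loss + score) minus the
# score of the true label. Real SSVMs use loss-augmented inference rather
# than enumeration of the label space.

def structured_hinge(w, phi, x, y_true, label_space, delta):
    scores = [delta(y, y_true) + w @ phi(x, y) for y in label_space]
    return max(scores) - w @ phi(x, y_true)

phi = lambda x, y: np.array([x, 1.0]) if y == 1 else np.array([-x, 1.0])
delta = lambda y, yt: 0.0 if y == yt else 1.0   # 0/1 structured loss
w = np.array([0.5, 0.1])
print(structured_hinge(w, phi, 2.0, y_true=1, label_space=[0, 1], delta=delta))
```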
Anticipating Human Activities Using Object Affordances for Reactive Robotic Response
TLDR
This work represents each possible future using an anticipatory temporal conditional random field (ATCRF) that models rich spatio-temporal relations through object affordances; each ATCRF is treated as a particle, and the distribution over potential futures is represented by a set of particles.
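A toy particle view of this idea: each candidate future is one weighted sample, and normalized weights approximate the distribution over futures (the futures and scores below are invented stand-ins for ATCRF energies):

```python
# Toy particle sketch of anticipation: candidate futures are weighted by a
# score (a stand-in for the ATCRF energy; the table is invented), and the
# normalized weights approximate the distribution over futures.

futures = [("reach", "pour"), ("reach", "drink"), ("place", "walk_away")]
scores = {("reach", "pour"): 2.0,
          ("reach", "drink"): 1.5,
          ("place", "walk_away"): 0.5}

weights = [scores[f] for f in futures]
probs = [w / sum(weights) for w in weights]
best = max(zip(probs, futures))[1]   # most likely anticipated future
print(best, dict(zip(futures, probs)))
```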
Robot Learning Manipulation Action Plans by "Watching" Unconstrained Videos from the World Wide Web
TLDR
A system that learns manipulation action plans by processing unconstrained videos from the World Wide Web, robustly generating the sequence of atomic actions that make up longer activities seen in video in order to acquire knowledge for robots.
Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation
TLDR
This paper proposes a graph structure that improves the state of the art significantly for detecting past activities as well as for anticipating future activities, on a dataset of 120 activity videos collected from four subjects.
Unstructured human activity detection from RGBD images
TLDR
This paper uses an RGB-D sensor as the input sensor and computes a set of features based on human pose and motion, as well as on image and point-cloud information, which are combined in a hierarchical maximum entropy Markov model (MEMM).
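A minimal MEMM-flavored decoding sketch, using a greedy step with a toy conditional model in place of a trained classifier (the paper's model is hierarchical and its inference differs):

```python
import numpy as np

# Greedy MEMM-style decoding sketch: at each frame, a conditional model gives
# P(state_t | state_{t-1}, features_t); here a toy lookup stands in for a
# trained classifier, and decoding is greedy rather than Viterbi.

def greedy_decode(p_next, feats, init_state=0):
    """p_next(prev_state, f) -> array of P(state | prev_state, f)."""
    states, s = [], init_state
    for f in feats:
        s = int(np.argmax(p_next(s, f)))
        states.append(s)
    return states

def p_next(prev, f):
    # Toy conditional distribution over 2 states given a scalar feature.
    stay = 0.8 if f > 0 else 0.4
    dist = np.full(2, 1.0 - stay)
    dist[prev] = stay
    return dist / dist.sum()

print(greedy_decode(p_next, feats=[1.0, -0.5, 2.0]))  # -> [0, 1, 1]
```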
Action recognition using ensemble weighted multi-instance learning
TLDR
A novel 3.5D representation of a depth video for action recognition that accounts for class imbalance and intra-class variations by integrating an ensemble learning method into a weighted multi-instance learning framework.
Hierarchical Semantic Labeling for Task-Relevant RGB-D Perception
TLDR
This work presents an algorithm that produces hierarchical labelings of a scene, following is-part-of and is-type-of relationships, based on a Conditional Random Field that relates pixel-wise and pair-wise observations to labels.
Activity Recognition for Natural Human Robot Interaction
TLDR
This work presents a simple yet effective approach to modelling pose trajectories using the directions traversed by human joints over the duration of an activity, representing the action as a histogram of direction vectors.
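A short sketch of the histogram-of-directions idea, assuming 2D joint positions and eight angular bins (both simplifications; the joint data below is random):

```python
import numpy as np

# Sketch of a histogram-of-directions descriptor: quantize each joint's
# per-frame motion direction into angular bins and accumulate a normalized
# histogram over the activity. 2D joints and 8 bins are assumptions.

def direction_histogram(joints, n_bins=8):
    """joints: array (T, J, 2) of 2D joint positions over T frames."""
    disp = np.diff(joints, axis=0).reshape(-1, 2)       # per-frame motion
    disp = disp[np.linalg.norm(disp, axis=1) > 1e-6]    # drop static joints
    angles = np.arctan2(disp[:, 1], disp[:, 0])         # direction in [-pi, pi]
    bins = ((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, minlength=n_bins).astype(float)
    return hist / max(hist.sum(), 1.0)                  # normalized descriptor

joints = np.random.rand(30, 15, 2)  # 30 frames, 15 joints
print(direction_histogram(joints))
```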
Human Activity Recognition for Domestic Robots
TLDR
A human activity recognition technique that uses 3D skeleton features produced by a depth camera and incorporates per-joint importance weights according to the activity being performed, ignoring confusing or irrelevant features while relying on informative ones.
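A small sketch of activity-dependent joint weighting, where per-joint feature distances are scaled by invented importance weights before comparison:

```python
import numpy as np

# Sketch of activity-dependent joint weighting: scale each joint's feature
# distance by an importance weight so uninformative joints contribute little.
# The weights and emphasized joint indices below are invented.

def weighted_distance(x, y, weights):
    """x, y: (J, D) joint features; weights: (J,) per-joint importance."""
    per_joint = np.linalg.norm(x - y, axis=1)  # distance per joint
    return float(np.dot(weights, per_joint))

x = np.random.rand(15, 3)            # 15 joints, 3D positions
y = np.random.rand(15, 3)
w = np.ones(15)
w[[4, 5, 6]] = 3.0                   # e.g. emphasize arm joints
print(weighted_distance(x, y, w))
```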