Generating Notifications for Missing Actions: Don't Forget to Turn the Lights Off!

@inproceedings{Soran2015GeneratingNF,
  title={Generating Notifications for Missing Actions: Don't Forget to Turn the Lights Off!},
  author={B. Soran and Ali Farhadi and L. Shapiro},
  booktitle={2015 IEEE International Conference on Computer Vision (ICCV)},
  year={2015},
  pages={4669--4677}
}
We all have experienced forgetting habitual actions among our daily activities. For example, we probably have forgotten to turn the lights off before leaving a room or turn the stove off after cooking. In this paper, we propose a solution to the problem of issuing notifications on actions that may be missed. This involves learning about interdependencies between actions and being able to predict an ongoing action while segmenting the input video stream. In order to show a proof of concept, we…
Citations

What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and Modality Attention
TLDR
This work tackles the problem by proposing an architecture able to anticipate actions at multiple temporal scales, using two LSTMs to summarize the past and formulate predictions about the future, and a novel Modality Attention mechanism that learns to weigh modalities in an adaptive fashion.
Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video
TLDR
Rolling-Unrolling LSTM is contributed, a learning architecture to anticipate actions from egocentric videos that achieves competitive performance on ActivityNet with respect to methods not based on unsupervised pre-training and generalizes to the tasks of early action recognition and action recognition.
Egocentric Action Anticipation by Disentangling Encoding and Inference
TLDR
This work introduces a learning architecture which makes use of a "rolling" LSTM to continuously summarize the past and an "unrolling" LSTM to anticipate future actions at multiple temporal scales, and demonstrates that the proposed method surpasses the state of the art in both Top-1 and Top-5 accuracy.
Am I Done? Predicting Action Progress in Videos
TLDR
A novel approach is introduced, named ProgressNet, capable of predicting when an action takes place in a video, where it is located within the frames, and how far it has progressed during its execution, based on a combination of the Faster R-CNN framework and LSTM networks.
Cooking in the kitchen: Recognizing and Segmenting Human Activities in Videos
TLDR
This work describes an end-to-end generative approach, from the encoding of features to the structural modeling of complex human activities, applying Fisher vectors and temporal models to the analysis of video sequences, and demonstrates that combining compact video representations based on Fisher vectors with HMM-based modeling yields very significant gains in accuracy.
Predicting Human Intentions from Motion Cues Only: A 2D+3D Fusion Approach
TLDR
A new multi-modal dataset consisting of a set of motion capture marker 3D data and 2D video sequences is introduced, where, by only analysing very similar movements in both training and test phases, it is able to predict the underlying intention, i.e., the future, never-observed action.
Forecasting Action through Contact Representations from First Person Video
Human visual understanding of action is reliant on anticipation of contact, as demonstrated by pioneering work in cognitive science. Taking inspiration from this, we introduce representations and…
Predicting Human Intentions from Motion Only: A 2D+3D Fusion Approach
TLDR
A new multi-modal dataset consisting of a set of motion capture marker 3D data and 2D video sequences is introduced, where, by only analysing very similar movements in both training and test phases, the underlying intent can be forecast by looking at the kinematics of the immediately preceding movement.
Intention from Motion
TLDR
This paper proposes Intention from Motion, a new paradigm for action prediction in which, without using any contextual information, human intentions are inferred from a single motor act that is non-specific to the subsequently performed action, and designs a proof of concept consisting of a new multi-modal dataset.
Leveraging information from imperfect examples: Common action sequence mining from a mix of incorrect performances
TLDR
This work introduces a novel Community Detection-based unsupervised framework that provides mechanisms to interpret video data and address its limitations to produce a better action representation, and proposes a technique to learn the temporal order of key poses from these imperfect videos.

References

Showing 1-10 of 45 references
Human activity prediction: Early recognition of ongoing activities from streaming videos
  • M. Ryoo
  • Computer Science
  • 2011 International Conference on Computer Vision
  • 2011
TLDR
The new recognition methodology named dynamic bag-of-words is developed, which considers the sequential nature of human activities while maintaining the advantages of the bag-of-words in handling noisy observations, and reliably recognizes ongoing activities from streaming videos with high accuracy.
A Discriminative Model with Multiple Temporal Scales for Action Prediction
TLDR
A novel discriminative multi-scale model for predicting the action class from a partially observed video, which captures temporal dynamics of human actions by explicitly considering all the history of observed features as well as features in smaller temporal segments.
Detecting activities of daily living in first-person camera views
TLDR
This work presents a novel dataset and novel algorithms for the problem of detecting activities of daily living in first-person camera views, and develops novel representations, including temporal pyramids and composite object models, that exploit the fact that objects look different when being interacted with.
Exploring the Trade-off Between Accuracy and Observational Latency in Action Recognition
TLDR
A latency-aware learning formulation is used to train a logistic regression-based classifier that automatically determines distinctive canonical poses from data and uses these to robustly recognize actions in the presence of ambiguous poses.
A Hierarchical Representation for Future Action Prediction
TLDR
This work considers inferring the future actions of people from a still image or a short video clip, which aims to capture the subtle details inherent in human movements that may imply a future action.
Activity Forecasting
TLDR
The unified model uses state-of-the-art semantic scene understanding combined with ideas from optimal control theory to achieve accurate activity forecasting and shows how the same techniques can improve the results of tracking algorithms by leveraging information about likely goals and trajectories.
Leveraging temporal, contextual and ordering constraints for recognizing complex activities in video
TLDR
It is argued that a hierarchical, object-oriented design makes the solution scalable, in that higher-level reasoning components are independent of the particular low-level detector implementation, and recognition of additional activities and actions can easily be added.
Robot-Centric Activity Prediction from First-Person Videos: What Will They Do to Me?
TLDR
An algorithm to recognize human activities targeting the camera from streaming videos is presented, enabling the robot to predict intended activities of the interacting person as early as possible and take fast reactions to such activities (e.g., avoiding harmful events targeting itself before they actually occur).
First-Person Activity Recognition: What Are They Doing to Me?
  • M. Ryoo, L. Matthies
  • Computer Science
  • 2013 IEEE Conference on Computer Vision and Pattern Recognition
  • 2013
TLDR
This paper investigates multi-channel kernels to integrate global and local motion information, and presents a new activity learning/recognition methodology that explicitly considers temporal structures displayed in first-person activity videos.
Learning to Recognize Daily Actions Using Gaze
TLDR
An inference method is presented that can predict the best sequence of gaze locations and the associated action label from an input sequence of images and demonstrates improvements in action recognition rates and gaze prediction accuracy relative to state-of-the-art methods.