Generating Notifications for Missing Actions: Don't Forget to Turn the Lights Off!

@article{Soran2015GeneratingNF,
  title={Generating Notifications for Missing Actions: Don't Forget to Turn the Lights Off!},
  author={Bilge Soran and Ali Farhadi and Linda G. Shapiro},
  journal={2015 IEEE International Conference on Computer Vision (ICCV)},
  year={2015},
  pages={4669-4677}
}
We all have experienced forgetting habitual actions among our daily activities. For example, we probably have forgotten to turn the lights off before leaving a room or turn the stove off after cooking. In this paper, we propose a solution to the problem of issuing notifications on actions that may be missed. This involves learning about interdependencies between actions and being able to predict an ongoing action while segmenting the input video stream. In order to show a proof of concept, we… 
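
A minimal sketch of the notification idea follows: learn which follow-up actions habitually accompany a trigger action, then flag expected follow-ups that never occur. The action names, counts, and the 0.9 threshold below are purely illustrative assumptions and do not reproduce the paper's model, which learns action interdependencies from segmented video.

    # Minimal sketch of rule-based missing-action notification.
    # All action names and the threshold are illustrative assumptions.
    from collections import Counter

    histories = [
        ["enter_room", "turn_on_lights", "cook", "turn_off_stove", "turn_off_lights", "leave_room"],
        ["enter_room", "turn_on_lights", "read", "turn_off_lights", "leave_room"],
        ["enter_room", "turn_on_lights", "cook", "turn_off_stove", "turn_off_lights", "leave_room"],
    ]

    # Estimate P(follow-up occurs later | trigger) from co-occurrence counts.
    pair_counts, trigger_counts = Counter(), Counter()
    for seq in histories:
        for i, trigger in enumerate(seq):
            trigger_counts[trigger] += 1
            for follow_up in set(seq[i + 1:]):
                pair_counts[(trigger, follow_up)] += 1

    def missing_actions(observed, threshold=0.9):
        """Return habitual follow-up actions that never appeared."""
        notifications = set()
        for i, trigger in enumerate(observed):
            for (t, follow_up), c in pair_counts.items():
                if t == trigger and c / trigger_counts[t] >= threshold \
                        and follow_up not in observed[i + 1:]:
                    notifications.add(follow_up)
        return notifications

    print(missing_actions(["enter_room", "turn_on_lights", "cook", "leave_room"]))
    # -> {'turn_off_stove', 'turn_off_lights'} (set order may vary)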

What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and Modality Attention

This work tackles the problem by proposing an architecture that anticipates actions at multiple temporal scales, using two LSTMs to summarize the past and formulate predictions about the future, together with a novel Modality Attention mechanism that learns to weigh modalities adaptively.
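
As a rough illustration of the modality-weighting idea, the sketch below scores modality-specific features with a small learned network and fuses them by a softmax-weighted sum. All dimensions, names, and the scoring network are assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class ModalityAttentionFusion(nn.Module):
        """Sketch of attention-based fusion across modality branches
        (e.g., RGB, optical flow, objects): an MLP predicts one weight
        per modality; the fused feature is the weighted sum.
        Dimensions are illustrative assumptions."""

        def __init__(self, num_modalities=3, feat_dim=1024):
            super().__init__()
            self.score = nn.Sequential(
                nn.Linear(num_modalities * feat_dim, 256),
                nn.ReLU(),
                nn.Linear(256, num_modalities),
            )

        def forward(self, feats):  # feats: (batch, num_modalities, feat_dim)
            weights = torch.softmax(self.score(feats.flatten(1)), dim=-1)
            return (weights.unsqueeze(-1) * feats).sum(dim=1)

    fusion = ModalityAttentionFusion()
    fused = fusion(torch.randn(2, 3, 1024))  # -> (2, 1024)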

Towards Streaming Egocentric Action Anticipation

A lightweight action anticipation model consisting of a simple feed-forward 3D CNN, optimized using knowledge distillation techniques and a custom loss, is proposed; the approach outperforms prior art in the streaming scenario, also in combination with other lightweight models.
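
The distillation term used to train such a lightweight student is, in its generic form (Hinton et al.), a blend of hard-label cross-entropy and a temperature-softened KL term; the sketch below shows only this standard objective, not the paper's custom loss, and the temperature and mixing weight are illustrative.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        """Standard distillation objective: cross-entropy on ground truth
        plus KL between temperature-softened teacher and student
        distributions. T and alpha are illustrative choices."""
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        return alpha * hard + (1 - alpha) * soft

    # Toy usage: a heavy teacher guides a lightweight streaming student.
    loss = distillation_loss(
        torch.randn(4, 10), torch.randn(4, 10), torch.randint(0, 10, (4,))
    )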

Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video

Rolling-Unrolling LSTM is contributed, a learning architecture to anticipate actions from egocentric videos that achieves competitive performance on ActivityNet with respect to methods not based on unsupervised pre-training and generalizes to the tasks of early action recognition and action recognition.

Egocentric Action Anticipation by Disentangling Encoding and Inference

This work introduces a learning architecture which makes use of a "rolling" LSTM to continuously summarize the past and an "unrolling" LSTM to anticipate future actions at multiple temporal scales, and demonstrates that the proposed method surpasses the state of the art in both Top-1 and Top-5 accuracy.
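
A toy sketch of the rolling/unrolling scheme follows, under assumed sizes and a simplified input scheme (the last observed feature is fed back during unrolling); it is not the published implementation.

    import torch
    import torch.nn as nn

    class RollingUnrollingSketch(nn.Module):
        """One LSTM 'rolls' over observed frames to summarize the past;
        from each state a second LSTM 'unrolls' for the steps remaining
        until the action, emitting a prediction per anticipation horizon.
        All sizes are illustrative assumptions."""

        def __init__(self, feat_dim=1024, hidden=512, num_classes=100):
            super().__init__()
            self.rolling = nn.LSTMCell(feat_dim, hidden)
            self.unrolling = nn.LSTMCell(feat_dim, hidden)
            self.classifier = nn.Linear(hidden, num_classes)

        def forward(self, frames):  # frames: (batch, time, feat_dim)
            batch, time, _ = frames.shape
            hidden = self.classifier.in_features
            h = frames.new_zeros(batch, hidden)
            c = frames.new_zeros(batch, hidden)
            predictions = []
            for t in range(time):
                h, c = self.rolling(frames[:, t], (h, c))
                # Unroll from the current summary for the steps left,
                # reusing the last observed feature as input.
                uh, uc = h, c
                for _ in range(time - t):
                    uh, uc = self.unrolling(frames[:, t], (uh, uc))
                predictions.append(self.classifier(uh))
            return torch.stack(predictions, dim=1)  # (batch, time, classes)

    model = RollingUnrollingSketch()
    out = model(torch.randn(2, 8, 1024))  # anticipations at 8 horizons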

Am I Done? Predicting Action Progress in Videos

A novel approach is introduced, named ProgressNet, capable of predicting when an action takes place in a video, where it is located within the frames, and how far it has progressed during its execution, based on a combination of the Faster R-CNN framework and LSTM networks.
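
The progress-prediction component alone can be sketched as a recurrent regressor trained toward a linear 0-to-1 target; the spatio-temporal localization via Faster R-CNN is omitted here, and all sizes are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ProgressHead(nn.Module):
        """Sketch of action-progress regression only: an LSTM over
        per-frame features emits a value in [0, 1] per frame."""

        def __init__(self, feat_dim=512, hidden=128):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, 1)

        def forward(self, feats):  # (batch, time, feat_dim)
            h, _ = self.lstm(feats)
            return torch.sigmoid(self.out(h)).squeeze(-1)  # (batch, time)

    # Train against a linear 0 -> 1 target over the action's duration.
    feats = torch.randn(2, 16, 512)
    target = torch.linspace(0, 1, 16).expand(2, -1)
    loss = nn.functional.mse_loss(ProgressHead()(feats), target)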

Rethinking Learning Approaches for Long-Term Action Anticipation

Anticipatr is introduced, which performs long-term action anticipation leveraging segment-level representations learned using individual segments from different activities, in addition to a video-level representation, to directly predict a set of future action instances over any given anticipation duration.

Cooking in the kitchen: Recognizing and Segmenting Human Activities in Videos

This work describes an end-to-end generative approach, from the encoding of features to the structural modeling of complex human activities, applying Fisher vectors and temporal models to the analysis of video sequences, and demonstrates that combining compact video representations based on Fisher vectors with HMM-based modeling yields very significant gains in accuracy.
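
A textbook, simplified Fisher vector encoder with respect to a diagonal-covariance GMM is sketched below; the HMM-based temporal modeling stage is omitted and the pipeline details differ from the paper's.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fisher_vector(descriptors, gmm):
        """Simplified Fisher vector: gradients of the GMM log-likelihood
        w.r.t. component means and variances (diagonal covariance)."""
        X = np.atleast_2d(descriptors)                 # (N, D)
        N = X.shape[0]
        q = gmm.predict_proba(X)                       # posteriors (N, K)
        mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_
        diff = (X[:, None, :] - mu[None]) / np.sqrt(var[None])  # (N, K, D)
        d_mu = (q[..., None] * diff).sum(0) / (N * np.sqrt(w)[:, None])
        d_var = (q[..., None] * (diff**2 - 1)).sum(0) / (N * np.sqrt(2 * w)[:, None])
        fv = np.hstack([d_mu.ravel(), d_var.ravel()])
        return fv / (np.linalg.norm(fv) + 1e-12)       # L2 normalization

    # Toy usage: encode a clip's local descriptors into one fixed vector.
    rng = np.random.default_rng(0)
    gmm = GaussianMixture(n_components=4, covariance_type="diag")
    gmm.fit(rng.normal(size=(1000, 8)))
    clip_encoding = fisher_vector(rng.normal(size=(50, 8)), gmm)  # (2*4*8,)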

Predicting Human Intentions from Motion Cues Only: A 2D+3D Fusion Approach

A new multi-modal dataset is introduced, consisting of motion capture marker 3D data and 2D video sequences, where, by analysing only very similar movements in both training and test phases, the underlying intention, i.e., the future, never-observed action, can be predicted.

Forecasting Action through Contact Representations from First Person Video

Human visual understanding of action relies on the anticipation of contact, as demonstrated by pioneering work in cognitive science. Taking inspiration from this, we introduce representations and…

...

References

Showing 1-10 of 45 references

Human activity prediction: Early recognition of ongoing activities from streaming videos

M. Ryoo. 2011 International Conference on Computer Vision, 2011.
The new recognition methodology named dynamic bag-of-words is developed, which considers the sequential nature of human activities while maintaining the advantages of the bag-of-words approach to handle noisy observations, and reliably recognizes ongoing activities from streaming videos with high accuracy.
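
The key trick, comparing a partial observation only against the temporally corresponding prefix of each activity model, can be sketched as below; the matching here is a plain normalized-histogram distance, a hypothetical simplification rather than Ryoo's probabilistic formulation.

    import numpy as np

    def segment_histograms(words, num_segments, vocab_size):
        """Per-segment bag-of-words histograms for a visual-word sequence."""
        bounds = np.linspace(0, len(words), num_segments + 1).astype(int)
        return np.stack([
            np.bincount(words[a:b], minlength=vocab_size).astype(float)
            for a, b in zip(bounds[:-1], bounds[1:])
        ])

    def early_match(partial, models, num_segments=4, vocab_size=50):
        """Match an ongoing observation against the corresponding prefix
        of each class model, so recognition can fire before completion."""
        scores = {}
        for label, full in models.items():
            # How many model segments the partial observation spans.
            k = min(num_segments,
                    max(1, round(num_segments * len(partial) / len(full))))
            h = segment_histograms(partial, k, vocab_size)
            m = segment_histograms(full, num_segments, vocab_size)[:k]
            h /= h.sum(axis=1, keepdims=True) + 1e-12
            m /= m.sum(axis=1, keepdims=True) + 1e-12
            scores[label] = -np.abs(h - m).sum()
        return max(scores, key=scores.get)

    # Toy usage: random visual-word sequences as class models.
    rng = np.random.default_rng(0)
    models = {c: rng.integers(0, 50, size=100) for c in ("wave", "hug")}
    print(early_match(models["wave"][:40], models))  # should recover 'wave'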

A Discriminative Model with Multiple Temporal Scales for Action Prediction

A novel discriminative multi-scale model is proposed for predicting the action class from a partially observed video, which captures the temporal dynamics of human actions by explicitly considering all the history of observed features as well as features in smaller temporal segments.
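
The multi-scale feature construction can be approximated by pooling over the full observed history plus progressively shorter recent windows; the sketch below shows that pooling only, whereas the actual model is a structured SVM with additional constraints, and the scales are assumptions.

    import numpy as np

    def multiscale_features(frame_feats, scales=(1.0, 0.5, 0.25)):
        """Mean-pool the full observed history plus progressively shorter
        recent windows, then concatenate the pooled vectors."""
        T = len(frame_feats)
        pooled = [frame_feats[-max(1, int(T * s)):].mean(axis=0) for s in scales]
        return np.concatenate(pooled)

    feats = np.random.randn(30, 64)   # 30 observed frames, 64-dim features
    x = multiscale_features(feats)    # (192,), input to a linear classifier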

Detecting activities of daily living in first-person camera views

This work presents a novel dataset and novel algorithms for the problem of detecting activities of daily living in first-person camera views, and develops novel representations including temporal pyramids and composite object models that exploit the fact that objects look different when being interacted with.

A Hierarchical Representation for Future Action Prediction

This work considers inferring the future actions of people from a still image or a short video clip, which aims to capture the subtle details inherent in human movements that may imply a future action.

Activity Forecasting

The unified model uses state-of-the-art semantic scene understanding combined with ideas from optimal control theory to achieve accurate activity forecasting and shows how the same techniques can improve the results of tracking algorithms by leveraging information about likely goals and trajectories.
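
At the core of such maximum-entropy inverse-optimal-control forecasting is a soft value iteration over a reward map derived from scene semantics; a toy grid version, with illustrative rewards and no claim to match the paper's formulation, is sketched below.

    import numpy as np

    def soft_value_iteration(reward, goal, iters=200):
        """Soft (maximum-entropy) value iteration on a 4-connected grid.
        reward holds per-cell rewards (e.g., derived from semantic scene
        labels); the goal cell is absorbing with value 0."""
        H, W = reward.shape
        V = np.full((H, W), -1e9)
        V[goal] = 0.0
        for _ in range(iters):
            Q = np.full((4, H, W), -1e9)
            Q[0, 1:, :] = V[:-1, :]    # value after moving up
            Q[1, :-1, :] = V[1:, :]    # down
            Q[2, :, 1:] = V[:, :-1]    # left
            Q[3, :, :-1] = V[:, 1:]    # right
            V = reward + np.logaddexp.reduce(Q, axis=0)  # soft max over moves
            V[goal] = 0.0
        return V

    # Toy scene: the left half is cheap 'sidewalk', the right costly 'grass'.
    reward = np.where(np.indices((8, 8))[1] < 4, -1.0, -4.0)
    V = soft_value_iteration(reward, goal=(7, 7))  # higher V = likelier path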

Leveraging temporal, contextual and ordering constraints for recognizing complex activities in video

It is argued that a hierarchical, object-oriented design makes the solution scalable, in that higher-level reasoning components are independent of the particular low-level detector implementation, and that recognition of additional activities and actions can easily be added.

Robot-Centric Activity Prediction from First-Person Videos: What Will They Do to Me?

An algorithm to recognize human activities targeting the camera from streaming videos is presented, enabling the robot to predict the intended activities of an interacting person as early as possible and to react quickly to such activities (e.g., avoiding harmful events targeting itself before they actually occur).

First-Person Activity Recognition: What Are They Doing to Me?

M. Ryoo, L. Matthies. 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
This paper investigates multi-channel kernels to integrate global and local motion information, and presents a new activity learning/recognition methodology that explicitly considers temporal structures displayed in first-person activity videos.
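
Multi-channel kernel integration can be sketched as a weighted sum of per-channel histogram kernels; the chi-square kernel and the fixed weights below are assumptions, not the paper's exact choices.

    import numpy as np

    def chi2_kernel(A, B, gamma=1.0):
        """Exponential chi-square kernel between histogram rows of A and B."""
        num = (A[:, None] - B[None]) ** 2
        den = A[:, None] + B[None] + 1e-12
        return np.exp(-gamma * (num / den).sum(-1))

    def multichannel_kernel(channels_a, channels_b, weights):
        """Weighted sum of per-channel kernels (e.g., one channel for
        global and one for local motion histograms)."""
        return sum(w * chi2_kernel(a, b)
                   for w, (a, b) in zip(weights, zip(channels_a, channels_b)))

    # Toy usage: two channels for 5 vs 4 clips.
    rng = np.random.default_rng(0)
    ch_a = [rng.random((5, 16)), rng.random((5, 32))]
    ch_b = [rng.random((4, 16)), rng.random((4, 32))]
    K = multichannel_kernel(ch_a, ch_b, weights=(0.5, 0.5))  # (5, 4) block
    # A square train-vs-train Gram matrix built this way can be fed to
    # e.g. sklearn.svm.SVC(kernel="precomputed").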

Learning to Recognize Daily Actions Using Gaze

An inference method is presented that can predict the best sequence of gaze locations and the associated action label from an input sequence of images and demonstrates improvements in action recognition rates and gaze prediction accuracy relative to state-of-the-art methods.

Modeling Actions through State Changes

A. Fathi, James M. Rehg. 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
This paper proposes a weakly supervised method for learning the object and material states that are necessary for recognizing daily actions and demonstrates that this method can be used to segment discrete actions from a continuous video of an activity.