Encouraging LSTMs to Anticipate Actions Very Early

@inproceedings{Akbarian2017EncouragingLT,
  title={Encouraging LSTMs to Anticipate Actions Very Early},
  author={Mohammad Sadegh Aliakbarian and Fatemeh Sadat Saleh and Mathieu Salzmann and Basura Fernando and Lars Petersson and Lars Andersson},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={280--289}
}
In contrast to the widely studied problem of recognizing an action given a complete sequence, action anticipation aims to identify the action from only partially available videos. It is therefore key to the success of computer vision applications that must react as early as possible, such as autonomous navigation. In this paper, we propose a new action anticipation method that achieves high prediction accuracy even when only a very small percentage of a video sequence has been observed. To…
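The core idea behind anticipation, as opposed to recognition, is that the model emits a class score after every frame rather than only at the end of the sequence, so a label can be committed early. A minimal single-unit LSTM sketch in plain Python illustrates this per-frame readout; the weights, gating, and scalar inputs here are purely illustrative assumptions, not the authors' model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w=0.5, u=0.1):
    """One step of a single-unit LSTM; all four gates share the
    illustrative weights w (input) and u (recurrent)."""
    i = sigmoid(w * x + u * h)        # input gate
    f = sigmoid(w * x + u * h)        # forget gate
    o = sigmoid(w * x + u * h)        # output gate
    g = math.tanh(w * x + u * h)      # candidate cell value
    c = f * c + i * g                 # new cell state
    h = o * math.tanh(c)              # new hidden state
    return h, c

def per_frame_scores(frames):
    """Run the LSTM over a frame sequence, reading out a score after
    every frame, so a prediction exists from partial video onward."""
    h, c = 0.0, 0.0
    scores = []
    for x in frames:
        h, c = lstm_step(x, h, c)
        scores.append(h)              # a prediction is available here
    return scores

scores = per_frame_scores([1.0] * 10)
```

The score after frame t depends only on frames 0..t, which is what allows an anticipation model to be supervised (and evaluated) at every time step rather than once per clip.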
Am I Done? Predicting Action Progress in Videos
TLDR: A novel approach is introduced, named ProgressNet, capable of predicting when an action takes place in a video, where it is located within the frames, and how far it has progressed during its execution, based on a combination of the Faster R-CNN framework and LSTM networks.
Forecasting Future Action Sequences With Attention: A New Approach to Weakly Supervised Action Forecasting
TLDR: A model that predicts the actions of future unseen frames without using frame-level annotations during training is proposed; it outperforms prior models by 1.04%, leveraging the proposed weakly supervised architecture and effective use of an attention mechanism and loss functions.
Delving into 3D Action Anticipation from Streaming Videos
TLDR: This paper investigates the problem of 3D action anticipation from streaming videos with the target of understanding best practices for solving this problem, and proposes a novel method with a multi-task loss.
Knowledge Distillation for Human Action Anticipation
TLDR: A novel knowledge distillation framework is proposed that uses an action recognition network to supervise the training of an action anticipation network, guiding the latter to attend to the relevant information needed for correctly anticipating future actions.
Knowledge Distillation for Action Anticipation via Label Smoothing
TLDR: A multi-modal framework based on long short-term memory networks is implemented to summarize past observations and make predictions at different time steps; label smoothing systematically improves the performance of state-of-the-art models for action anticipation.
Forecasting Future Sequence of Actions to Complete an Activity
TLDR: This work presents a method to forecast actions for the unseen future of a video using a neural machine translation technique with an encoder-decoder architecture, and proposes a novel loss function to cater for two types of uncertainty in the future predictions.
What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and Modality Attention
TLDR: This work tackles the problem with an architecture able to anticipate actions at multiple temporal scales, using two LSTMs to summarize the past and formulate predictions about the future, together with a novel Modality ATTention mechanism that learns to weigh modalities adaptively.
On Encoding Temporal Evolution for Real-time Action Prediction
TLDR: This work constructs dynamic images (DIs) by summarising moving pixels through a sequence of future frames, trains a convolutional LSTM to predict the next DI via an unsupervised learning process, and then recognises the activity associated with the predicted DI.
Anticipation of Human Actions With Pose-Based Fine-Grained Representations
TLDR: This work extracts fine-grained features on visible human actors and predicts the future via an L2-regression in feature space, guided by a pose prediction system that models current and future human poses in the scene.
...

References

SHOWING 1-10 OF 61 REFERENCES
A Discriminative Model with Multiple Temporal Scales for Action Prediction
TLDR: A novel discriminative multi-scale model for predicting the action class from a partially observed video, which captures the temporal dynamics of human actions by explicitly considering all the history of observed features as well as features in smaller temporal segments.
Anticipating Visual Representations from Unlabeled Video
TLDR: This work presents a framework that capitalizes on temporal structure in unlabeled video to learn to anticipate human actions and objects, and applies recognition algorithms on the predicted representations to anticipate objects and actions.
Actionness Estimation Using Hybrid Fully Convolutional Networks
TLDR: A new deep architecture for actionness estimation is presented, called hybrid fully convolutional network (HFCN), composed of an appearance FCN (A-FCN) and a motion FCN (M-FCN); these leverage the strong capacity of deep models to estimate actionness maps from the perspectives of static appearance and dynamic motion.
Actions ~ Transformations
TLDR: A novel representation for actions is proposed by modeling an action as a transformation which changes the state of the environment before the action happens (precondition) to the state after the action (effect).
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident.
Towards Understanding Action Recognition
TLDR: It is found that high-level pose features greatly outperform low/mid-level features; in particular, pose over time is critical, but current pose estimation algorithms are not yet reliable enough to provide this information.
Predicting the Where and What of Actors and Actions through Online Action Localization
TLDR: A novel approach to tackle the challenging problem of "online action localization", which entails predicting actions and their locations as they happen in a video, together with a new measure that quantifies action-prediction performance by analyzing how prediction accuracy varies as a function of the observed portion of the video.
Recurrent Neural Networks for driver activity anticipation via sensory-fusion architecture
TLDR: A sensory-fusion architecture which jointly learns to anticipate and fuse information from multiple sensory streams, showing significant improvement over the state of the art in maneuver anticipation by increasing precision and recall.
Fast action proposals for human action detection and search
  • Gang Yu, Junsong Yuan
  • 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
TLDR: Experimental results on two challenging datasets, MSRII and UCF101, validate the superior performance of the action proposals as well as competitive results on action detection and search.
...