SEMBED: Semantic Embedding of Egocentric Action Videos
@inproceedings{Wray2016SEMBEDSE,
  title     = {SEMBED: Semantic Embedding of Egocentric Action Videos},
  author    = {Michael Wray and Davide Moltisanti and W. Mayol-Cuevas and Dima Damen},
  booktitle = {ECCV Workshops},
  year      = {2016}
}
We present SEMBED, an approach for embedding an egocentric object interaction video in a semantic-visual graph to estimate the probability distribution over its potential semantic labels. When object interactions are annotated using unbounded choice of verbs, we embrace the wealth and ambiguity of these labels by capturing the semantic relationships as well as the visual similarities over motion and appearance features. We show how SEMBED can interpret a challenging dataset of 1225 freely…
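To make the abstract concrete, here is a minimal sketch of the core idea: visually similar neighbours vote for a query video's label, and each vote is spread over semantically related verbs, yielding a probability distribution rather than a single label. All features, verb embeddings, and the neighbourhood construction below are toy assumptions, not the authors' implementation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy stand-ins: visual features for annotated training videos (in the paper
# these come from motion/appearance descriptors) and free-form verb labels.
rng = np.random.default_rng(0)
train_feats = rng.random((6, 32))
train_verbs = ["open", "open", "pull", "turn", "rotate", "close"]

# Toy verb embeddings standing in for a semantic relatedness measure
# (e.g. WordNet- or word-vector-based); these values are made up.
verb_vecs = {
    "open":   np.array([1.0, 0.0, 0.0]),
    "pull":   np.array([0.7, 0.3, 0.0]),
    "turn":   np.array([0.0, 1.0, 0.1]),
    "rotate": np.array([0.0, 0.9, 0.2]),
    "close":  np.array([0.1, 0.0, 1.0]),
}

def label_distribution(query_feat, k=3):
    """Estimate a distribution over verb labels for a query video: the k
    visually nearest training videos vote, and each vote is spread over
    semantically related verbs before normalisation."""
    vis_sims = np.array([cosine(query_feat, f) for f in train_feats])
    neighbours = np.argsort(vis_sims)[-k:]
    scores = {v: 0.0 for v in verb_vecs}
    for i in neighbours:
        for v in scores:
            scores[v] += vis_sims[i] * max(0.0, cosine(verb_vecs[train_verbs[i]], verb_vecs[v]))
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}

print(label_distribution(rng.random(32)))
```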
13 Citations
Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video
- Computer Science · 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
It is demonstrated that annotator disagreement stems from a limited understanding of the distinct phases of an action, and annotating based on the Rubicon Boundaries, inspired by a similarly named cognitive model, is proposed for consistent temporal bounds of object interactions.
Learning Visual Actions Using Multiple Verb-Only Labels
- Computer Science, Linguistics · BMVC
- 2019
It is demonstrated that multi-label verb-only representations outperform conventional single-verb labels, along with other benefits of a multi-verb representation, including cross-dataset retrieval and retrieval of manner and result verb types.
How Shall We Evaluate Egocentric Action Recognition?
- Computer Science · 2017 IEEE International Conference on Computer Vision Workshops (ICCVW)
- 2017
This work proposes a set of measures aimed to quantitatively and qualitatively assess the performance of egocentric action recognition methods and investigates how frame-wise predictions can be turned into action-based temporal video segmentations.
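As a toy illustration of the frame-to-segment conversion mentioned above, consecutive frames sharing a predicted label can be collapsed into temporal segments. The function name and labels below are hypothetical; the paper's actual measures and procedure are more involved.

```python
from itertools import groupby

def frames_to_segments(frame_labels):
    """Collapse per-frame predictions into (label, start, end) segments
    by grouping runs of consecutive identical labels."""
    segments, idx = [], 0
    for label, run in groupby(frame_labels):
        n = len(list(run))
        segments.append((label, idx, idx + n - 1))
        idx += n
    return segments

print(frames_to_segments(["bg", "bg", "open", "open", "open", "bg", "close"]))
# [('bg', 0, 1), ('open', 2, 4), ('bg', 5, 5), ('close', 6, 6)]
```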
An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition
- Computer Science · ArXiv
- 2022
This work addresses the challenge of training multi-label action recognition models from only single positive training labels by proposing two approaches that are based on generating pseudo training examples sampled from similar instances within the train set.
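One plausible reading of the pseudo-example idea is sketched below, under the assumption that "similar instances" means nearest neighbours in feature space; the function and data are hypothetical, not the paper's exact method.

```python
import numpy as np

def pseudo_multi_labels(feats, single_labels, k=2):
    """Augment each sample's single positive label with the labels of its
    k most similar training samples, yielding pseudo multi-label targets."""
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    sim = normed @ normed.T                   # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)            # never pick the sample itself
    return [
        {single_labels[i], *(single_labels[j] for j in np.argsort(sim[i])[-k:])}
        for i in range(len(feats))
    ]

rng = np.random.default_rng(1)
print(pseudo_multi_labels(rng.random((5, 16)), ["open", "pull", "open", "turn", "rotate"]))
```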
Multitask Learning to Improve Egocentric Action Recognition
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
- 2019
To tackle action recognition in egocentric videos, this work learns the verbs and nouns of which action labels consist, and predicts coordinates that capture the hand locations and the gaze-based visual saliency for all frames of the input video segments.
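A minimal sketch of what such a multitask set-up could look like: shared clip features feed separate verb, noun, hand-coordinate, and gaze heads. All dimensions and layer choices below are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Hypothetical multitask heads over a shared video feature: verb and
    noun classification plus hand-location and gaze-point regression."""
    def __init__(self, feat_dim=512, n_verbs=50, n_nouns=100):
        super().__init__()
        self.verb = nn.Linear(feat_dim, n_verbs)
        self.noun = nn.Linear(feat_dim, n_nouns)
        self.hands = nn.Linear(feat_dim, 4)   # (x, y) for each of two hands
        self.gaze = nn.Linear(feat_dim, 2)    # (x, y) gaze-saliency point

    def forward(self, feats):
        return self.verb(feats), self.noun(feats), self.hands(feats), self.gaze(feats)

heads = MultiTaskHead()
verb_logits, noun_logits, hand_xy, gaze_xy = heads(torch.randn(8, 512))
```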
Personal-location-based temporal segmentation of egocentric videos for lifelogging applications
- Computer Science · J. Vis. Commun. Image Represent.
- 2018
Improving Classification by Improving Labelling: Introducing Probabilistic Multi-Label Object Interaction Recognition
- Computer Science · ArXiv
- 2017
This work models the mapping between observations and interaction classes, as well as class overlaps, towards a probabilistic multi-label classifier that emulates human annotators, and shows that learning from annotation probabilities outperforms majority voting and enables discovery of co-occurring labels.
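The contrast with majority voting can be seen in a few lines: instead of a one-hot target derived from the majority label, the classifier is trained against the annotators' label distribution. The soft-label cross-entropy below is a stand-in for "learning from annotation probabilities"; the logits and probabilities are made up.

```python
import numpy as np

def soft_label_cross_entropy(pred_logits, annotation_probs):
    """Cross-entropy against a distribution of annotator labels rather than
    a single majority-vote class."""
    logits = pred_logits - pred_logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(annotation_probs * log_probs).sum(axis=1).mean())

# Two videos, three interaction classes. Row 0: annotators split 70/30
# between classes 0 and 1; majority voting would discard the 30%.
y = np.array([[0.7, 0.3, 0.0],
              [0.0, 0.1, 0.9]])
logits = np.array([[2.0, 1.0, -1.0],
                   [-1.0, 0.0, 2.5]])
print(soft_label_cross_entropy(logits, y))
```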
First-Person Action Decomposition and Zero-Shot Learning
- Computer Science · 2017 IEEE Winter Conference on Applications of Computer Vision (WACV)
- 2017
By constructing specialized features for the decomposed concepts, this method succeeds in zero-shot learning and outperforms previous results in conventional action recognition when the performance gaps of different features on verb/noun concepts are significant.
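The decomposition intuition can be illustrated by composing independent verb and noun scores into action scores, which covers verb-noun pairs never seen together at training time. The scores and the simple product rule below are illustrative assumptions, not the paper's specialized features.

```python
def zero_shot_action_scores(verb_scores, noun_scores):
    """Compose independent verb and noun posteriors into scores for every
    (verb, noun) action pair, including combinations unseen in training."""
    return {(v, n): pv * pn
            for v, pv in verb_scores.items()
            for n, pn in noun_scores.items()}

# Hypothetical posteriors from separate verb and noun classifiers.
verbs = {"open": 0.6, "pour": 0.4}
nouns = {"jar": 0.7, "kettle": 0.3}
best = max(zero_shot_action_scores(verbs, nouns).items(), key=lambda kv: kv[1])
print(best)  # (('open', 'jar'), 0.42), even if "open jar" was never seen paired
```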
Object Detection-Based Location and Activity Classification from Egocentric Videos: A Systematic Analysis
- Computer Science · Smart Assisted Living
- 2019
It is determined that the recognition of activities is related to the presence of specific objects and that the lack of explicit associations between certain activities and objects hurts classification performance for these activities.
Convolutional Long Short-Term Memory Networks for Recognizing First Person Interactions
- Computer Science · 2017 IEEE International Conference on Computer Vision Workshops (ICCVW)
- 2017
Experimental results show that the proposed method outperforms the state of the art on most recent first person interactions datasets that involve complex ego-motion and surpasses all previous methods that use only RGB images by more than 20% in recognition accuracy.
References
Showing 1-10 of 39 references
Delving into egocentric actions
- Computer Science · 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
A novel set of egocentric features is presented, it is shown how they can be combined with motion and object features, and a significant performance boost over all previous state-of-the-art methods is demonstrated.
Learning to recognize objects in egocentric activities
- Computer Science · CVPR 2011
- 2011
The key to this approach is a robust, unsupervised bottom up segmentation method, which exploits the structure of the egocentric domain to partition each frame into hand, object, and background categories and uses Multiple Instance Learning to match object instances across sequences.
Discovering important people and objects for egocentric video summarization
- Computer Science · 2012 IEEE Conference on Computer Vision and Pattern Recognition
- 2012
This work introduces novel egocentric features to train a regressor that predicts important regions, producing significantly more informative summaries than traditional methods that often include irrelevant or redundant information.
Egocentric Visual Event Classification with Location-Based Priors
- Computer Science · ISVC
- 2010
The method tackles the challenge of a moving camera by creating deformable graph models for classification of actions and events captured from an egocentric point of view, and presents results on a dataset collected within a cluttered environment.
Learning to Recognize Daily Actions Using Gaze
- Psychology · ECCV
- 2012
An inference method is presented that can predict the best sequence of gaze locations and the associated action label from an input sequence of images and demonstrates improvements in action recognition rates and gaze prediction accuracy relative to state-of-the-art methods.
Object-Centric Spatio-Temporal Pyramids for Egocentric Activity Recognition
- Computer Science · BMVC
- 2013
A boosting approach that automatically selects a small set of useful spatio-temporal pyramid histograms among a randomized pool of candidate partitions and an “object-centric” cutting scheme that prefers sampling bin boundaries near those objects prominently involved in the egocentric activities are proposed.
Going Deeper into First-Person Activity Recognition
- Computer Science · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
By learning to recognize objects, actions, and activities jointly, the performance of the individual recognition tasks also increases by 30% (actions) and 14% (objects); the results of an extensive ablative analysis are included to highlight the importance of network design decisions.
YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition
- Computer Science · 2013 IEEE International Conference on Computer Vision
- 2013
This paper presents a solution that takes a short video clip and outputs a brief sentence that sums up the main activity in the video, such as the actor, the action, and its object, and uses a Web-scale language model to "fill in" novel verbs.
First Person Action Recognition Using Deep Learned Descriptors
- Computer Science · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work proposes convolutional neural networks (CNNs) for end-to-end learning and classification of the wearer's actions, and shows that the proposed network can generalize and give state-of-the-art performance on various disparate egocentric action datasets.
Fast unsupervised ego-action learning for first-person sports videos
- Computer Science · CVPR 2011
- 2011
This work addresses the novel task of discovering first-person action categories (which it calls ego-actions), useful for tasks such as video indexing and retrieval, and investigates the use of motion-based histograms and unsupervised learning algorithms to quickly cluster video content.
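As a rough illustration of that unsupervised pipeline, quantized motion histograms can be clustered directly, with each cluster treated as a candidate ego-action. The features below are random stand-ins, and scikit-learn's KMeans replaces whatever clustering the paper actually uses.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for per-segment global motion histograms (e.g. quantized
# optical-flow directions); the real features and pipeline are assumed.
rng = np.random.default_rng(0)
motion_histograms = rng.random((200, 16))

# Unsupervised discovery of ego-action categories as clusters.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(motion_histograms)
print(kmeans.labels_[:20])  # cluster index per segment
```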