Semantic Pooling for Complex Event Analysis in Untrimmed Videos
@article{Chang2017SemanticPF,
  title={Semantic Pooling for Complex Event Analysis in Untrimmed Videos},
  author={Xiaojun Chang and Yaoliang Yu and Yi Yang and Eric P. Xing},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2017},
  volume={39},
  pages={1617--1632}
}
Pooling plays an important role in generating a discriminative video representation. In this paper, we propose a new semantic pooling approach for challenging event analysis tasks (e.g., event detection, recognition, and recounting) in long untrimmed Internet videos, especially when only a few shots/segments are relevant to the event of interest while many other shots are irrelevant or even misleading. The commonly adopted pooling strategies aggregate the shots indifferently in one way or…
297 Citations
Reliable Shot Identification for Complex Event Detection via Visual-Semantic Embedding
- Computer Science
- Comput. Vis. Image Underst.
- 2021
Revealing Event Saliency in Unconstrained Video Collection
- Computer Science
- IEEE Transactions on Image Processing
- 2017
This paper proposes an unsupervised event saliency revealing framework that first extracts features from multiple modalities to represent each shot in the given video collection, and systematically compares the method to a number of baseline methods on the TRECVID benchmarks.
Grounding Visual Concepts for Zero-Shot Event Detection and Event Captioning
- Computer Science
- KDD
- 2020
This work is the first to define and solve the MEC task, a further step towards understanding video events, and achieves state-of-the-art performance on the TRECVID MEDTest dataset as well as the newly proposed TRECVID-MEC dataset.
Complex Event Detection by Identifying Reliable Shots from Untrimmed Videos
- Computer Science
- 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
A new MIL method is proposed, which simultaneously learns a linear SVM classifier and infers a binary indicator for each instance in order to select reliable training instances from each positive or negative bag.
Complex event detection via attention-based video representation and classification
- Computer Science
- Multimedia Tools and Applications
- 2017
Experimental results show that the proposed single model outperforms state-of-the-art approaches on all three real-world video datasets, demonstrating its effectiveness.
Single-shot Semantic Matching Network for Moment Localization in Videos
- Computer Science
- ACM Trans. Multim. Comput. Commun. Appl.
- 2021
A lightweight single-shot semantic matching network (SSMN) is presented to avoid the complex computations required to match the query against the segment candidates; in theory, the proposed SSMN can locate moments of any length.
One-Shot SADI-EPE: A Visual Framework of Event Progress Estimation
- Computer Science
- IEEE Transactions on Circuits and Systems for Video Technology
- 2019
A framework based on visual human action analysis, namely one-shot simultaneously action detection and identification (SADI)-EPE, is presented, and an evaluation criterion for the estimation problem is proposed, under which the efficacy of the proposed framework is demonstrated.
Towards More Explainability: Concept Knowledge Mining Network for Event Recognition
- Computer Science
- ACM Multimedia
- 2020
A concept knowledge mining network (CKMN) for event recognition is proposed that aims to obtain a complete concept representation by mining the existing pattern of each concept at different time granularities with dilated temporal pyramid convolution and temporal self-attention.
The Many Shades of Negativity
- Computer Science
- IEEE Transactions on Multimedia
- 2017
The state-of-the-art deep convolutional neural network features are leveraged in the approach for event detection to further boost the performance and introduce a constraint for this purpose.
ZSTAD: Zero-Shot Temporal Activity Detection
- Computer Science
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This work designs an end-to-end deep network based on R-C3D that is optimized with an innovative loss function that considers the embeddings of activity labels and their super-classes while learning the common semantics of seen and unseen activities.
References
Complex Event Detection using Semantic Saliency and Nearly-Isotonic SVM
- Computer Science
- ICML
- 2015
A novel notion of semantic saliency is defined that assesses the relevance of each shot to the event of interest, and the shots are prioritized according to their saliency scores, since shots that are semantically more salient are expected to contribute more to the final event detector.
Searching Persuasively: Joint Event Detection and Evidence Recounting with Limited Supervision
- Computer Science
- ACM Multimedia
- 2015
A joint framework is proposed that simultaneously detects high-level events and localizes the indicative concepts of the events: recounting improves detection by pruning irrelevant noisy concepts, while detection directs recounting to the most discriminative evidence.
Video2vec Embeddings Recognize Events When Examples Are Scarce
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2017
By its ability to improve predictability of present day audio-visual video features, while at the same time maximizing their semantic descriptiveness, Video2vec leads to state-of-the-art accuracy for both few- and zero-example recognition of events in video.
Multimedia Event Detection Using A Classifier-Specific Intermediate Representation
- Computer Science
- IEEE Transactions on Multimedia
- 2013
This paper has created a discriminative semantic analysis framework based on a tightly coupled intermediate representation that integrates the classifier inference and latent intermediate representation into a joint framework.
Video event recognition using concept attributes
- Computer Science
- 2013 IEEE Workshop on Applications of Computer Vision (WACV)
- 2013
This work proposes to use action, scene, and object concepts as semantic attributes for classification of video events in in-the-wild content, such as YouTube videos, and shows how the proposed enhanced event model can further improve zero-shot learning.
A discriminative CNN video representation for event detection
- Computer Science
- 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
This paper proposes using a set of latent concept descriptors as the frame descriptor, which enriches visual information while keeping it computationally affordable, resulting in new state-of-the-art performance in event detection over the largest video datasets.
Enhancing Video Event Recognition Using Automatically Constructed Semantic-Visual Knowledge Base
- Computer Science
- IEEE Transactions on Multimedia
- 2015
This paper proposes to construct a semantic-visual knowledge base to encode the rich event-centric concepts and their relationships from the well-established lexical databases, including FrameNet, as well as the concept-specific visual knowledge from ImageNet, and designs an effective system for video event recognition.
Learning latent temporal structure for complex event detection
- Computer Science
- 2012 IEEE Conference on Computer Vision and Pattern Recognition
- 2012
A conditional model trained in a max-margin framework that is able to automatically discover discriminative and interesting segments of video, while simultaneously achieving competitive accuracies on difficult detection and recognition tasks is utilized.
Bag-of-Fragments: Selecting and Encoding Video Fragments for Event Detection and Recounting
- Computer Science
- ICMR
- 2015
The bag-of-fragments forms an effective encoding for event detection and is able to provide a precise, temporally localized event recounting; it is concluded that fragments matter for video event detection and recounting.
Dynamic Pooling for Complex Event Recognition
- Computer Science
- 2013 IEEE International Conference on Computer Vision
- 2013
The problem of adaptively selecting pooling regions for the classification of complex video events is considered and it is shown that a globally optimal solution to the inference problem can be obtained efficiently, through the solution of a series of linear programs.