Semantic Model Vectors for Complex Video Event Recognition

  • Michele Merler, Bert Huang, Lexing Xie, Gang Hua, Apostol Natsev
  • IEEE Transactions on Multimedia
We propose semantic model vectors, an intermediate level semantic representation, as a basis for modeling and detecting complex events in unconstrained real-world videos, such as those from YouTube. The semantic model vectors are extracted using a set of discriminative semantic classifiers, each being an ensemble of SVM models trained from thousands of labeled web images, for a total of 280 generic concepts. Our study reveals that the proposed semantic model vectors representation outperforms… 
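The idea above can be sketched in a few lines: run a bank of per-concept classifiers over every frame and average the scores into one concept-score vector per video. This is a minimal illustration, not the paper's ensemble-SVM pipeline; the toy linear "classifiers" and dimensions here are assumptions for demonstration.

```python
import numpy as np

def semantic_model_vector(frame_features, concept_classifiers):
    """Map a video to a concept-score vector: apply each semantic
    concept classifier to every frame, then average over frames."""
    # scores[t, k] = score of concept k on frame t
    scores = np.stack([clf(frame_features) for clf in concept_classifiers], axis=1)
    return scores.mean(axis=0)  # one averaged score per concept

# Toy example: 3 hypothetical linear "concept classifiers" over 5-D frame features
# (the paper uses ensembles of SVMs over 280 concepts).
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 5))
classifiers = [lambda X, w=w: X @ w for w in weights]

frames = rng.normal(size=(8, 5))  # 8 frames, 5-D features each
vec = semantic_model_vector(frames, classifiers)
print(vec.shape)  # (3,)
```

The resulting fixed-length vector can then be fed to any standard event classifier, which is what makes the representation convenient as an intermediate level.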
Event-Driven Semantic Concept Discovery by Exploiting Weakly Tagged Internet Images
The proposed method of automatic concept discovery outperforms other well-known concept library construction approaches such as Classemes and ImageNet by a large margin (228%) in zero-shot event retrieval, and subjective evaluation by humans confirms the clear superiority of the proposed method in discovering concepts for event representation.
Bag of Attributes for Video Event Retrieval
Results using BoA were comparable or superior to the baselines in the task of video event retrieval using the EVVE dataset, with the advantage of providing a much more compact representation.
Encoding Concept Prototypes for Video Event Detection and Summarization
An algorithm is proposed that learns a set of relevant frames as the concept prototypes from web video examples, without the need for frame-level annotations, and use them for representing an event video.
Semantic pooling for complex event detection
The proposed semantic pooling strategy provides a new mechanism for incorporating semantic concepts into low-level feature based event recognition; evaluation on the TRECVID MED dataset shows that semantic pooling consistently improves performance compared with conventional pooling strategies.
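A minimal sketch of the contrast with conventional pooling: instead of a uniform mean over frames, weight each frame by a semantic relevance score before aggregating. The softmax weighting below is one hypothetical choice, not necessarily the paper's exact scheme.

```python
import numpy as np

def semantic_pool(frame_features, relevance_scores):
    """Pool frame features with weights proportional to each frame's
    semantic relevance to the target event (softmax-normalized here),
    rather than a plain, uniform mean over frames."""
    w = np.exp(relevance_scores - relevance_scores.max())
    w /= w.sum()
    return (w[:, None] * frame_features).sum(axis=0)

rng = np.random.default_rng(1)
feats = rng.normal(size=(10, 4))  # 10 frames, 4-D low-level features
rel = rng.normal(size=10)         # per-frame semantic relevance to the event
pooled = semantic_pool(feats, rel)
print(pooled.shape)  # (4,)
```

Frames with high relevance dominate the pooled descriptor, which is the point when only a few shots of a long video actually depict the event.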
TagBook: A Semantic Video Representation Without Supervision for Event Detection
This work introduces a simple algorithm that propagates tags from a video's nearest neighbors, similar in spirit to the ones used for image retrieval, but redesigns it for video event detection by including video source set refinement and varying the video tag assignment.
Joint Attributes and Event Analysis for Multimedia Event Detection
To harness video attributes, an algorithm established on a correlation vector that correlates them to a target event is proposed, which could incorporate video attributes latently as extra information into the event detector learnt from multimedia event videos in a joint framework.
Bi-Level Semantic Representation Analysis for Multimedia Event Detection
This work proposes a bi-level semantic representation analysis method that learns weights for semantic representations obtained from different multimedia archives, and restrains the negative influence of noisy or irrelevant concepts at the overall concept level.
Exploring semantic concepts for complex event analysis in unconstrained video clips
A novel semantic pooling approach is proposed for challenging tasks on long, untrimmed Internet videos, especially when only a few shots in a long video are relevant to the event of interest while many others are irrelevant or even misleading, together with a joint event detection and evidence recounting framework with limited supervision.
Enhancing Video Event Recognition Using Automatically Constructed Semantic-Visual Knowledge Base
This paper proposes to construct a semantic-visual knowledge base to encode the rich event-centric concepts and their relationships from the well-established lexical databases, including FrameNet, as well as the concept-specific visual knowledge from ImageNet, and designs an effective system for video event recognition.
Video event recognition using concept attributes
This work proposes to use action, scene and object concepts as semantic attributes for classification of video events in in-the-wild content, such as YouTube videos, and shows how the proposed enhanced event model can further improve zero-shot learning.
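Attribute-based zero-shot scoring of the kind described above can be sketched simply: correlate a video's detected attribute scores with a per-event attribute signature, with no event-level training examples. The signature and score values below are hypothetical toy data.

```python
import numpy as np

def zero_shot_event_score(video_attr_scores, event_attr_signature):
    """Score a video for an unseen event by correlating its detected
    attribute scores (actions, scenes, objects) with the event's
    attribute signature -- a minimal sketch of attribute-based
    zero-shot scoring."""
    a = video_attr_scores - video_attr_scores.mean()
    b = event_attr_signature - event_attr_signature.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sig = np.array([1.0, 1.0, 0.0, 0.0])       # event signature: attributes 0 and 1 expected
match = np.array([0.9, 0.8, 0.1, 0.2])     # video whose detections fit the signature
mismatch = np.array([0.1, 0.2, 0.9, 0.8])  # video whose detections do not
print(zero_shot_event_score(match, sig) > zero_shot_event_score(mismatch, sig))  # True
```

Videos are then ranked by this score, so the event model never needs labeled event videos, only attribute detectors and a signature.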


Video event classification using string kernels
This paper presents a method to introduce temporal information for video event recognition within the bag-of-words approach: a video is modeled as a sequence of histograms of visual features, one computed from each frame using the traditional BoW.
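The per-frame representation that the string-kernel approach operates on can be built as below: quantize each frame's local descriptors against a codebook and emit a sequence of normalized histograms. The codebook and descriptor shapes are toy assumptions; the string kernel itself is not reproduced here.

```python
import numpy as np

def frame_histograms(frame_descriptors, codebook):
    """Represent a video as a sequence of per-frame bag-of-words
    histograms: each local descriptor is assigned to its nearest
    codeword, and counts are L1-normalized per frame."""
    seq = []
    for desc in frame_descriptors:  # desc: (n_i, d) local descriptors of one frame
        idx = np.argmin(((desc[:, None, :] - codebook[None]) ** 2).sum(-1), axis=1)
        hist = np.bincount(idx, minlength=len(codebook)).astype(float)
        seq.append(hist / max(hist.sum(), 1.0))
    return np.stack(seq)  # shape (n_frames, vocab_size)

rng = np.random.default_rng(2)
video = [rng.normal(size=(6, 2)) for _ in range(4)]  # 4 frames, 6 descriptors each
codebook = rng.normal(size=(5, 2))                   # 5 visual words
seq = frame_histograms(video, codebook)
print(seq.shape)  # (4, 5)
```

Treating `seq` as a string of histogram "characters" is what lets sequence kernels compare videos while respecting temporal order.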
Semantic Event Detection using Conditional Random Fields
Conditional Random Fields are employed to fuse temporal multi-modality cues, combining semantic keywords and mid-level keywords for event detection; experiments demonstrate that CRFs achieve better performance, particularly on the slice-level measure.
Visual event recognition in videos by learning from web data
A new aligned space-time pyramid matching method is proposed to measure the distance between two video clips, together with a cross-domain learning method that learns an adapted classifier based on multiple base kernels and prelearned average classifiers by minimizing both the structural risk functional and the mismatch between the data distributions of the two domains.
Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor
The proposed spatio-temporal descriptor based approach is capable of tolerating spatial layout variations and local deformations of human actions due to diverse view angles and rough human figure alignment in complex scenes and effectively detects video events in challenging real-world conditions.
Multimedia semantic indexing using model vectors
The model vector method is presented, different strategies for computing and comparing model vectors are studied, and the retrieval effectiveness of the model vector approach compared to other search methods in a large video retrieval testbed is evaluated.
Semantic representation: search and mining of multimedia content
This paper constructs a model vector that acts as a compact semantic representation of the underlying content and presents experiments in the semantic spaces leveraging such information for enhanced semantic retrieval, classification, visualization, and data mining purposes.
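Once videos are represented as model vectors, retrieval reduces to nearest-neighbor search in the concept-score space. The cosine-similarity ranking below is one common comparison strategy (the papers above study several); the toy database is an assumption for illustration.

```python
import numpy as np

def rank_by_model_vector(query, database):
    """Rank database videos by cosine similarity between their model
    vectors and the query's model vector."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q
    return np.argsort(-sims)  # indices, most similar first

# Toy database of 3 videos with 3-concept model vectors.
db = np.array([[1.0, 0.0, 0.0],
               [0.9, 0.1, 0.0],
               [0.0, 1.0, 0.0]])
print(rank_by_model_vector(np.array([1.0, 0.0, 0.0]), db))  # [0 1 2]
```

Because model vectors are low-dimensional relative to raw visual features, this kind of search stays cheap even for large video collections, which is the compactness advantage the entries above emphasize.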
SIFT-Bag kernel for video event analysis
A SIFT-Bag based generative-to-discriminative framework for addressing the problem of video event recognition in unconstrained news videos and shows that the mean average precision is boosted from the best reported 38.2% in [36] to 60.4% based on this new framework.
Short-term audio-visual atoms for generic video concept classification
An effective algorithm, named Short-Term Region tracking with joint Point Tracking and Region Segmentation (STR-PTRS), is developed to extract S-AVAs from generic videos under challenging conditions such as uneven lighting, clutter, occlusions, and complicated motions of both objects and camera.
Video Event Recognition Using Kernel Methods with Multilevel Temporal Alignment
  • Dong Xu, Shih-Fu Chang
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2008
This work systematically studies the problem of event recognition in unconstrained news video sequences by adopting a discriminative kernel-based method for which video clip similarity plays an important role, and develops temporally aligned pyramid matching (TAPM) for measuring video similarity.
Video event detection using motion relativity and visual relatedness
A new motion feature, namely Expanded Relative Motion Histogram of Bag-of-Visual-Words (ERMH-BoW) to employ motion relativity and visual relatedness for event detection and to alleviate the visual word correlation problem in BoW is proposed.