Corpus ID: 64056025

TRECVID 2013 -- An Overview of the Goals, Tasks, Data, Evaluation Mechanisms, and Metrics

  title={TRECVID 2013 -- An Overview of the Goals, Tasks, Data, Evaluation Mechanisms, and Metrics},
  author={P. Over and J. Fiscus and G. Sanders and David Joy and M. Michel and G. Awad and A. Smeaton and Wessel Kraaij and G. Qu{\'e}not},
Video2vec Embeddings Recognize Events when Examples are Scarce
This paper aims for event recognition when video examples are scarce or even completely absent. The key in such a challenging setting is a semantic video representation.
Multimedia event detection with ℓ2-regularized logistic Gaussian mixture regression
This paper uses a Gaussian mixture model (GMM) to represent video events, motivated by the idea that the individual component densities of the GMM can model underlying hidden visual attributes, and proposes an efficient iterative algorithm based on gradient descent, a standard convex optimization method, to solve the objective function of LLGMM.
Online multi-task learning for semantic concept detection in video
The Efficient Lifelong Learning Algorithm (ELLA) is extended in the following ways: (a) the objective function of ELLA is solved using quadratic programming instead of solving the Lasso problem, (b) a new label-based constraint is added that accounts for concept correlations, and (c) linear SVMs are used as base learners instead of logistic regression.
RiskWheel: Interactive visual analytics for surveillance event detection
An interactive visual analytics system is proposed that enables effective analysis of detection results and use of user feedback to improve surveillance event detection, together with a novel risk ranking method that differentiates detection results and presents the more informative ones to the user for better interaction.
VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events
This paper proposes a new video representation for few-example event recognition and translation called VideoStory, which outperforms both an embedding without the joint objective and alternatives without any embedding, and translates a previously unseen video into its most likely description from visual content alone.
Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts
This paper presents a general framework for the zero-shot learning problem of performing high-level event detection with no training exemplars, using only textual descriptions, and finds that fusion, both within and between modalities, is crucial for optimal performance.
The AXES submissions at TRECVID 2013
The authors' INS, MER, and MED systems are described. They build on state-of-the-art local low-level descriptors for motion, image, and sound, as well as high-level features capturing speech and text from the visual and audio streams, respectively.
Mining exoticism from visual content with fusion-based deep neural networks
This paper presents the first approach to automatically classify images as exotic or non-exotic, and investigates the usefulness of hand-crafted features combined with deep features in the proposed fusion-based approach.
Fisher Kernel Temporal Variation-based Relevance Feedback for video retrieval
This paper proposes a novel framework for Relevance Feedback based on the Fisher Kernel (FK) and uses the FK representation to explicitly capture temporal variation in video via frame-based features taken at different time intervals.
Language guided visual perception
This dissertation explores three settings showing that combining language and vision is useful for machine perception with images and videos, and proposes models that learn not only objects but also their actions, attributes, and interactions with other objects, in one unified learning framework and in a never-ending way.