Corpus ID: 70129809

TRECVID 2017: Evaluating Ad-hoc and Instance Video Search, Events Detection, Video Captioning and Hyperlinking

@inproceedings{Awad2017TRECVID2E,
  title={TRECVID 2017: Evaluating Ad-hoc and Instance Video Search, Events Detection, Video Captioning and Hyperlinking},
  author={G. Awad and A. Butt and J. Fiscus and David Joy and Andrew Delgado and M. Michel and A. Smeaton and Yvette Graham and Gareth J.F. Jones and Wessel Kraaij and G. Qu{\'e}not and Maria Eskevich and R. Ordelman and B. Huet},
  booktitle={TRECVID},
  year={2017}
}
Citations
VIREO @ TRECVID 2017: Video-to-Text, Ad-hoc Video Search, and Video hyperlinking
TLDR
This work investigates whether combining a concept-based system, a captioning system, and a text-based search system helps to improve search performance.
University of Amsterdam and Renmin University at TRECVID 2016: Searching Video, Detecting Events and Describing Video
TLDR
The 2016 edition of the TRECVID benchmark was a fruitful participation for the joint team, resulting in the best overall result for zero- and few-example event detection as well as for video description by matching and in generative mode.
IRISA at TrecVid 2017: Beyond Crossmodal and Multimodal Models for Video Hyperlinking
TLDR
The runs that were submitted to the TRECVid Challenge 2017 for the Video Hyperlinking task show a gain in performance over the baseline BiDNN model both when the metadata filter was used and when the keyframe fusion was done with a pseudo-inverse.
Effective video hyperlinking by means of enriched feature sets and monomodal query combinations
TLDR
The system designed to address feature selection for the video hyperlinking challenge, as defined by TRECVID, is based on different combinations of textual and visual features, enriched to capture the various facets of the videos.
Interpretable Embedding for Ad-Hoc Video Search
TLDR
This paper empirically demonstrates that, by using either the embedding features or concepts, considerable search improvement is attainable on TRECVid benchmarked datasets.
Neighbourhood Structure Preserving Cross-Modal Embedding for Video Hyperlinking
TLDR
Empirical insights are shared on a number of issues in cross-modal learning for video hyperlinking, including the preservation of neighbourhood structure in embedding, model fine-tuning, and the issue of missing modality.
Evaluation of automatic video captioning using direct assessment
TLDR
It is shown that the direct assessment method is replicable, robust, and scales to settings where many caption-generation techniques must be evaluated, including the TRECVid video-to-text task in 2017.
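As a rough sketch of the direct-assessment idea summarised above (illustrative only, not code from the paper): raw 0-100 ratings are standardised per worker so that individual scoring strategies cancel out, and systems are ranked by their mean standardised score. All data and names below are made-up placeholders.

```python
# Hedged sketch of direct-assessment-style scoring: per-worker z-score
# standardisation followed by a per-system average (toy data throughout).
import pandas as pd

ratings = pd.DataFrame({
    "worker": ["w1", "w1", "w2", "w2", "w2"],   # hypothetical crowd workers
    "system": ["A",  "B",  "A",  "B",  "B"],    # caption-generation systems
    "score":  [70,   40,   90,   60,   65],     # raw 0-100 adequacy ratings
})

# Standardise each worker's scores so lenient and strict raters are comparable.
ratings["z"] = ratings.groupby("worker")["score"].transform(
    lambda s: (s - s.mean()) / s.std(ddof=0)
)

# The mean standardised score per system gives the direct-assessment ranking.
print(ratings.groupby("system")["z"].mean().sort_values(ascending=False))
```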
Dual Encoding for Zero-Example Video Retrieval
TLDR
This paper takes a concept-free approach, proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own, and establishes a new state of the art for zero-example video retrieval.
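For illustration of the general dual-encoding idea, here is a minimal two-tower retrieval sketch: videos and queries are projected into a shared space and ranked by cosine similarity. This is not the multi-level architecture from the paper; all weights, dimensions, and features are hypothetical toy values.

```python
# Minimal two-tower (dual-encoder) retrieval sketch; NOT the paper's model.
import numpy as np

def encode(features: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Project features into a shared space and L2-normalise them."""
    z = features @ W  # linear projection as a stand-in for a deep encoder
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
W_video = rng.normal(size=(2048, 256))  # hypothetical video-side weights
W_query = rng.normal(size=(300, 256))   # hypothetical query-side weights

videos = rng.normal(size=(5, 2048))     # 5 videos, e.g. CNN frame features (toy)
query = rng.normal(size=(1, 300))       # 1 query, e.g. averaged word embeddings (toy)

# Cosine similarity between the query and every video, then rank descending.
sims = encode(query, W_query) @ encode(videos, W_video).T
ranking = np.argsort(-sims[0])
print(ranking)
```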
What Matters for Ad-hoc Video Search? A Large-scale Evaluation on TRECVID
  • Aozhu Chen, Fan Hu, Zihan Wang, Fangming Zhou, Xirong Li
  • Computer Science
  • 2021
For quantifying progress in Ad-hoc Video Search (AVS), the annual TRECVID AVS task is an important international evaluation. Solutions submitted by the task participants vary in terms of their …
Fusion of Multimodal Embeddings for Ad-Hoc Video Search
TLDR
A new method to fuse multimodal embeddings which have been derived from completely disjoint datasets is studied and tested on two datasets for two distinct tasks.

References

Showing 1-10 of 22 references
TRECVID 2016: Evaluating Video Search, Video Event Detection, Localization, and Hyperlinking
George Awad, Jonathan Fiscus, David Joy, Martial Michel, Alan Smeaton, Wessel Kraaij, Maria Eskevich, Robin Aly, Roeland Ordelman, Marc Ritter, et al.
Evaluation of automatic video captioning using direct assessment
TLDR
It is shown that the direct assessment method is replicable, robust, and scales to settings where many caption-generation techniques must be evaluated, including the TRECVid video-to-text task in 2017.
The YLI-MED Corpus: Characteristics, Procedures, and Plans
TLDR
The procedures used to collect the corpus are described; detailed descriptive statistics about the corpus makeup are given (including how video attributes affected annotators' judgments); possible biases introduced by the authors' procedural choices are discussed; the corpus is compared with the most similar existing dataset, TRECVID MED's HAVIC corpus; and an overview of future plans for expanding the annotation effort is given.
TRECVID 2015 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics
The TREC Video Retrieval Evaluation (TRECVID) 2015 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video …
TRECVid Semantic Indexing of Video: A 6-year Retrospective
TLDR
The data, protocol and metrics used for the main and the secondary tasks, the results obtained and the main approaches used by participants are described.
CIDEr: Consensus-based image description evaluation
TLDR
A novel paradigm for evaluating image descriptions using human consensus is proposed, along with a new automated metric that captures human judgment of consensus better than existing metrics across sentences generated by various sources.
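To make the consensus idea concrete, this is a sketch of the standard CIDEr formulation: a candidate caption c_i is scored against the reference set S_i = {s_i1, …, s_im} by the average cosine similarity of TF-IDF-weighted n-gram vectors g^n(·), then combined across n-gram lengths (typically with uniform weights w_n = 1/N, N = 4).

```latex
\[
\mathrm{CIDEr}_n(c_i, S_i) = \frac{1}{m} \sum_{j=1}^{m}
  \frac{g^n(c_i) \cdot g^n(s_{ij})}
       {\lVert g^n(c_i)\rVert \, \lVert g^n(s_{ij})\rVert},
\qquad
\mathrm{CIDEr}(c_i, S_i) = \sum_{n=1}^{N} w_n \, \mathrm{CIDEr}_n(c_i, S_i)
\]
```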
Blip10000: a social video dataset containing SPUG content for tagging and retrieval
TLDR
This work presents a dataset that contains comprehensive semi-professional user-generated (SPUG) content, including audiovisual content, user-contributed metadata, automatic speech recognition transcripts, automatic shot boundary files, and social information for multiple 'social levels'.
Creating HAVIC: Heterogeneous Audio Visual Internet Collection
TLDR
The HAVIC (Heterogeneous Audio Visual Internet Collection) corpus will ultimately consist of several thousand hours of unconstrained user-generated multimedia content, designed with an eye toward providing increased challenges for both acoustic and video processing technologies.
Can machine translation systems be evaluated by the crowd alone?
TLDR
A new methodology for crowd-sourcing human assessments of translation quality is presented, which allows individual workers to develop their own assessment strategy and has a substantially increased ability to identify significant differences between translation systems.
Feature-based video key frame extraction for low quality video sequences
We present an approach to key frame extraction for structuring user-generated videos on video sharing websites (e.g. YouTube). Our approach is intended to link existing image search engines to video …