Corpus ID: 212694843

TRECVID 2019: An evaluation campaign to benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & retrieval

@article{Awad2019TRECVID2A,
  title={TRECVID 2019: An evaluation campaign to benchmark Video Activity Detection, Video Captioning and Matching, and Video Search \& retrieval},
  author={George Awad and Asad Anwar Butt and Keith Curtis and Yooyoung Lee and Jonathan G. Fiscus and Afzal Godil and Andrew Delgado and Jesse Zhang and Eliot Godard and Lukas L. Diduch and Alan F. Smeaton and Yvette Graham and Wessel Kraaij and Georges Qu{\'e}not},
  journal={ArXiv},
  year={2019},
  volume={abs/2009.09984}
}
The TREC Video Retrieval Evaluation (TRECVID) 2019 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last nineteen years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has… 

VIREO-EURECOM @ TRECVID 2019: Ad-hoc Video Search (AVS)

TLDR
The systems developed for the Ad-hoc Video Search (AVS) task at TRECVID 2019 and the achieved results are described, and the advantages and shortcomings of these video search approaches are analyzed.

RUC_AIM3 at TRECVID 2019: Video to Text

TLDR
This paper proposes a late fusion strategy to ensemble different models, improving system generalization, and generates video representations with rich semantic information by fusing multi-modal features for both sub-tasks of the TRECVID 2019 Video to Text challenge.

What Matters for Ad-hoc Video Search? A Large-scale Evaluation on TRECVID

TLDR
A large-scale and systematic evaluation on TRECVID using selected combinations of state-of-the-art matching models, visual features, and (pre-)training data to answer the key question of what matters for AVS.

IMFD IMPRESEE at TRECVID 2019: Ad-Hoc Video Search and Video To Text

TLDR
A deep learning model based on Word2VisualVec++ is developed, extracting temporal information from the video using Dense Trajectories and encoding them into a single vector representation with a clustering approach.

Is the Reign of Interactive Search Eternal? Findings from the Video Browser Showdown 2020

TLDR
Analysis of query logs collected by the top three performing systems, SOMHunter, VIRET, and vitrivr, reveals that the top two systems mostly relied on temporal queries before a correct frame was identified; the analysis also constitutes a new baseline methodology for future events.

ITI-CERTH participation in TRECVID 2020

TLDR
An overview of the runs submitted to TRECVID 2020 by ITI-CERTH is provided, which includes participation in the Ad-hoc Video Search, Disaster Scene Description and Indexing and Activities in Extended Video tasks.

Renmin University of China and Zhejiang Gongshang University at TRECVID 2019: Learn to Search and Describe Videos

TLDR
The 2019 edition of the TRECVID benchmark was a fruitful participation for the joint team; solutions based on two deep learning models, the W2VV++ network and the Dual Encoding Network, are developed.

A comprehensive review of the video-to-text problem

TLDR
This paper reviews the video-to-text problem, in which the goal is to associate an input video with its textual description, and categorizes and describes the state-of-the-art techniques.

VireoJD-MM @ TRECVid 2019: Activities in Extended Video (ActEV)

TLDR
This paper describes the system developed for the Activities in Extended Video (ActEV) task at TRECVid 2019 and the achieved results; the system is extended in two separate aspects: better object detection and advanced two-stream action classification.

Hybrid Sequence Encoder for Text Based Video Retrieval

TLDR
This report presents a hybrid sequence encoder that makes use not only of multi-modal sources but also of feature extractors such as GRUs, aggregated vectors, and graph modeling in the AVS task.
...

References

SHOWING 1-10 OF 20 REFERENCES

TRECVid Semantic Indexing of Video: A 6-year Retrospective

TLDR
The data, protocol and metrics used for the main and the secondary tasks, the results obtained and the main approaches used by participants are described.

Evaluation of automatic video captioning using direct assessment

TLDR
It is shown that the direct assessment method is replicable and robust, and that it scales to settings with many caption-generation techniques to be evaluated, including the TRECVid video-to-text task in 2017.

Dual Encoding for Zero-Example Video Retrieval

TLDR
This paper takes a concept-free approach, proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own and establishes a new state-of-the-art for zero-example video retrieval.
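The dual-encoding idea summarized above (mapping videos and queries into a shared dense space and ranking by similarity) can be illustrated with a minimal sketch. The mean-pooling encoders, projection matrices, and dimensions below are hypothetical stand-ins for illustration only, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64  # shared embedding dimension (illustrative choice)

def encode(features, proj):
    """Mean-pool a variable-length feature sequence and project it into the
    shared space, L2-normalised so the dot product equals cosine similarity."""
    v = features.mean(axis=0) @ proj
    return v / np.linalg.norm(v)

# Hypothetical projection matrices standing in for the learned
# video-side and query-side encoders.
video_proj = rng.normal(size=(128, DIM))
query_proj = rng.normal(size=(300, DIM))

def rank_videos(query_tokens, videos):
    """Return video indices sorted by cosine similarity to the query,
    plus the similarity scores in the original video order."""
    q = encode(query_tokens, query_proj)
    sims = [float(q @ encode(v, video_proj)) for v in videos]
    order = sorted(range(len(videos)), key=lambda i: -sims[i])
    return order, sims
```

In a trained system the projections would be learned so that matching query/video pairs land close together; here they only demonstrate the retrieval mechanics.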

A large-scale benchmark dataset for event recognition in surveillance video

We introduce a new large-scale video dataset designed to assess the performance of diverse visual event recognition algorithms with a focus on continuous visual event recognition (CVER) in outdoor…

TRECVID 2016: Evaluating Video Search, Video Event Detection, Localization, and Hyperlinking

TLDR
George Awad, Jonathan Fiscus, David Joy, Martial Michel, Alan Smeaton, Wessel Kraaij, Maria Eskevich, Robin Aly, Roeland Ordelman, Marc Ritter, et al.

CIDEr: Consensus-based image description evaluation

TLDR
A novel paradigm for evaluating image descriptions that uses human consensus is proposed and a new automated metric that captures human judgment of consensus better than existing metrics across sentences generated by various sources is evaluated.
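The consensus idea behind CIDEr, scoring a candidate caption by its n-gram similarity to a set of human reference captions, can be sketched as follows. This simplified version omits CIDEr's TF-IDF weighting, stemming, and scaling; it only averages n-gram cosine similarities over the references and over n = 1..4:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def consensus_score(candidate, references, max_n=4):
    """Average n-gram cosine similarity between a candidate caption and a
    set of reference captions, averaged over n = 1..max_n."""
    cand = candidate.split()
    total = 0.0
    for n in range(1, max_n + 1):
        cv = Counter(ngrams(cand, n))
        sims = [cosine(cv, Counter(ngrams(r.split(), n))) for r in references]
        total += sum(sims) / len(sims)
    return total / max_n
```

A candidate identical to every reference scores 1.0; a caption sharing no n-grams with any reference scores 0.0.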

V3C - a Research Video Collection

TLDR
This work states that existing video datasets used for research and experimentation are either not large enough to represent current collections or do not reflect the properties of video commonly found on the Internet in terms of content, length, or resolution.

Very Deep Convolutional Networks for Large-Scale Image Recognition

TLDR
This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, showing that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16–19 weight layers.

METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments

TLDR
METEOR is described, an automatic metric for machine translation evaluation based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations, which can be easily extended to include more advanced matching strategies.
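METEOR's core computation, unigram matching followed by a recall-weighted harmonic mean and a fragmentation penalty, can be sketched as below. This is a simplified exact-match variant with a greedy alignment; the real metric also matches stems and synonyms and optimizes the alignment:

```python
def alignment(cand, ref):
    """Greedy exact-match alignment: map each candidate token to the first
    unused matching position in the reference."""
    used = set()
    pairs = []  # (candidate index, reference index)
    for i, tok in enumerate(cand):
        for j, rtok in enumerate(ref):
            if j not in used and tok == rtok:
                used.add(j)
                pairs.append((i, j))
                break
    return pairs

def count_chunks(pairs):
    """Number of maximal runs of matches contiguous in both strings."""
    chunks = 1
    for (i1, j1), (i2, j2) in zip(pairs, pairs[1:]):
        if not (i2 == i1 + 1 and j2 == j1 + 1):
            chunks += 1
    return chunks

def meteor_sketch(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    pairs = alignment(cand, ref)
    m = len(pairs)
    if m == 0:
        return 0.0
    p, r = m / len(cand), m / len(ref)
    f_mean = 10 * p * r / (r + 9 * p)               # recall-weighted harmonic mean
    penalty = 0.5 * (count_chunks(pairs) / m) ** 3  # fragmentation penalty
    return f_mean * (1 - penalty)
```

Even a perfect match is penalized slightly (one chunk over m matches), which is the original metric's behavior; scores approach 1 as captions grow longer.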

A simple and efficient sampling method for estimating AP and NDCG

We consider the problem of large scale retrieval evaluation. Recently two methods based on random sampling were proposed as a solution to the extensive effort required to judge tens of thousands of…
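The quantities such sampling methods estimate are the standard AP and NDCG definitions. A sketch of the exact, fully-judged versions clarifies what the estimators target; the sampling procedure itself is not reproduced here:

```python
import math

def average_precision(rels):
    """Exact AP from a full list of binary relevance judgments in rank order:
    the mean of precision@k over the ranks k holding a relevant document."""
    hits, total = 0, 0.0
    for k, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

def ndcg(gains):
    """NDCG with the standard log2 rank discount, from graded relevance
    gains in rank order, normalized by the ideal (sorted) ordering."""
    dcg = sum(g / math.log2(k + 1) for k, g in enumerate(gains, start=1))
    ideal = sum(g / math.log2(k + 1)
                for k, g in enumerate(sorted(gains, reverse=True), start=1))
    return dcg / ideal if ideal else 0.0
```

Judging every retrieved document to evaluate these exactly is what makes large-scale evaluation expensive, hence the appeal of sampling-based estimators.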