How Deep Features Have Improved Event Recognition in Multimedia

@article{Ahmad2019HowDF,
  title={How Deep Features Have Improved Event Recognition in Multimedia},
  author={Kashif Ahmad and Nicola Conci},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
  year={2019},
  volume={15},
  pages={1--27}
}
  • Kashif Ahmad, N. Conci
  • Published 5 June 2019
  • Computer Science
  • ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)
Event recognition is an area of multimedia research that is attracting considerable attention. Because it applies to a wide range of settings, from personal to collective events, a number of interesting solutions for event recognition using multimedia information sources have been proposed. Meanwhile, following its immense success in classification, object recognition, and detection, deep learning has been shown to perform well in event recognition tasks too. Thus, a large… 

Event Recognition Based on Classification of Generated Image Captions
TLDR
It is shown that the image captions trained on the Conceptual Captions dataset can be classified more accurately than the features from an object detector, though they both are obviously not as rich as the CNN-based features.
Object Detection for Unseen Domains while Reducing Response Time using Knowledge Transfer in Multimedia Event Processing
TLDR
This work proposes two domain-adaptation-based models that leverage Transfer Learning (TL) and Large Scale Detection through Adaptation (LSDA), and preliminary results show that the proposed framework can achieve 0.5 mAP (mean Average Precision) within 30 min of response time for unseen concepts.
Event Recognition with Automatic Album Detection based on Sequential Processing, Neural Attention and Image Captioning
TLDR
It is experimentally shown that the image captions trained on the Conceptual Captions dataset can be classified more accurately than the features from an object detector, though they both are obviously not as rich as the CNN-based features.
Event Recognition with Automatic Album Detection based on Sequential Grouping of Confidence Scores and Neural Attention
  • A. Savchenko
  • Computer Science
    2020 International Joint Conference on Neural Networks (IJCNN)
  • 2020
TLDR
An experimental study of the proposed two-stage event recognition approach demonstrates that it is 9-23% more accurate than conventional event recognition on single photos and has a 13-16% lower error rate than classification of groups of photos obtained with hierarchical clustering of CNN-based embeddings.
PETA: Photo Albums Event Recognition using Transformers Attention
TLDR
A tailor-made solution, combining the power of CNNs for image representation and transformers for album representation to perform global reasoning on image collection, offering a practical and efficient solution for photo albums event recognition.
Self-Training for Sound Event Detection in Audio Mixtures
TLDR
A self-training technique to leverage unlabeled datasets in supervised learning using pseudo label estimation and a dual-term objective function: a classification loss for the original labels and expectation loss for pseudo labels is proposed.
Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures
TLDR
This study proposes a semi-supervised method for generating pseudo-labels from unsupervised data using a student-teacher scheme that balances self-training and cross-training, and explores post-processing that extracts sound intervals from network predictions to further improve sound event detection performance.
Sentiment-based sub-event segmentation and key photo selection
Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
TLDR
A pretrained time-domain speech-separation-based noise suppression (NS) network and a pretrained classification network are combined to improve SED performance in real noisy environments; the approach improves classification performance under various signal-to-noise-ratio conditions.
Ontology-driven Event Type Classification in Images
TLDR
This paper uses a large number of real-world news events to create an ontology based on Wikidata comprising the majority of event types and introduces a novel large-scale dataset that was acquired through Web crawling.

References

SHOWING 1-10 OF 222 REFERENCES
Better Exploiting OS-CNNs for Better Event Recognition in Images
TLDR
This paper addresses the problem of cultural event recognition in still images and focuses on applying deep learning methods to this problem by utilizing the successful architecture of Object-Scene Convolutional Neural Networks (OS-CNNs) to perform event recognition.
Automatic Video Event Detection for Imbalance Data Using Enhanced Ensemble Deep Learning
TLDR
A new ensemble deep learning framework is proposed which is able to handle the over-fitting issue as well as the information losses caused by single models, and alleviates the imbalanced data problem in real-world multimedia data.
DevNet: A Deep Event Network for multimedia event detection and evidence recounting
TLDR
A flexible deep CNN infrastructure, namely Deep Event Network (DevNet), is proposed that simultaneously detects pre-defined events and provides key spatial-temporal evidences, both for event detection and evidence recounting.
Audio-based multimedia event detection using deep recurrent neural networks
TLDR
This paper introduces longer-range temporal information with deep recurrent neural networks (RNNs) for both stages of multimedia event detection, and observes improvements in both frame-level and clip-level performance compared to SVM and feed-forward neural network baselines.
Recurrent Support Vector Machines for Audio-Based Multimedia Event Detection
TLDR
This paper proposes to classify clips for events using "recurrent SVMs", which combine the kernel mapping and large-margin optimization criterion of SVMs with the ability of RNNs to process variable-length sequences.
A saliency-based approach to event recognition
The ImageNet Shuffle: Reorganized Pre-training for Video Event Detection
TLDR
This paper introduces a bottom-up and top-down approach for reorganization of the ImageNet hierarchy based on all its 21,814 classes and more than 14 million images to deal with the problems of over-specific classes and classes with few images.
Recognize complex events from static images by fusing deep channels
TLDR
Inspired by the recent success of deep learning, a multi-layer framework is formulated to tackle the problem of event recognition, which takes into account both visual appearance and the interactions among humans and objects and combines them via semantic fusion.
Deep Spatial Pyramid Ensemble for Cultural Event Recognition
TLDR
The Deep Spatial Pyramid Ensemble framework is proposed, which employs five deep networks trained on different data sources to extract five corresponding DSP representations for event recognition images and achieves one of the best cultural event recognition results in this challenge.
Semantic Event Detection Using Ensemble Deep Learning
TLDR
An ensemble deep learning framework is presented, which not only decreases the information loss and over-fitting problems caused by single models, but also overcomes the imbalanced data issue in multimedia big data.