Multi-Modal Deep Analysis for Multimedia

  • Wenwu Zhu, Xin Wang, Hongzhi Li
  • Computer Science
    IEEE Transactions on Circuits and Systems for Video Technology

With the rapid development of the Internet and multimedia services in the past decade, a huge amount of user-generated and service-provider-generated multimedia data has become available. These data are heterogeneous and multi-modal in nature, imposing great challenges on processing and analyzing them. Multi-modal data consist of a mixture of various types of data from different modalities, such as text, images, videos, and audio. In this article, we present a deep and comprehensive overview for…

On the Fusion of Multiple Audio Representations for Music Genre Classification

This work presented an exploratory study of different neural-network fusion techniques for music genre classification with multiple input features, and demonstrated that Multi-Feature Fusion Networks consistently improve classification accuracy for suitable choices of input representations.
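The simplest variant of multi-feature fusion can be sketched as late fusion: average the per-genre probability vectors produced from each input representation and pick the best genre. This is a minimal illustrative sketch, not the paper's network; the function name and the assumption that each representation already yields class probabilities are hypothetical.

```python
def late_fuse(per_feature_probs):
    """Late fusion over audio representations (illustrative, not the paper's API).

    per_feature_probs: one class-probability vector per input representation
    (e.g. mel-spectrogram, MFCC, chroma -- hypothetical choices).
    Returns the averaged probability vector and the index of the fused winner.
    """
    n = len(per_feature_probs)
    num_classes = len(per_feature_probs[0])
    # element-wise average of the probability vectors
    fused = [sum(p[i] for p in per_feature_probs) / n for i in range(num_classes)]
    # predicted genre = argmax of the fused probabilities
    return fused, max(range(num_classes), key=fused.__getitem__)
```

In practice the paper's fusion networks combine features inside the model rather than at the output, but this late-fusion baseline is the standard point of comparison.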

Expansion-Squeeze-Excitation Fusion Network for Elderly Activity Recognition

This work aggregates the discriminative information of actions and interactions from both RGB videos and skeleton sequences by attentively fusing multi-modal features with a novel Expansion-Squeeze-Excitation Fusion Network (ESE-FN).
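The squeeze-and-excitation gating that underlies this style of fusion can be sketched in plain Python. This shows only the generic SE mechanism (squeeze via global average pooling, a two-layer excitation MLP, then channel rescaling), not the authors' ESE-FN; all names, weight shapes, and values are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def se_gate(channels, W1, W2):
    """Squeeze-and-Excitation style channel gating (generic sketch).

    channels: list of per-channel activation lists.
    W1, W2: weights of the two-layer excitation MLP (hypothetical shapes:
    W1 maps C squeezed values to a hidden vector, W2 maps it back to C gates).
    """
    # squeeze: global average pool each channel to a single descriptor
    squeezed = [sum(c) / len(c) for c in channels]
    # excitation layer 1: linear + ReLU
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in W1]
    # excitation layer 2: linear + sigmoid -> one gate per channel
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in W2]
    # rescale: multiply each channel by its learned gate
    return [[g * a for a in c] for g, c in zip(gates, channels)]
```

In a multi-modal setting, the gates let the network emphasize whichever modality's channels carry the more discriminative evidence for a given action.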

Multimedia Intelligence: When Multimedia Meets Artificial Intelligence

  • Wenwu Zhu, Xin Wang, W. Gao
  • Computer Science
    IEEE Transactions on Multimedia
  • 2020
The concept of Multimedia Intelligence is introduced by investigating the mutual influence between multimedia and Artificial Intelligence; the efforts made in the literature are surveyed, and insights on research directions that deserve further study are shared.

Implementation of Short Video Click-Through Rate Estimation Model Based on Cross-Media Collaborative Filtering Neural Network

By directly extracting the image, behavioral, and audio features of short videos as the video feature representation, the proposed model considers more video information than other models and improves on AUC, accuracy, and log-loss metrics.

MDFNet: application of multimodal fusion method based on skin image and clinical data to skin cancer classification

MDFNet can serve as an effective auxiliary diagnostic tool for skin cancer, helping physicians improve clinical decision-making and the efficiency of clinical diagnosis; moreover, its proposed data fusion method fully exploits the advantage of information convergence and offers a reference for the intelligent diagnosis of numerous clinical diseases.

A Comprehensive Report on Machine Learning-based Early Detection of Alzheimer's Disease using Multi-modal Neuroimaging Data

A variety of feature selection, scaling, and fusion methodologies, along with the challenges confronted, are elaborated for designing an ML-based AD diagnosis system built on multi-modal neuroimaging data from patients with AD.

Video Grounding and Its Generalization

This tutorial gives a detailed introduction to the development and evolution of this task, points out the limitations of existing benchmarks, and extends the text-based grounding task to more general scenarios, in particular how it guides the learning of other video-language tasks such as video question answering based on event grounding.

Cross-Domain Collaborative Learning in Social Multimedia

This work proposes a generic Cross-Domain Collaborative Learning (CDCL) framework based on a non-parametric Bayesian dictionary learning model, in which different information sources complement and enhance each other for cross-domain data analysis.

Multimodal learning with deep Boltzmann machines

A Deep Boltzmann Machine is proposed for learning a generative model of multimodal data and it is shown that the model can be used to create fused representations by combining features across modalities, which are useful for classification and information retrieval.

Combining modality specific deep neural networks for emotion recognition in video

In this paper we present the techniques used for the University of Montréal's team submissions to the 2013 Emotion Recognition in the Wild Challenge. The challenge is to classify the emotions…

Cross-Platform Multi-Modal Topic Modeling for Personalized Inter-Platform Recommendation

Qualitative and quantitative evaluation results validate the effectiveness of the proposed cross-platform multi-modal topic model (CM3TM) and demonstrate the advantage of connecting different platforms with different modalities for inter-platform recommendation.

Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering

A Multi-modal Factorized Bilinear (MFB) pooling approach efficiently and effectively combines multi-modal features, yielding superior VQA performance compared with other bilinear pooling approaches.
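The MFB idea can be sketched in a few lines: project each modality with a factor matrix, take the element-wise product, sum-pool over windows of the factor rank, then apply power and L2 normalization. This is a minimal pure-Python sketch of the published formulation; the matrices, dimensions, and function names here are illustrative, not the authors' code.

```python
import math

def matvec(W, v):
    # W: matrix as a list of rows; returns W @ v
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in W]

def mfb_pool(x, y, U, V, k):
    """Multi-modal Factorized Bilinear pooling (sketch).

    x, y: feature vectors from two modalities (e.g. question and image).
    U, V: factor matrices projecting x and y into a shared k*o space.
    k: sum-pooling window size (the factor rank).
    Returns an o-dimensional fused, normalized vector.
    """
    joint = [a * b for a, b in zip(matvec(U, x), matvec(V, y))]  # element-wise product
    # sum-pool over consecutive windows of size k
    pooled = [sum(joint[i:i + k]) for i in range(0, len(joint), k)]
    # signed square-root ("power") normalization
    powed = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    # L2 normalization
    norm = math.sqrt(sum(v * v for v in powed)) or 1.0
    return [v / norm for v in powed]
```

The factorization is what makes this tractable: a full bilinear interaction between two high-dimensional features would need a huge weight tensor, while U and V keep the parameter count linear in the input dimensions.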

Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization

  • Ting Yao, Tao Mei, Y. Rui
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
A novel pairwise deep ranking model that employs deep learning to learn the relationship between highlight and non-highlight video segments is proposed, improving over the state-of-the-art RankSVM method by 10.5% in accuracy.
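The pairwise objective in such ranking models is commonly a margin ranking loss over (highlight, non-highlight) segment score pairs. The sketch below shows that standard formulation, not necessarily the paper's exact loss; the function names are illustrative.

```python
def pairwise_ranking_loss(highlight_score, non_highlight_score, margin=1.0):
    """Margin-based pairwise ranking loss: zero when the model scores the
    highlight segment at least `margin` above the non-highlight segment,
    otherwise growing linearly with the violation."""
    return max(0.0, margin - (highlight_score - non_highlight_score))

def batch_loss(pairs, margin=1.0):
    """Average the pairwise loss over a batch of
    (highlight_score, non_highlight_score) pairs."""
    return sum(pairwise_ranking_loss(h, n, margin) for h, n in pairs) / len(pairs)
```

Training on pairs rather than absolute labels sidesteps the need for a calibrated "highlightness" score: the model only has to order segments correctly within each video.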

Video Summarization by Learning Deep Side Semantic Embedding

A novel deep side semantic embedding (DSSE) model is presented to generate video summaries by leveraging freely available side information; the superior performance of DSSE over several state-of-the-art video summarization approaches is demonstrated.

Multimodal Deep Learning

This work presents a series of tasks for multimodal learning and shows how to train deep networks that learn features to address them; it demonstrates cross-modality feature learning, where better features for one modality can be learned when multiple modalities are present at feature-learning time.