Corpus ID: 235166820

Recent Advances and Trends in Multimodal Deep Learning: A Review

Authors: Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Songyuan Li, Jabbar Abdul
Deep learning has been applied to a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning is to create models that can process and link information across various modalities. Despite the extensive development of unimodal learning, it still cannot cover all the aspects of human learning. Multimodal learning helps models understand and analyze better when various senses are engaged in the processing of information. This paper focuses on… 
DeepGraviLens: a Multi-Modal Architecture for Classifying Gravitational Lensing Data
Gravitational lensing is the relativistic effect generated by massive bodies, which bend the space-time surrounding them. It is a deeply investigated topic in astrophysics and allows validating


Multimodal Representation Learning: Advances, Trends and Challenges
An overview of deep multimodal learning, especially the approaches proposed within the last decade, is presented to provide potential readers with advances, trends, and challenges, which can be very helpful to researchers in the field of machine learning.
Deep Multimodal Learning: A Survey on Recent Advances and Trends
This work first classifies deep multimodal learning architectures and then discusses methods to fuse learned multimodal representations in deep-learning architectures.
Deep Multimodal Representation Learning: A Survey
The key issues of newly developed technologies, such as the encoder-decoder model, generative adversarial networks, and the attention mechanism, are highlighted from a multimodal representation learning perspective, which, to the best of our knowledge, have never been reviewed previously.
A Survey on Deep Learning for Multimodal Data Fusion
This review presents a survey on deep learning for multimodal data fusion to provide readers, regardless of their original community, with the fundamentals of multimodal deep learning fusion methods and to motivate new multimodal data fusion techniques in deep learning.
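The two fusion strategies these surveys repeatedly contrast are feature-level (early) fusion and decision-level (late) fusion. As a minimal sketch, assuming NumPy and randomly initialized stand-in encoders (the encoder weights and class counts here are illustrative, not from any of the surveyed papers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality encoders: random projections standing in for
# trained deep feature extractors (image features 64-d, text features 32-d).
W_img_enc = rng.standard_normal((64, 16))
W_txt_enc = rng.standard_normal((32, 16))

def encode_image(x):
    return np.tanh(x @ W_img_enc)   # -> 16-d image embedding

def encode_text(x):
    return np.tanh(x @ W_txt_enc)   # -> 16-d text embedding

image_feats = rng.standard_normal((4, 64))  # batch of 4 image feature vectors
text_feats = rng.standard_normal((4, 32))   # matching text feature vectors

# Early (feature-level) fusion: concatenate modality embeddings into one
# joint representation that a downstream classifier would consume.
joint = np.concatenate([encode_image(image_feats), encode_text(text_feats)], axis=1)
print(joint.shape)  # (4, 32)

# Late (decision-level) fusion: each modality predicts class scores
# independently; the per-modality softmax outputs are then averaged.
def softmax_scores(z, w):
    e = np.exp(z @ w)
    return e / e.sum(axis=1, keepdims=True)

W_img_cls = rng.standard_normal((16, 3))    # 3 illustrative classes
W_txt_cls = rng.standard_normal((16, 3))
fused = 0.5 * (softmax_scores(encode_image(image_feats), W_img_cls)
               + softmax_scores(encode_text(text_feats), W_txt_cls))
print(fused.shape)  # (4, 3); each row still sums to 1
```

Early fusion lets the classifier model cross-modal interactions but requires aligned inputs; late fusion keeps the modality pipelines independent and degrades more gracefully when one modality is missing.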
Multimodal Machine Learning: A Survey and Taxonomy
This paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy to enable researchers to better understand the state of the field and identify directions for future research.
Deep Spatio-Temporal Features for Multimodal Emotion Recognition
A novel approach using 3-dimensional convolutional neural networks (C3Ds) to model the spatio-temporal information, cascaded with multimodal deep-belief networks (DBNs) that can represent the audio and video streams is introduced.
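The spatio-temporal modeling mentioned above rests on 3D convolution: a filter slides over time as well as the two spatial axes of a video clip. A minimal NumPy sketch of a valid 3D cross-correlation (the clip size and averaging kernel are illustrative, not the C3D architecture itself):

```python
import numpy as np

def conv3d(volume, kernel):
    """Valid-mode 3D cross-correlation over a (T, H, W) volume."""
    t, h, w = kernel.shape
    T, H, W = volume.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # Multiply the kernel against one (t, h, w) window and sum.
                out[i, j, k] = np.sum(volume[i:i+t, j:j+h, k:k+w] * kernel)
    return out

clip = np.random.rand(16, 32, 32)   # 16 frames of 32x32 grayscale video
kern = np.ones((3, 3, 3)) / 27.0    # 3x3x3 spatio-temporal averaging filter
feat = conv3d(clip, kern)
print(feat.shape)  # (14, 30, 30): valid mode shrinks each axis by kernel-1
```

Because the kernel spans neighboring frames, each output value mixes motion (temporal) and appearance (spatial) information, which is what distinguishes 3D convolutions from frame-by-frame 2D ones.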
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
This survey focuses on ten prominent tasks that integrate language and vision by discussing their problem formulations, methods, existing datasets, evaluation measures, and compare the results obtained with corresponding state-of-the-art methods.
Exploring CNN-Based Architectures for Multimodal Salient Event Detection in Videos
Comparisons over the COGNIMUSE database, consisting of movies and travel documentaries, provided strong evidence that the CNN-based approach for all modalities, even in this task, manages to outperform the hand-crafted frontend in almost all cases, achieving strong average results.
Multimodal Learning with Deep Boltzmann Machine for Emotion Prediction in User Generated Videos
This work proposes to learn a joint density model over the space of multimodal inputs (including visual, auditory, and textual modalities) with a Deep Boltzmann Machine (DBM), with the aim of discovering the highly non-linear relationships that exist between low-level features across different modalities.