HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities

Devanshu Arya, Stevan Rudinac, Marcel Worring
Proceedings of the 27th ACM International Conference on Multimedia
Multimodal datasets contain an enormous amount of relational information, which grows exponentially with the introduction of new modalities. Learning representations in such a scenario is inherently complex due to the presence of multiple heterogeneous information channels. These channels can encode both (a) inter-relations between the items of different modalities and (b) intra-relations between the items of the same modality. Encoding multimedia items into a continuous low-dimensional…
COBRA: Contrastive Bi-Modal Representation Learning
There are a wide range of applications that involve multi-modal data, such as cross-modal retrieval, visual question-answering and image captioning. Such applications are primarily dependent on…
COBRA: Contrastive Bi-Modal Representation Algorithm
A novel framework, COBRA, is presented that trains two modalities jointly, inspired by the Contrastive Predictive Coding and Noise Contrastive Estimation paradigms; it preserves both inter- and intra-class relationships and significantly reduces the modality gap.
HyperSAGE: Generalizing Inductive Representation Learning on Hypergraphs
HyperSAGE, a novel hypergraph learning framework that uses a two-level neural message passing strategy to accurately and efficiently propagate information through hypergraphs, is presented; its higher expressive power is shown to make it more stable in learning node representations than the alternatives.
Adaptive Neural Message Passing for Inductive Learning on Hypergraphs
HyperMSG is a novel hypergraph learning framework that uses a modular two-level neural message passing strategy to accurately and efficiently propagate information within each hyperedge and across hyperedges, and it outperforms state-of-the-art hypergraph learning methods on a wide range of tasks and datasets.
Semantic Path-Based Learning for Review Volume Prediction
This work uses semantically meaningful, bimodal random walks on real-world heterogeneous networks to extract correlations between nodes and bring together nodes with shared or similar attributes to demonstrate the rich expressiveness of such representations in predicting review volume.
Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings
It is demonstrated that several GNN architectures can outperform strong CNN baselines in a range of fine art analysis tasks, such as style classification, artist attribution, creation period estimation, and tag prediction, while training them requires an order of magnitude less computational time and only a small amount of labeled data.
Visual Analytics for Temporal Hypergraph Model Exploration
The proposed Hyper-Matrix technique paves the way for the visual analytics of temporal hypergraphs in a wide variety of domains and surpasses existing solutions in terms of scalability and applicability, enables the incorporation of domain knowledge, and allows for fast search-space traversal.


Heterogeneous Network Embedding via Deep Architectures
It is demonstrated that the rich content and linkage information in a heterogeneous network can be captured by a multi-resolution deep embedding function, so that similarities among cross-modal data can be measured directly in a common embedding space.
Multimodal Network Embedding via Attention based Multi-view Variational Autoencoder
A novel deep embedding method, the Attention-based Multi-view Variational Auto-Encoder (AMVAE), is proposed to incorporate both the link information and the multimodal contents for more effective and efficient embedding.
Variation Autoencoder Based Network Representation Learning for Classification
A deep network representation model is introduced that seamlessly integrates the text information and structure of a network by exploiting the variational autoencoder (VAE), which is a deep unsupervised generation algorithm.
Exploiting Relational Information in Social Networks using Geometric Deep Learning on Hypergraphs
It is claimed that representing social networks using hypergraphs improves the task of predicting missing information about an entity by capturing higher-order relations.
Multimodal learning with deep Boltzmann machines
A Deep Boltzmann Machine is proposed for learning a generative model of multimodal data and it is shown that the model can be used to create fused representations by combining features across modalities, which are useful for classification and information retrieval.
Videos as Space-Time Region Graphs
The proposed graph representation achieves state-of-the-art results on the Charades and Something-Something datasets and obtains a huge gain when the model is applied in complex environments.
A new approach to cross-modal multimedia retrieval
Accounting for cross-modal correlations and for semantic abstraction is shown to improve retrieval accuracy, and the resulting approaches are shown to outperform state-of-the-art image retrieval systems on a unimodal retrieval task.
Deep Collaborative Embedding for Social Image Understanding
A Deep Collaborative Embedding model is proposed to uncover a unified latent space for images and tags; it integrates the weakly-supervised image-tag correlation, image correlation, and tag correlation simultaneously and seamlessly to collaboratively explore the rich context information of social images.
Socializing the Semantic Gap
A two-dimensional taxonomy is introduced to structure the growing literature, understand the ingredients of the main works, clarify their connections and differences, and recognize their merits and limitations.
OmniArt: Multi-task Deep Learning for Artistic Data Analysis
An efficient and accurate method for multi-task learning with a shared representation, applied in the artistic domain, is presented, along with a challenge built around the new aggregated data set of almost half a million samples with structured meta-data, to encourage further research and societal engagement.