Corpus ID: 236428182

Will Multi-modal Data Improves Few-shot Learning?

@article{Zhang2021WillMD,
  title={Will Multi-modal Data Improves Few-shot Learning?},
  author={Zilu Zhang and Shihao Ma and Yichun Zhang},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.11853}
}
Most few-shot learning models utilize only one modality of data. We investigate, qualitatively and quantitatively, how much a model improves when an extra modality (i.e., a text description of the image) is added, and how that modality affects the learning procedure. To this end, we propose four types of fusion methods to combine the image feature and the text feature. To verify the improvement, we test the fusion methods with two classical few-shot learning models ProtoNet…
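As a rough illustration of the setup the abstract describes, the sketch below fuses an image feature with a text feature and classifies a query by its nearest class prototype, in the style of ProtoNet. The abstract does not say which four fusion methods the paper uses, so plain concatenation here is an assumption chosen for simplicity, and the function names (`fuse_concat`, `prototype`, `classify`) and toy feature values are hypothetical.

```python
import math


def fuse_concat(image_feat, text_feat):
    """Fuse two feature vectors by concatenation (an assumed, simple fusion choice)."""
    return list(image_feat) + list(text_feat)


def prototype(vectors):
    """Class prototype: element-wise mean of the fused support embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(query, prototypes):
    """Return the label whose prototype is nearest to the query (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy support set (made-up numbers): label -> list of (image_feature, text_feature).
support = {
    "bird": [([1.0, 0.0], [0.9, 0.1]), ([0.8, 0.2], [1.0, 0.0])],
    "flower": [([0.0, 1.0], [0.1, 0.9]), ([0.2, 0.8], [0.0, 1.0])],
}

# One prototype per class, computed over the fused support features.
prototypes = {
    label: prototype([fuse_concat(img, txt) for img, txt in pairs])
    for label, pairs in support.items()
}

# A query is fused the same way, then assigned to the nearest prototype.
query = fuse_concat([0.9, 0.1], [0.8, 0.2])
predicted = classify(query, prototypes)
print(predicted)
```

In a real pipeline the feature vectors would come from learned image and text encoders rather than hand-written lists; only the fusion step and the nearest-prototype rule are shown here.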

References

SHOWING 1-10 OF 59 REFERENCES
Discriminative Hallucination for Multi-Modal Few-Shot Learning
TLDR
This paper developed a two-stage framework built upon the idea of cross-modal data hallucination, and introduced a method for few-shot fine-grained recognition, utilizing textual descriptions of the visual data.
Multimodal Prototypical Networks for Few-shot Learning
TLDR
A generative model is trained that maps text data into the visual feature space to obtain more reliable prototypes, and it is shown that in such cases nearest neighbor classification is a viable approach that outperforms state-of-the-art single-modal and multimodal few-shot learning methods on the CUB-200 and Oxford-102 datasets.
Few-Shot Learning with Metric-Agnostic Conditional Embeddings
TLDR
This work introduces a novel architecture where class representations are conditioned for each few-shot trial based on a target image, and deviates from traditional metric-learning approaches by training a network to perform comparisons between classes rather than relying on a static metric comparison.
Learning to Compare: Relation Network for Few-Shot Learning
TLDR
A conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only few examples from each, which is easily extended to zero-shot learning.
A Closer Look at Few-shot Classification
TLDR
The results reveal that reducing intra-class variation is an important factor when the feature backbone is shallow, but not as critical when using deeper backbones, and a baseline method with a standard fine-tuning practice compares favorably against other state-of-the-art few-shot learning algorithms.
Few-Shot Learning with Graph Neural Networks
TLDR
A graph neural network architecture is defined that generalizes several of the recently proposed few-shot learning models and provides improved numerical performance, and is easily extended to variants of few-shot learning, such as semi-supervised or active learning, demonstrating the ability of graph-based models to operate well on 'relational' tasks.
Learning Deep Representations of Fine-Grained Visual Descriptions
TLDR
This model achieves strong performance on zero-shot text-based image retrieval and significantly outperforms the attribute-based state-of-the-art for zero-shot classification on the Caltech-UCSD Birds 200-2011 dataset.
Matching Networks for One Shot Learning
TLDR
This work employs ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories to learn a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types.
Meta-Learning With Differentiable Convex Optimization
TLDR
The objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories and this work exploits two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem.
CentralNet: a Multilayer Approach for Multimodal Fusion
TLDR
This paper proposes a novel multimodal fusion approach, aiming to produce best possible decisions by integrating information coming from multiple media by introducing a central network linking the modality specific networks.