Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics


We construct multi-modal concept representations by concatenating a skip-gram linguistic representation vector with a visual concept representation vector computed using the feature extraction layers of a deep convolutional neural network (CNN) trained on a large labeled object recognition dataset. This transfer learning approach brings a clear performance…
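The concatenation step described in the abstract can be illustrated with a minimal sketch. It assumes both vectors are already available as plain arrays (the skip-gram training and the CNN feature extraction are not shown), and it L2-normalizes each modality before concatenating so that neither dominates a cosine comparison. That normalization, along with the function names `l2_normalize`, `multimodal_vector`, and `cosine`, is an illustrative assumption and is not stated in the text above.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (assumed preprocessing, not from the abstract)."""
    v = np.asarray(v, dtype=np.float64)
    return v / max(np.linalg.norm(v), eps)


def multimodal_vector(linguistic, visual):
    """Concatenate a linguistic vector and a visual vector into one representation."""
    return np.concatenate([l2_normalize(linguistic), l2_normalize(visual)])


def cosine(a, b):
    """Cosine similarity between two concept representations."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy vectors standing in for skip-gram and CNN features (hypothetical values).
cat = multimodal_vector([3.0, 4.0], [0.0, 5.0, 0.0])
dog = multimodal_vector([4.0, 3.0], [0.0, 4.0, 3.0])
similarity = cosine(cat, dog)
```

The result has as many dimensions as the two inputs combined, so similarity scores computed on it reflect both the linguistic and the visual evidence for a concept.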
