Multimodal Deep Learning

Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, Andrew Y. Ng

Deep networks have been successfully applied to unsupervised feature learning for single modalities (e.g., text, images or audio). In this work, we propose a novel application of deep networks to learn features over multiple modalities. We present a series of tasks for multimodal learning and show how to train a deep network that learns features to address these tasks. In particular, we demonstrate cross modality feature learning, where better features for one modality (e.g., video) can be…

This paper has highly influenced 119 other papers.
This paper has 1,530 citations.




