Deep Latent Space Learning for Cross-Modal Mapping of Audio and Visual Signals

@article{Nawaz2019DeepLS,
  title={Deep Latent Space Learning for Cross-Modal Mapping of Audio and Visual Signals},
  author={Shah Nawaz and Muhammad Kamran Janjua and Ignazio Gallo and Arif Mahmood and Alessandro Calefati},
  journal={2019 Digital Image Computing: Techniques and Applications (DICTA)},
  year={2019},
  pages={1-7}
}
We propose a novel deep training algorithm for joint representation of audio and visual information, consisting of a single stream network (SSNet) coupled with a novel loss function to learn a shared deep latent space representation of multimodal information. The proposed framework characterizes the shared latent space by leveraging class centers, which eliminates the need for pairwise or triplet supervision. We quantitatively and qualitatively evaluate the proposed approach on…
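To illustrate the center-based idea described above, here is a minimal sketch of a center-style loss in which audio and face embeddings of the same identity are both pulled toward a shared per-class center, so no pair or triplet construction is needed. This is an assumption-laden illustration, not the paper's exact loss; the function name `center_loss` and the toy data are hypothetical.

```python
import numpy as np

def center_loss(embeddings, labels, centers):
    """Mean squared distance between each embedding and its class center.

    Pulling embeddings from both modalities toward shared per-identity
    centers couples audio and visual features in one latent space
    without forming explicit cross-modal pairs or triplets.
    (Sketch only; not the exact objective from the paper.)
    """
    diffs = embeddings - centers[labels]            # (N, D) residuals
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

# Toy example: 2 identities in a 3-D shared latent space.
centers = np.array([[0.0, 0.0, 0.0],
                    [1.0, 1.0, 1.0]])
# Embeddings could come from either modality (audio or face crops).
emb = np.array([[0.1, 0.0, -0.1],    # identity 0
                [0.9, 1.1,  1.0]])   # identity 1
labels = np.array([0, 1])
loss = center_loss(emb, labels, centers)
```

In practice the centers themselves would be learned jointly with the network, so that each identity's audio and visual samples converge to the same point in the latent space.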
