Deep Latent Space Learning for Cross-Modal Mapping of Audio and Visual Signals
@article{Nawaz2019DeepLS,
  title={Deep Latent Space Learning for Cross-Modal Mapping of Audio and Visual Signals},
  author={Shah Nawaz and Muhammad Kamran Janjua and I. Gallo and A. Mahmood and Alessandro Calefati},
  journal={2019 Digital Image Computing: Techniques and Applications (DICTA)},
  year={2019},
  pages={1--7}
}
We propose a novel deep training algorithm for joint representation of audio and visual information. It consists of a single stream network (SSNet) coupled with a novel loss function to learn a shared deep latent space representation of multimodal information. The proposed framework characterizes the shared latent space by leveraging class centers, which helps eliminate the need for pairwise or triplet supervision. We quantitatively and qualitatively evaluate the proposed approach on…
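The abstract does not spell out the loss function, so the following is only a rough sketch of how class centers can replace pairwise or triplet supervision. It assumes a generic center-style objective (half the mean squared distance from each embedding to the center of its class), with face and voice embeddings of the same identity pooled under one label. The function names `class_centers` and `center_loss` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def class_centers(embeddings, labels):
    """Return the mean embedding per class (assumed: both modalities share labels)."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def center_loss(embeddings, labels, centers):
    """Half the mean squared distance between each embedding and its class center."""
    total = 0.0
    for vec, label in zip(embeddings, labels):
        total += sum((v - c) ** 2 for v, c in zip(vec, centers[label]))
    return 0.5 * total / len(embeddings)


# Toy data (hypothetical): two identities, each with one "face" and one
# "voice" embedding that already live in the same 2-D latent space.
embeddings = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
labels = [0, 0, 1, 1]

centers = class_centers(embeddings, labels)
loss = center_loss(embeddings, labels, centers)
print(centers, loss)
```

Each sample is compared only with its own class center, so the cost grows linearly with the number of samples and no pairs or triplets have to be mined. In a trained network the centers would typically be learned or updated alongside the weights rather than recomputed from a fixed batch as in this sketch.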
3 Citations
Audio-Visual Speaker Recognition with a Cross-Modal Discriminative Network
- Computer Science, Engineering
- INTERSPEECH
- 2020
- 3
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
- Computer Science
- ArXiv
- 2020
References
3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition
- Computer Science
- IEEE Access
- 2017
- 56
Cross-Modal Scene Networks
- Computer Science, Medicine
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2018
- 68
Learnable PINs: Cross-Modal Embeddings for Person Identity
- Computer Science
- ECCV
- 2018
- 48
- Highly Influential
A Discriminative Feature Learning Approach for Deep Face Recognition
- Computer Science
- ECCV
- 2016
- 1,785
Deep Neural Network Embeddings for Text-Independent Speaker Verification
- Computer Science
- INTERSPEECH
- 2017
- 380
Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching
- Computer Science
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
- 97
- Highly Influential
Look, Listen and Learn
- Computer Science
- 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
- 297