Learning Bimodal Structure in Audio–Visual Data
@article{Monaci2009LearningBS,
  title   = {Learning Bimodal Structure in Audio–Visual Data},
  author  = {G. Monaci and P. Vandergheynst and F. Sommer},
  journal = {IEEE Transactions on Neural Networks},
  year    = {2009},
  volume  = {20},
  pages   = {1898--1910}
}
A novel model is presented to learn bimodally informative structures from audio-visual signals. The signal is represented as a sparse sum of audio-visual kernels. Each kernel is a bimodal function consisting of synchronous snippets of an audio waveform and a spatio-temporal visual basis function. To represent an audio-visual signal, the kernels can be positioned independently and arbitrarily in space and time. The proposed algorithm uses unsupervised learning to form dictionaries of bimodal…
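The signal model in the abstract, a sparse sum of bimodal kernels that can each be shifted independently in space and time, can be illustrated with a small greedy decomposition. This is a toy sketch, not the authors' algorithm. It makes several assumptions: the visual stream is a 2D array indexed `[position][frame]`, the audio runs at a fixed integer rate `r` samples per video frame (so a kernel placed at frame `t` starts at audio sample `r * t`), and each kernel has unit L2 norm taken jointly over both modalities. Plain matching pursuit stands in for the coding step, and the dictionary is fixed rather than learned. The names `BimodalKernel`, `place`, `correlate`, and `matching_pursuit` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class BimodalKernel:
    """Hypothetical toy bimodal kernel: an audio snippet plus a visual patch."""
    audio: list[float]          # length r * L audio samples
    visual: list[list[float]]   # visual[dx][dt]: patch of width w and L frames

    def normalize(self) -> "BimodalKernel":
        # Assumption: unit L2 norm taken jointly over both modalities.
        n = math.sqrt(sum(a * a for a in self.audio)
                      + sum(v * v for row in self.visual for v in row))
        self.audio = [a / n for a in self.audio]
        self.visual = [[v / n for v in row] for row in self.visual]
        return self


def place(kernel, coef, t, x, audio, video, r):
    """Add coef * kernel to the signal in place, at frame t and spatial offset x.
    The audio part starts at sample r * t, so both modalities stay synchronous."""
    for i, a in enumerate(kernel.audio):
        audio[r * t + i] += coef * a
    for dx, row in enumerate(kernel.visual):
        for dt, v in enumerate(row):
            video[x + dx][t + dt] += coef * v


def correlate(kernel, t, x, audio, video, r):
    """Joint inner product of the kernel, shifted to (t, x), with the signal."""
    s = sum(a * audio[r * t + i] for i, a in enumerate(kernel.audio))
    s += sum(v * video[x + dx][t + dt]
             for dx, row in enumerate(kernel.visual)
             for dt, v in enumerate(row))
    return s


def matching_pursuit(dictionary, audio, video, r, n_atoms):
    """Greedy sparse decomposition: at each step pick the kernel and (t, x)
    shift with the largest |correlation| and subtract it from the residual.
    Returns the atoms (kernel index, t, x, coefficient) and the residual."""
    audio = list(audio)                   # work on copies of the signal
    video = [list(row) for row in video]
    width, n_frames = len(video), len(video[0])
    assert len(audio) == r * n_frames, "audio must have r samples per frame"
    atoms = []
    for _ in range(n_atoms):
        best = None
        for k, ker in enumerate(dictionary):
            w, L = len(ker.visual), len(ker.visual[0])
            assert len(ker.audio) == r * L, "kernel audio/visual lengths must match"
            for t in range(n_frames - L + 1):
                for x in range(width - w + 1):
                    c = correlate(ker, t, x, audio, video, r)
                    if best is None or abs(c) > abs(best[0]):
                        best = (c, k, t, x)
        c, k, t, x = best
        # Kernels have unit norm, so c is the projection coefficient.
        place(dictionary[k], -c, t, x, audio, video, r)
        atoms.append((k, t, x, c))
    return atoms, audio, video


if __name__ == "__main__":
    k0 = BimodalKernel([1.0, -1.0, 1.0, -1.0], [[1.0, 0.5]]).normalize()
    k1 = BimodalKernel([1.0, 1.0, 0.0, 0.0], [[0.0, 1.0], [1.0, 0.0]]).normalize()
    r, n_frames, width = 2, 6, 4
    audio = [0.0] * (r * n_frames)
    video = [[0.0] * n_frames for _ in range(width)]
    place(k1, 3.0, 2, 1, audio, video, r)   # one kernel at frame 2, offset 1
    atoms, _, _ = matching_pursuit([k0, k1], audio, video, r, n_atoms=1)
    print(atoms)  # → [(1, 2, 1, 3.0)]
```

In the demo, a signal built from a single kernel is decomposed in one step: the search recovers the kernel index, its frame and spatial offsets, and its coefficient, leaving a zero residual. Because the audio and visual parts of a kernel share one time shift, a match in one modality only scores highly if the other modality agrees at the same position.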
Citations (34 in total; a selection is listed below)
- Reverberant speech separation based on audio-visual dictionary learning and binaural cues. 2012 IEEE Statistical Signal Processing Workshop (SSP), 2012. Cited by 3.
- Source Separation of Convolutive and Noisy Mixtures Using Audio-Visual Dictionary Learning and Probabilistic Time-Frequency Masking. IEEE Transactions on Signal Processing, 2013. Cited by 24. Highly influential citation.
- Sequential Audio-Visual Correspondence With Alternating Diffusion Kernels. IEEE Transactions on Signal Processing, 2018. Cited by 2.
- Audio visual speech source separation via improved context dependent association model. EURASIP Journal on Advances in Signal Processing, 2014. Cited by 2.
- Lip movement and speech synchronization detection based on multimodal shift-invariant dictionary. 2015 IEEE 16th International Conference on Communication Technology (ICCT), 2015.
- Use of bimodal coherence to resolve the permutation problem in convolutive BSS. Signal Processing, 2012. Cited by 14.
- Robust front-end for audio, visual and audio–visual speech classification. International Journal of Speech Technology, 2018. Cited by 2.
- Audiovisual Speech Source Separation: An overview of key methodologies. IEEE Signal Processing Magazine, 2014. Cited by 50.
- Audio-visual localization with hierarchical topographic maps: Modeling the superior colliculus. Neurocomputing, 2012. Cited by 7.
References (72 in total; a selection is listed below)
- Audiovisual Gestalts. 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06), 2006. Cited by 25.
- Analysis of multimodal sequences using geometric video representations. Signal Processing, 2006. Cited by 30.
- Video assisted speech source separation. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005. Cited by 63.
- Noisy audio feature enhancement using audio-visual speech data. 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002. Cited by 43.
- Visual voice activity detection as a help for speech source separation from convolutive mixtures. Speech Communication, 2007. Cited by 46.
- Sparse and shift-invariant representations of music. IEEE Transactions on Audio, Speech, and Language Processing, 2006. Cited by 121.
- Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures. IEEE Transactions on Audio, Speech, and Language Processing, 2007. Cited by 88.
- Multimodal speaker localization in a probabilistic framework. 2006 14th European Signal Processing Conference, 2006. Cited by 26.