Learning Bimodal Structure in Audio–Visual Data

@article{Monaci2009LearningBS,
  title={Learning Bimodal Structure in Audio–Visual Data},
  author={G. Monaci and P. Vandergheynst and F. Sommer},
  journal={IEEE Transactions on Neural Networks},
  year={2009},
  volume={20},
  pages={1898--1910}
}
  • G. Monaci, P. Vandergheynst, F. Sommer
  • Published 2009
  • Computer Science, Medicine
  • IEEE Transactions on Neural Networks
  • A novel model is presented to learn bimodally informative structures from audio-visual signals. The signal is represented as a sparse sum of audio-visual kernels. Each kernel is a bimodal function consisting of synchronous snippets of an audio waveform and a spatio-temporal visual basis function. To represent an audio-visual signal, the kernels can be positioned independently and arbitrarily in space and time. The proposed algorithm uses unsupervised learning to form dictionaries of bimodal…
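
The representation described in the abstract, a sparse sum of audio-visual kernels placed at arbitrary positions in space and time, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `synthesize`, the dictionary layout, the separate `audio_coef` and `video_coef` weights, and the fixed `samples_per_frame` ratio linking audio samples to video frames are all assumptions made for illustration. Only synthesis from a given kernel is shown, not the unsupervised dictionary learning.

```python
def synthesize(atoms, n_audio, video_shape, samples_per_frame):
    """Sum scaled, shifted bimodal kernels into an audio track and a video volume.

    Hypothetical layout: each atom holds a kernel (audio snippet + video patch),
    a non-negative (frame, row, col) position, and one weight per modality.
    """
    n_frames, height, width = video_shape
    audio = [0.0] * n_audio
    video = [[[0.0] * width for _ in range(height)] for _ in range(n_frames)]

    for atom in atoms:
        kernel = atom["kernel"]
        frame, row, col = atom["position"]

        # Audio part: shifted in time only; the start sample is derived from
        # the video frame so both modalities of the kernel stay synchronous.
        start = frame * samples_per_frame
        for i, value in enumerate(kernel["audio"]):
            if start + i < n_audio:
                audio[start + i] += atom["audio_coef"] * value

        # Video part: shifted in time and in both spatial dimensions.
        for t, patch in enumerate(kernel["video"]):
            for y, line in enumerate(patch):
                for x, value in enumerate(line):
                    ft, fy, fx = frame + t, row + y, col + x
                    if ft < n_frames and fy < height and fx < width:
                        video[ft][fy][fx] += atom["video_coef"] * value

    return audio, video


# Toy kernel (assumed values): 2 audio samples paired with a 1-frame, 1x1 patch.
kernel = {"audio": [1.0, -1.0], "video": [[[1.0]]]}

# Two occurrences of the same kernel at different space-time positions.
atoms = [
    {"kernel": kernel, "position": (0, 0, 0), "audio_coef": 2.0, "video_coef": 1.0},
    {"kernel": kernel, "position": (1, 1, 1), "audio_coef": 0.5, "video_coef": 3.0},
]

audio, video = synthesize(atoms, n_audio=4, video_shape=(2, 2, 2), samples_per_frame=2)
```

Here the same kernel is reused at two positions, which is what makes the representation shift-invariant: only the positions and weights change between occurrences, while the dictionary element stays fixed.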
    34 Citations
    Reverberant speech separation based on audio-visual dictionary learning and binaural cues
    Source Separation of Convolutive and Noisy Mixtures Using Audio-Visual Dictionary Learning and Probabilistic Time-Frequency Masking
    Sequential Audio-Visual Correspondence With Alternating Diffusion Kernels
    Audio visual speech source separation via improved context dependent association model
    Lip movement and speech synchronization detection based on multimodal shift-invariant dictionary
    Audiovisual Speech Source Separation
    Use of bimodal coherence to resolve the permutation problem in convolutive BSS
    Audiovisual Speech Source Separation: An overview of key methodologies
