Active speakers have traditionally been identified in video by detecting their moving lips. This paper shows that the same can be done with spatio-temporal features that aim to capture other cues: movement of the head, upper body and hands of active speakers. Speaker directional information, obtained using sound source localization from a microphone array, is used to…
In this paper, a novel approach is proposed for estimating the number of sources and for source separation in convolutive stereo audio mixtures. First, an angular spectrum-based method is applied to count and locate the sources, exploiting a nonlinear GCC-PHAT metric. The estimated channel coefficients are then utilized to obtain a…
In earlier work, we formulated word discovery from speech as a latent component analysis problem. In more recent work, we proposed a Bayesian approach for estimating the model order, i.e. the vocabulary size, by evaluating the marginal likelihood for different order values. That technique was expensive since the algorithm had to be repeated for…