Sayeh Mirzaei

In this paper, a novel approach is proposed for estimating the number of sources and for source separation in convolutive audio stereo mixtures. First, an angular spectrum-based method is applied to count and locate the sources. A nonlinear GCC-PHAT metric is exploited for this purpose. The estimated channel coefficients are then utilized to obtain a…
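As a concrete illustration of the localization step, here is a minimal sketch of a PHAT-weighted angular spectrum pooled over candidate time delays. The frame handling, the soft-threshold nonlinearity, and all names are illustrative assumptions, not the paper's exact metric.

```python
import numpy as np

def angular_spectrum(x1, x2, fs, taus, n_fft=1024, alpha=2.0):
    """Pooled nonlinear GCC-PHAT over candidate TDOAs `taus` (in seconds)."""
    X1 = np.fft.rfft(x1, n_fft)
    X2 = np.fft.rfft(x2, n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12                 # PHAT weighting: keep phase only
    spec = np.empty(len(taus))
    for i, tau in enumerate(taus):
        agreement = np.real(cross * np.exp(2j * np.pi * freqs * tau))
        # assumed soft-threshold nonlinearity: rewards frequency bins whose
        # phase matches the candidate delay, sharpening the spectrum's peaks
        spec[i] = np.sum(1.0 - np.tanh(alpha * np.sqrt(np.clip(1.0 - agreement, 0.0, None))))
    return spec
```

Peaks of this spectrum over `taus` would then indicate how many sources are present and where they lie.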
Active speakers have traditionally been identified in video by detecting their moving lips. This paper demonstrates the same using spatio-temporal features that aim to capture other cues: movement of the head, upper body, and hands of active speakers. Speaker directional information, obtained using sound source localization from a microphone array, is used to…
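A hedged sketch of the underlying idea: crude spatio-temporal motion features for a person region (head/upper-body movement), paired with audio direction-of-arrival estimates as weak speaking/not-speaking labels. Shapes, names, and the tolerance threshold are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def motion_energy(frames, box):
    """Mean absolute frame difference inside a (top, left, h, w) region
    of a (T, H, W) grayscale video clip."""
    top, left, h, w = box
    crops = frames[:, top:top + h, left:left + w].astype(np.float32)
    return np.abs(np.diff(crops, axis=0)).mean(axis=(1, 2))

def weak_labels(person_azimuth, doa_azimuths, tol_deg=15.0):
    """Mark frames active when the array's DOA points at the person."""
    return np.abs(doa_azimuths - person_azimuth) < tol_deg
```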
In earlier work, we have shown that vocabulary discovery from spoken utterances and subsequent recognition of the acquired vocabulary can be achieved through Non-negative Matrix Factorization (NMF). An open issue for this task is to determine automatically how many different word representations should be included in the model. In this paper, Bayesian NMF…
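For orientation, below is a generic KL-divergence NMF with multiplicative updates. In this setting, V would hold utterance-level acoustic statistics and the number of components K plays the role of the vocabulary size whose value the Bayesian treatment estimates; this is a plain NMF sketch, not the paper's Bayesian inference scheme.

```python
import numpy as np

def nmf_kl(V, K, n_iter=200, eps=1e-12, seed=0):
    """Factorize nonnegative V (F x N) as W @ H with K components."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1) + eps)      # update bases
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)  # update activations
    return W, H
```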
In this paper, we address the task of audio source separation for a stereo reverberant mixture of audio signals. We use a full-rank model for the spatial covariance matrix. Bayesian Non-negative Matrix Factorization (NMF) frameworks are introduced for factorizing the time-frequency variance matrix of each source into basis components and time activations. We…
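A sketch of how the full-rank spatial model is typically combined with a multichannel Wiener filter to recover each source image: each source j gets a covariance v[j, f, n] * R[j, f], where the variance v would come from the NMF factorization and R[j, f] is a full-rank 2x2 spatial covariance. Shapes and names are assumptions for illustration.

```python
import numpy as np

def wiener_separate(X, v, R):
    """X: (F, N, 2) mixture STFT; v: (J, F, N) variances; R: (J, F, 2, 2)."""
    J, F, N = v.shape
    Y = np.zeros((J, F, N, 2), dtype=complex)
    for f in range(F):
        for n in range(N):
            # per-source covariances and the mixture covariance at this bin
            Sig = v[:, f, n, None, None] * R[:, f]            # (J, 2, 2)
            Sig_x = Sig.sum(axis=0) + 1e-9 * np.eye(2)        # regularized
            inv = np.linalg.inv(Sig_x)
            for j in range(J):
                Y[j, f, n] = Sig[j] @ inv @ X[f, n]           # Wiener estimate
    return Y
```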
In this paper, the tasks of speech source localization, source counting and source separation are addressed for an unknown number of sources in a stereo recording scenario. In the first stage, the angles of arrival of individual source signals are estimated through a peak finding scheme applied to the angular spectrum which has been derived using non-linear…
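A minimal sketch of the counting stage, assuming an angular spectrum like the one above: retain the prominent peaks, each of which yields one source and its angle of arrival. The normalization and prominence threshold are illustrative assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def count_and_locate(angles_deg, spectrum, min_prominence=0.1):
    """Return the estimated source count and their angles of arrival."""
    spectrum = spectrum / (spectrum.max() + 1e-12)   # normalize for thresholding
    peaks, _ = find_peaks(spectrum, prominence=min_prominence)
    return len(peaks), angles_deg[peaks]
```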
In earlier work, we have formulated word discovery from speech as a latent component analysis problem. In more recent work, we proposed a Bayesian approach for estimating the model order, i.e., the vocabulary size, by evaluating the marginal likelihood for different order values. That technique was expensive, since the algorithm had to be repeated for…
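The expensive order-selection scheme described above amounts to the following sketch: fit the model once per candidate vocabulary size and keep the value maximizing the (approximate) marginal likelihood. Here `bayesian_nmf_evidence` is a hypothetical helper standing in for one full Bayesian NMF run that returns log p(V | K).

```python
def select_order(V, candidate_orders, bayesian_nmf_evidence):
    """Sweep candidate model orders; one full model fit per candidate."""
    scores = {K: bayesian_nmf_evidence(V, K) for K in candidate_orders}
    return max(scores, key=scores.get)
```

The cost is one complete inference run per candidate order, which is what motivates the cheaper estimation technique this paper proposes.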