Sequential Monte Carlo Fusion of Sound and Vision for Speaker Tracking

Jaco Vermaak, Michel Gangnet, Andrew Blake, Patrick Pérez

Video telephony could be considerably enhanced by provision of a tracking system that allows freedom of movement to the speaker, while maintaining a well-framed image, for transmission over limited bandwidth. Already commercial multi-microphone systems exist which track speaker direction in order to reject background noise. Stereo sound and vision are complementary modalities in that sound is good for initialisation (where vision is expensive) whereas vision is good for localisation (where…
This paper has 140 citations and has highly influenced 11 other papers.

Publications citing this paper.

Audiovisual Tracking Using STAC Sensors

2007 First ACM/IEEE International Conference on Distributed Smart Cameras • 2007

Multi-modal fusion with particle filter for speaker localization and tracking

2011 International Conference on Multimedia Technology • 2011

Target Detection and Tracking With Heterogeneous Sensors

IEEE Journal of Selected Topics in Signal Processing • 2008

A Bayesian 3D People Tracker using Multiple Cameras and a Microphone Array

2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07 • 2007

Publications referenced by this paper.

A Probabilistic Exclusion Principle for Tracking Multiple Objects

International Journal of Computer Vision • 1999

Time-delay estimation of reverberated speech exploiting harmonic structure.

The Journal of the Acoustical Society of America • 1999

Active Contours

Springer London • 1998

A two-stage algorithm for determining talker location from linear microphone array data

H. F. Silverman, E. Kirtman
Computer Speech and Language, 6:129–152 • 1992
