Robust indoor speaker recognition in a network of audio and video sensors

@article{DArca2016RobustIS,
  title={Robust indoor speaker recognition in a network of audio and video sensors},
  author={Eleonora D'Arca and Neil M. Robertson and James R. Hopgood},
  journal={Signal Processing},
  year={2016},
  volume={129},
  pages={137--149}
}
Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims to replicate this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstraction. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than in comparable reported work, which is usually limited…