Unsupervised detection of multimodal clusters in edited recordings

@inproceedings{Dielmann2010UnsupervisedDO,
  title={Unsupervised detection of multimodal clusters in edited recordings},
  author={Alfred Dielmann},
  booktitle={2010 IEEE International Workshop on Multimedia Signal Processing},
  year={2010},
  pages={177--182}
}
Edited video recordings, such as talk shows and sitcoms, often include Audio-Visual clusters: frequent repetitions of closely related acoustic and visual content. For example, during a political debate, whenever a given participant holds the conversational floor, her or his voice tends to co-occur with camera views (i.e., shots) showing her or his portrait. Unlike previous Audio-Visual clustering work, this paper proposes an unsupervised approach that detects Audio-Visual clusters…

Citations

Publications citing this paper.

Unsupervised Mining of Multiple Audiovisually Consistent Clusters for Video Structure Analysis

  • 2012 IEEE International Conference on Multimedia and Expo
  • 2012

A conditional random field approach for audio-visual people diarization

  • 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014

A Multimodal Approach to Speaker Diarization on TV Talk-Shows

  • IEEE Transactions on Multimedia
  • 2013

Face recognition using Co-occurrence Histograms of Oriented Gradients

  • 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2012
