The 'Audio-Visual Face Cover Corpus': Investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear

Abstract

The Audio-Visual Face Cover Corpus consists of high-quality audio and video recordings of 10 native British English speakers wearing different types of ‘facewear’. Speakers read aloud a set of 64 /C1VC2/ syllables embedded in a carrier phrase; 18 English consonants occurred twice each in onset and coda position. Speakers recited the list 1+8 times, i.e. once in a control condition (no facewear) and eight times while wearing a forensically relevant face covering. Audio was captured simultaneously via a headband microphone and two shotgun microphones, one placed facing and one behind the speaker. Footage of the subject’s head and shoulders was filmed from two camera angles, frontal and half-profile. In total, 6,120 utterances were recorded per device. This paper aims to specify the database design, to introduce forensic-phonetic research utilising the data, and to demonstrate the corpus’s potential applications in related fields of study and in casework conducted by forensic speech scientists.


Cite this paper

@inproceedings{Fecher2012TheF,
  title     = {The 'Audio-Visual Face Cover Corpus': Investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear},
  author    = {Natalie Fecher},
  booktitle = {INTERSPEECH},
  year      = {2012}
}