The Relationship between Speech Perception and Auditory Organisation: Studies with Spectrally Reduced Speech


Listeners are remarkably adept at recognising speech that has undergone extensive spectral reduction. Natural speech can be reproduced using as few as three time-varying sinusoids mimicking the corresponding speech formants. Untrained listeners are able to transcribe this 'sine-wave' speech with a high degree of reliability. Phonetic percepts generated by sine-wave speech occur despite an apparent lack of the cues on which low-level grouping processes are believed to operate. Consequently, it has been proposed that speech perception is governed by processes operating independently of those described by auditory scene analysis. This thesis examines the auditory scene analysis account in relation to sine-wave speech perception through a mixture of perceptual and computational studies. A re-examination is made of evidence provided by previous perceptual studies of sine-wave speech that the application of a simple grouping cue may increase the intelligibility of sine-wave speech. New evidence is presented from a perceptual study employing stimuli constructed from simultaneous sine-wave speech sources. This study demonstrates that in conditions that are closer to those of everyday listening, grouping cues have an important role in the formation of coherent speech percepts. In conjunction with these perceptual studies, results from automatic segregation and recognition tasks suggest that sine-wave speech contains sufficient low-level, non-speech-specific structure to allow partial descriptions of sine-wave sources to be recovered from two-source mixtures. It is argued that these partial descriptions are sufficient to support the limited intelligibility observed in two-source sine-wave speech listening tests. It is shown that the recognition of sine-wave speech may proceed directly from natural speech models if a peak-based representation and missing-data recognition strategy are employed. These techniques are also shown to be suitable for the recognition of natural speech in noisy conditions. In conclusion, it is considered that because sine-wave speech possesses residual primitive structure and may allow the action of schema-driven organisation, its perception may be accommodated within the auditory scene analysis account.

Acknowledgments

First I must thank my supervisor, Martin Cooke. His help and advice are ultimately responsible for this thesis. He has been a continuous source of inspiration and encouragement throughout my PhD. Thanks are also due to my friends and colleagues in the Department of Computer Science at the University of Sheffield. In particular, thanks to all those past and present members of the Speech and Hearing Research Group for their advice and friendship over the years. Special thanks to Dave for every hour I have spent moaning over a pint of beer and to the Pitsmoor SpandHers: Jeremy, Xstal, Andy and the cat. Thanks also to the people whose software has facilitated this work. The formant estimator designed by Alan Crowe of CSTR, Edinburgh enabled us to produce sine-wave speech in the quantities needed. HTK Version 1.5 was used extensively for the ASR experiments. The listening experiments would have been impossible if not for the generosity of the friends, Romans and members of SpandH who lent me their ears. This research was supported by a Sheffield University Boucher-Roberts Award, and later by a sizeable loan from my parents! 'Tak!' to Paul Dalsgaard and the friendly Danes at CPK, Aalborg University who provided me with a pleasant environment in which to write up the bulk of this thesis. Thanks to Heidi for making sure that I finished it. Thanks, Sash, for the proof-reading. And finally... To my family! Thank God it's over, eh?


Cite this paper

@inproceedings{Barker2008TheRB, title={The Relationship between Speech Perception and Auditory Organisation: Studies with Spectrally Reduced Speech}, author={Jon Barker}, year={2008} }