Speech perception without traditional speech cues.

@article{Remez1981SpeechPW,
  title={Speech perception without traditional speech cues.},
  author={R. E. Remez and Philip Rubin and David B. Pisoni and Thomas D. Carrell},
  journal={Science},
  year={1981},
  volume={212},
  number={4497},
  pages={947-949}
}
A three-tone sinusoidal replica of a naturally produced utterance was identified by listeners, despite the readily apparent unnatural speech quality of the signal. The time-varying properties of these highly artificial acoustic signals are apparently sufficient to support perception of the linguistic message in the absence of traditional acoustic cues for phonetic segments. 
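The replica described above is built by tracking the center frequencies and amplitudes of the first three formants of a natural utterance and replacing each formant with a single time-varying sinusoid. A minimal sketch of that construction is given below (Python; the function name and the assumption that formant tracks have already been interpolated to one value per output sample are illustrative, not the study's actual synthesis procedure):

import numpy as np

def synthesize_sine_wave_replica(formant_freqs, formant_amps, sample_rate=10_000):
    # formant_freqs, formant_amps: arrays of shape (n_samples, 3) holding
    # hypothetical formant-tracker output, interpolated to the sample rate.
    formant_freqs = np.asarray(formant_freqs, dtype=float)
    formant_amps = np.asarray(formant_amps, dtype=float)
    # Instantaneous phase of each tone = 2*pi * running integral of its frequency.
    phase = 2.0 * np.pi * np.cumsum(formant_freqs, axis=0) / sample_rate
    # One sinusoid per formant, scaled by that formant's amplitude, then summed.
    signal = (formant_amps * np.sin(phase)).sum(axis=1)
    # Normalize so the result can be written to a sound file without clipping.
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal

Because the three tones carry only the time-varying pattern of the formants, the result lacks harmonic structure and sounds unnatural, yet it preserves the coherent spectral variation that the abstract identifies as sufficient for perceiving the linguistic message.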
The stream of speech.
The use of sinusoidal replicas of speech signals reveals that listeners can perceive speech solely from temporally coherent spectral variation of nonspeech acoustic elements. This sensitivity to…
Arguments for a nonsegmental view of speech perception
The argument is that the perceptual system maps an informationally rich signal directly onto lexical forms that are structurally rich, and that phonemes are not essential for lexical access.
Speech Recognition with Primarily Temporal Cues
Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information; the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.
Coding of the speech spectrum in three time-varying sinusoids
It is suggested that phonetic perception may depend on coherent spectrum variation, a second-order property of the acoustic signal, rather than on any particular set of acoustic elements present in speech signals.
Perceptual normalization of vowels produced by sinusoidal voices.
The findings support the general claim that sinusoidal replicas of natural speech signals are perceptible phonetically because they preserve time-varying information present in natural signals.
Intelligibility and perceived quality of spectrally-impoverished speech sounds
Ten listeners were asked to identify and rate the quality of speech sounds processed in such a way that only the most energetic spectral components were retained. The influence of both the number of…
Audio-visual speech perception without speech cues
Intelligibility of the present set of sine wave sentences was relatively low in contrast to previous findings; however, intelligibility was greatly increased when visual information from a talker's face was presented along with the auditory stimuli.
Audiovisual integration in speech perception: a multi-stage process
This work investigates whether the integration of auditory and visual speech observed in these two audiovisual integration effects is a specific trait of speech perception, and whether audiovisual integration is undertaken in a single processing stage or in multiple processing stages.
The intelligibility of pointillistic speech.
A form of processed speech is described that is highly discriminable in a closed-set identification format. The processing renders speech into a set of sinusoidal pulses played synchronously across…
Speech-specificity of two audiovisual integration effects
It is shown that audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers often fail to perceive as speech.

References

Two left-hemisphere mechanisms in speech perception
Right-ear advantages of different magnitudes occur systematically in dichotic listening for different phoneme classes and for certain phonemes according to their syllabic position. Such differences…
Stop consonant place perception with single-formant stimuli: evidence for the role of the front-cavity resonance.
  G. Kuhn, The Journal of the Acoustical Society of America, 1979
The third formant and the second formant were found on average to cue the place of articulation of intervocalic stop consonants equally well when the stop consonants occurred before the vowel /i/.
Invariant cues for place of articulation in stop consonants.
It was determined that the gross shape of the spectrum sampled at the consonantal release showed a distinctive shape for each place of articulation: a prominent mid-frequency spectral peak for velars, a diffuse-rising spectrum for alveolars, and a diffuse-falling spectrum for labials.
Speech cues and sign stimuli.