Close Shadowing Natural Versus Synthetic Speech

@article{Bailly2003CloseSN,
  title={Close Shadowing Natural Versus Synthetic Speech},
  author={G{\'e}rard Bailly},
  journal={International Journal of Speech Technology},
  year={2003},
  volume={6},
  pages={11--19}
}
  • G. Bailly
  • Published 2003
  • Psychology
  • International Journal of Speech Technology
Close shadowing experiments involving natural and synthetic stimuli are described. Preliminary results show that speakers are able to follow natural stimuli with an average delay of 70 ms, whereas this delay typically exceeds 100 ms for stimuli produced by text-to-speech systems. A complementary experiment shows that this contrast is mainly due to the inappropriate or impoverished prosody generated by current text-to-speech systems.
The Effect of Simultaneous Text on the Recall of Noise-Degraded Speech
  • I. Grossman, R. Rajan
  • Psychology
  • Journal of Experimental Psychology: Human Perception and Performance
  • 2017
TLDR
It is uniquely demonstrated that congruent text benefits the recall of noise-degraded speech.
Understanding the mechanisms underlying voluntary responses to pitch-shifted auditory feedback.
TLDR
Results showed that voluntary responses that followed the stimulus directions had significantly shorter latencies than opposing responses, and the slower opposing responses may represent a control strategy that requires monitoring and correcting for errors between the feedback signal and the intended vocal goal.
The combined effect of speech codec quality and transmission delay on human performance during complex spoken interactions
TLDR
Results suggest that, for highly complex interactions which involve significant cognitive load, human performance will degrade more rapidly with increases in delay for transmission systems using speech codecs with lower quality output.
Unintended imitation in nonword repetition
A truly human interface: interacting face-to-face with someone whose words are determined by a computer program
TLDR
This work uses speech shadowing to create situations wherein people converse in person with a human whose words are determined by a conversational agent computer program, and reports three studies that investigated people’s experiences interacting with echoborgs and the extent to which echoborgs pass as autonomous humans.

References

Linguistic Structure and Speech Shadowing at Very Short Latencies
TLDR
This paper presents an experimental task in which the subject is required to repeat (shadow) speech as he hears it, and the response latency to each word of a sentence is measured.
Rapid reproduction of vowel-vowel sequences: evidence for a fast and direct acoustic-motoric linkage in speech.
TLDR
Listeners display extremely short latencies when asked to reproduce (shadow) a random series of vowel-vowel sequences, which suggests a fast and direct linkage of the speech-analysis and speech-production mechanisms.
Speech shadowing and speech comprehension
Speech-production measures of speech perception: rapid shadowing of VCV syllables.
TLDR
Five listeners rapidly repeated a random presentation of the vowel-consonant-vowel syllables /aba, apa, ama, aka, aga/; the results suggest that speech-perception decisions in shadowing are directly available to, and are perhaps made to occur at a point comparable to the consonantal release seen for the simple /aba/ responses.
How flexible is the human voice? - a case study of mimicry
TLDR
A professional impersonation artist imitated three well-known Swedish public figures and it was found that he was able to mimic global speech rate very closely, but timing at the segmental level showed little or no change in the direction of the targets.
The bimodal perception of speech in infancy.
TLDR
Both the ability to detect auditory-visual correspondences and the tendency to imitate may reflect the infant's knowledge of the relationship between audition and articulation.
A human sound transducer/reproducer: Temporal capabilities of a profoundly echolalic child
Verbal retention after shadowing and after listening
Ss shadowed or listened to stories that had been recorded at 1 word/sec (wps), 2 wps, and 3 wps. They then took tests of word recognition, semantic retention, and syntax recognition. At the slowest
Using Prosody to Predict the End of Sentences in English and French: Normal and Brain-damaged Subjects
In an earlier study (Grosjean, 1983), it was found that listeners of English were surprisingly accurate at predicting the temporal end of a sentence when only given the part up to the “potentially
The frame/content theory of evolution of speech production
TLDR
The new role of Broca's area and its surround in human vocal communication may have derived from its evolutionary history as the main cortical center for the control of ingestive processes.