Speech synthesis from neural decoding of spoken sentences

Gopala Krishna Anumanchipalli, Josh Chartier, Edward F. Chang
Technology that translates neural activity into speech would be transformative for people who are unable to communicate as a result of neurological impairments. Recurrent neural networks first decoded directly recorded cortical activity into representations of articulatory movement, and then transformed these representations into speech acoustics. In closed vocabulary tests, listeners could readily identify and transcribe speech synthesized from cortical activity. Intermediate articulatory…
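The two-stage architecture described above can be sketched in code. This is a minimal, hypothetical NumPy illustration of the idea only: the actual paper used stacked bidirectional LSTMs trained on high-density ECoG, whereas this sketch substitutes untrained single-layer Elman RNNs, and the feature dimensions (64 neural channels, 33 articulatory kinematic features, 32 acoustic coefficients) are assumptions chosen for illustration, not the paper's exact configuration.

```python
import numpy as np

def elman_rnn(inputs, w_in, w_rec, b):
    """Run a minimal Elman RNN over a (T, d_in) sequence; return (T, d_hidden) states."""
    h = np.zeros(w_rec.shape[0])
    states = np.zeros((inputs.shape[0], w_rec.shape[0]))
    for t in range(inputs.shape[0]):
        h = np.tanh(inputs[t] @ w_in + h @ w_rec + b)
        states[t] = h
    return states

class TwoStageDecoder:
    """Stage 1: neural features -> articulatory kinematics.
    Stage 2: articulatory kinematics -> acoustic features.
    Weights are random here; a real decoder would be trained end to end."""

    def __init__(self, d_neural, d_artic, d_acoustic, d_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # small init scale to keep tanh activations well-behaved
        # Stage-1 RNN and linear readout to articulatory space
        self.w1_in = rng.normal(0, s, (d_neural, d_hidden))
        self.w1_rec = rng.normal(0, s, (d_hidden, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.w1_out = rng.normal(0, s, (d_hidden, d_artic))
        # Stage-2 RNN and linear readout to acoustic space
        self.w2_in = rng.normal(0, s, (d_artic, d_hidden))
        self.w2_rec = rng.normal(0, s, (d_hidden, d_hidden))
        self.b2 = np.zeros(d_hidden)
        self.w2_out = rng.normal(0, s, (d_hidden, d_acoustic))

    def decode(self, neural):
        """neural: (T, d_neural) array of cortical features, one row per time step."""
        artic = elman_rnn(neural, self.w1_in, self.w1_rec, self.b1) @ self.w1_out
        acoustic = elman_rnn(artic, self.w2_in, self.w2_rec, self.b2) @ self.w2_out
        return artic, acoustic

# Toy usage: 100 time steps of simulated 64-channel cortical features
dec = TwoStageDecoder(d_neural=64, d_artic=33, d_acoustic=32)
artic, acoustic = dec.decode(np.random.default_rng(1).normal(size=(100, 64)))
print(artic.shape, acoustic.shape)  # (100, 33) (100, 32)
```

The design point the sketch captures is the intermediate representation: rather than mapping neural activity straight to sound, the decoder first passes through an articulatory-kinematic bottleneck, which the paper found improved synthesis.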


Real-time Synthesis of Imagined Speech Processes from Minimally Invasive Recordings of Neural Activity
The real-time synthesis approach represents an essential step towards investigating how patients will learn to operate a closed-loop speech neuroprosthesis, as well as the development of techniques that incorporate co-adaptation of the user and system for optimized performance.
Generating Natural, Intelligible Speech From Brain Activity in Motor, Premotor, and Inferior Frontal Cortices
It is shown that ECoG signals alone can be used to generate intelligible speech output that preserves conversational cues; the approach employed the user's own voice to generate speech that sounded very natural and included features such as prosody and accentuation.
Imagined speech can be decoded from low- and cross-frequency features in perceptual space
It is demonstrated that low-frequency power and cross-frequency dynamics contain key information for imagined speech decoding, and that exploring perceptual spaces offers a promising avenue for future imagined speech BCIs.
Imagined speech can be decoded from low- and cross-frequency intracranial EEG features
It is demonstrated using human intracranial recordings that both low- and higher-frequency power and local cross-frequency contribute to imagined speech decoding, in particular in phonetic and vocalic spaces.
Speech imagery decoding as a window to speech planning and production
Speech imagery (the ability to internally generate quasi-perceptual experiences of speech events) is a fundamental ability tightly linked to important cognitive functions such as inner speech…
Decoding spoken English from intracortical electrode arrays in dorsal precentral gyrus.
The ability to decode speech using intracortical electrode array signals from a nontraditional speech area suggests that placing electrode arrays in ventral speech areas is a promising direction for speech BCIs.
Decoding Imagined and Spoken Phrases From Non-invasive Neural (MEG) Signals
This study investigated the decoding of five imagined and spoken phrases from single-trial, non-invasive magnetoencephalography (MEG) signals collected from eight adult subjects and found convolutional neural networks applied on the spatial, spectral and temporal features extracted from the MEG signals to be highly effective.
Speech Synthesis from Stereotactic EEG using an Electrode Shaft Dependent Multi-Input Convolutional Neural Network Approach
A previously presented decoding pipeline for speech synthesis based on ECoG signals is adapted to implanted depth electrodes (sEEG) using a multi-input convolutional neural network that extracts speech-related activity separately for each electrode shaft and estimates spectral coefficients to reconstruct an audible waveform.
Decoding Speech Evoked Jaw Motion from Non-invasive Neuromagnetic Oscillations
Experimental results indicated that the jaw kinematics can be successfully decoded from non-invasive neural (MEG) signals.
Decoding spoken English phonemes from intracortical electrode arrays in dorsal precentral gyrus
The ability to decode a comprehensive set of phonemes using intracortical electrode array signals from a nontraditional speech area suggests that placing electrode arrays in ventral speech areas is a promising direction for speech BCIs.
Brain-to-text: decoding spoken phrases from phone representations in the brain
It is shown for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic recordings, and this approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones.
A Wireless Brain-Machine Interface for Real-Time Speech Synthesis
The results support the feasibility of neural prostheses that may have the potential to provide near-conversational synthetic speech output for individuals with severely impaired speech motor control.
Towards reconstructing intelligible speech from the human auditory cortex
The results show that a deep neural network model that directly estimates the parameters of a speech synthesizer from all neural frequencies achieves the highest subjective and objective scores on a digit recognition task, improving the intelligibility by 65% over the baseline method which used linear regression to reconstruct the auditory spectrogram.
Phonetic Feature Encoding in Human Superior Temporal Gyrus
High-density direct cortical surface recordings in humans while they listened to natural, continuous speech were used to reveal the STG representation of the entire English phonetic inventory, demonstrating the acoustic-phonetic representation of speech in human STG.
Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces
It is found that real-time synthesis of vowels and consonants was possible with good intelligibility, opening the way to future speech BCI applications using such an articulatory-based speech synthesizer.
Decoding spectrotemporal features of overt and covert speech from the human cortex
Intracranial electrocorticographic recordings from epilepsy patients performing an overt or silent reading task provide evidence that auditory representations of covert speech can be reconstructed from models built on an overt speech data set, supporting a partially shared neural substrate.
Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity.
Initial efforts are described toward the design of a neural speech recognition system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes, using the high-gamma band power of local field potentials recorded via electrocorticography in the STG and neighboring cortical areas.
Functional Organization of Human Sensorimotor Cortex for Speech Articulation
High-resolution, multi-electrode cortical recordings during the production of consonant-vowel syllables reveal the dynamic organization of speech sensorimotor cortex during the generation of multi-articulator movements that underlies the ability to speak.