Corpus ID: 235458096

Extracting Different Levels of Speech Information from EEG Using an LSTM-Based Model

by Mohammad Jalilpour-Monesi, Bernd Accou, Tom Francart, and Hugo Van hamme
Decoding the speech signal that a person is listening to from the human brain via electroencephalography (EEG) can help us understand how our auditory system works. Linear models have been used to reconstruct the EEG from speech, or vice versa. Recently, artificial neural networks (ANNs) such as Convolutional Neural Network (CNN)- and Long Short-Term Memory (LSTM)-based architectures have outperformed linear models in modeling the relation between EEG and speech. Before attempting to use these…
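The linear models mentioned in the abstract are commonly backward (decoding) models: a regularized regression from time-lagged EEG channels to the speech envelope. A minimal sketch of such a decoder in plain NumPy, on synthetic data (the lag range and regularization strength are illustrative assumptions, not values from the paper):

```python
import numpy as np

def lagged_design(eeg, lags):
    """Stack time-lagged copies of each EEG channel into a design matrix.

    eeg:  (n_samples, n_channels) array
    lags: iterable of non-negative integer sample lags
    returns (n_samples, n_channels * len(lags))
    """
    n, c = eeg.shape
    X = np.zeros((n, c * len(lags)))
    for i, lag in enumerate(lags):
        X[lag:, i * c:(i + 1) * c] = eeg[:n - lag]
    return X

def fit_ridge_decoder(eeg, envelope, lags, alpha=1.0):
    """Closed-form ridge regression mapping lagged EEG to the speech envelope:
    w = (X'X + alpha*I)^-1 X'y."""
    X = lagged_design(eeg, lags)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ envelope)

# Toy example: 64-channel EEG; the "envelope" is a noisy mix of two channels,
# so a linear decoder can recover it well.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((2000, 64))
envelope = 0.7 * eeg[:, 3] - 0.4 * eeg[:, 10] + 0.1 * rng.standard_normal(2000)

lags = range(0, 16)
w = fit_ridge_decoder(eeg, envelope, lags, alpha=1.0)
pred = lagged_design(eeg, lags) @ w
r = np.corrcoef(pred, envelope)[0, 1]
print(round(r, 2))  # high correlation on this easy toy problem
```

Reconstruction accuracy is then typically reported as this Pearson correlation between the decoded and actual envelopes.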

An LSTM Based Architecture to Relate Speech Stimulus to EEG
A novel Long Short-Term Memory (LSTM)-based architecture is proposed as a nonlinear model for classifying whether a given pair of EEG and speech envelope segments correspond to each other; transfer learning is used to fine-tune the model for each subject.
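The match/mismatch classification task described here can be constructed before any model is involved: each EEG segment is paired once with its time-aligned envelope segment (label 1) and once with an envelope segment from elsewhere in the recording (label 0). A minimal sketch of this pairing step in NumPy (the LSTM itself is omitted; segment length and mismatch shift are illustrative assumptions):

```python
import numpy as np

def make_match_mismatch_pairs(eeg, envelope, seg_len=128, shift=500):
    """Build (EEG segment, envelope segment, label) triples.

    Label 1: the envelope segment is time-aligned with the EEG segment.
    Label 0: the envelope segment starts `shift` samples later (mismatched).
    """
    pairs = []
    n = len(envelope)
    for start in range(0, n - seg_len - shift, seg_len):
        e = eeg[start:start + seg_len]
        matched = envelope[start:start + seg_len]
        mismatched = envelope[start + shift:start + shift + seg_len]
        pairs.append((e, matched, 1))
        pairs.append((e, mismatched, 0))
    return pairs

rng = np.random.default_rng(1)
eeg = rng.standard_normal((4000, 64))   # toy 64-channel recording
env = rng.standard_normal(4000)         # toy speech envelope
pairs = make_match_mismatch_pairs(eeg, env)
print(len(pairs), pairs[0][0].shape)
```

A classifier (linear, CNN, or LSTM) is then trained on these balanced pairs; chance level is 50%, which makes accuracies directly comparable across models.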
Machine learning for decoding listeners’ attention from electroencephalography evoked by continuous speech
Existing linear models that learn the mapping from neural activity to an attended speech envelope are replaced by a non-linear neural network (NN) that profits from a wider EEG frequency range and achieves a performance seven times higher than the linear baseline.
Auditory-Inspired Speech Envelope Extraction Methods for Improved EEG-Based Auditory Attention Detection in a Cocktail Party Scenario
This paper investigates the joint design of the EEG decoder and the audio subband-envelope recombination weight vector, using either norm-constrained least squares or canonical correlation analysis, but concludes that this increases computational complexity without improving AAD performance.
Comparison of Two-Talker Attention Decoding from EEG with Nonlinear Neural Networks and Linear Methods
This work compares the traditional two-stage approach with a novel neural-network architecture that subsumes the explicit similarity step, and indicates that wet and dry EEG systems can deliver comparable results despite the latter having one third as many channels as the former.
Low-Frequency Cortical Entrainment to Speech Reflects Phoneme-Level Processing
Electroencephalography is used to provide evidence for categorical phoneme-level speech processing, showing that the relationship between continuous speech and neural activity is best described when the speech is represented using both low-level spectrotemporal information and categorical labeling of phonetic features.
Predicting individual speech intelligibility from the cortical tracking of acoustic- and phonetic-level speech representations
A model including low- and higher-level speech features allows the speech reception threshold to be predicted from the EEG of people listening to natural speech, which has potential applications in diagnostics of the auditory system.
Data-driven spatial filtering for improved measurement of cortical tracking of multiple representations of speech
It is shown that including acoustic and phonetic speech information, together with a data-driven spatial filter, improves modelling of the relationship between speech and its brain response and offers automatic channel selection.
Dynamic Encoding of Acoustic Features in Neural Responses to Continuous Speech
Electroencephalography responses to continuous speech are characterized by obtaining the time-locked responses to phoneme instances (phoneme-related potentials); each instance of a phoneme in continuous speech is found to produce multiple distinguishable neural responses, occurring as early as 50 ms and as late as 400 ms after phoneme onset.
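Time-locked averaging of the kind used to obtain phoneme-related potentials can be sketched in a few lines: cut an epoch around each event onset and average across epochs so that noise cancels while the stimulus-locked response survives. A minimal NumPy sketch (onset times, window lengths, and the injected response are illustrative assumptions):

```python
import numpy as np

def time_locked_average(eeg, onsets, pre=10, post=80):
    """Average EEG epochs aligned to event onsets (e.g., phoneme onsets).

    eeg:    (n_samples, n_channels) array
    onsets: sample indices of events
    pre:    samples kept before each onset
    post:   samples kept after each onset
    returns (pre + post, n_channels) average epoch
    """
    n = eeg.shape[0]
    epochs = [eeg[t - pre:t + post] for t in onsets
              if t - pre >= 0 and t + post <= n]
    return np.mean(epochs, axis=0)

rng = np.random.default_rng(2)
eeg = rng.standard_normal((5000, 32))
onsets = np.arange(100, 4900, 200)      # hypothetical phoneme onsets
# Inject a small deterministic response after each onset so that
# averaging reveals it above the noise floor.
for t in onsets:
    eeg[t:t + 20] += 0.5
prp = time_locked_average(eeg, onsets)
print(prp.shape)  # (90, 32)
```

In the averaged epoch, the pre-onset samples stay near zero while the post-onset samples show the time-locked response, which is the basic logic behind a phoneme-related potential.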
Speech Intelligibility Predicted from Neural Entrainment of the Speech Envelope
By decoding the speech envelope from the electroencephalogram and correlating it with the stimulus envelope, this work demonstrates an electrophysiological measure of neural processing of running speech; such a measure could enable better differential diagnosis of the auditory system and the development of closed-loop auditory prostheses that automatically adapt to individual users.