Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition

Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen M. Meng
Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech in recent decades, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date. Sources of heterogeneity commonly found in normal speech, including accent or gender, when further compounded with the variability over age and speech pathology severity level, create large diversity among speakers. To this end, speaker adaptation techniques play a key role in… 
Speaker adaptation for Wav2vec2 based dysarthric ASR
This work proposes a simple adaptation network for wav2vec2 using fMLLR features. The network can also handle other speaker-adaptive features such as x-vectors, and it shows steady improvements across all impairment severity levels.
On-the-fly Feature Based Speaker Adaptation for Dysarthric and Elderly Speech Recognition
Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech datasets suggest that the proposed SBEVR feature-based adaptation statistically outperforms both the baseline on-the-fly i-Vector adapted hybrid TDNN/DNN systems and offline batch-mode model-based LHUC adaptation using all speaker-level data.
Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio, visual and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training, before being cross-domain and cross-lingually adapted to three datasets across two languages to produce UTI-based articulatory features.


The natural history of Alzheimer's disease. Description of study cohort and accuracy of diagnosis.
It is indicated that longitudinal follow-up of demented cases increases accuracy of diagnosis, and that detailed cognitive testing aids in early classification, in a natural history study of Alzheimer's disease.
Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation
This work presents a broad study on the adaptation of neural network acoustic models by means of learning hidden unit contributions (LHUC) -- a method that linearly re-combines hidden units in a
Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
This paper proposes a simple yet effective model-based neural network speaker adaptation technique that learns speaker-specific hidden unit contributions given adaptation data, without requiring any
The Kaldi Speech Recognition Toolkit
The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
The TORGO database of acoustic and articulatory speech from speakers with dysarthria
This paper describes the acquisition of a new database of dysarthric speech in terms of aligned acoustics and articulatory data from seven individuals with speech impediments caused by cerebral palsy or amyotrophic lateral sclerosis and age- and gender-matched control subjects.
Subspace-based signal analysis using singular value decomposition
A unified approach is presented to the related problems of recovering signal parameters from noisy observations and identifying linear system model parameters from observed input/output signals, both
Latent Dirichlet Allocation
Dysarthric speech database for universal access research
A database of dysarthric speech produced by 19 speakers with cerebral palsy provides a fundamental resource for automatic speech recognition development for people with neuromotor disability.
Development of the CUHK Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the DementiaBank Corpus
  • Zi Ye, Shoukang Hu, H. Meng
  • Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
A state-of-the-art automatic speech recognition (ASR) system built on the DementiaBank Pitt corpus for automatic NCD detection, with the best NCD detection accuracy of 88%, comparable to that using the ground truth speech transcripts.
Investigation of Data Augmentation Techniques for Disordered Speech Recognition
A set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation, is investigated; variations among impaired speakers in both the original and augmented data are exploited.