Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

@article{Yilmaz2019ArticulatoryAB,
  title={Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech},
  author={Emre Yilmaz and Vikramjit Mitra and Ganesh Sivaraman and Horacio Franco},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.06533}
}

Citations

Raw Source and Filter Modelling for Dysarthric Speech Recognition

Acoustic modelling for automatic dysarthric speech recognition (ADSR) is a challenging task: data deficiency is a major problem, and there are substantial differences between typical and dysarthric speech.

Autoencoder Bottleneck Features with Multi-Task Optimisation for Improved Continuous Dysarthric Speech Recognition

The effectiveness of using an unsupervised autoencoder-based bottleneck (AEBN) feature extractor trained on out-of-domain (OOD) LibriSpeech data is demonstrated, and a 5-fold cross-training setup on the widely used TORGO dysarthric database is proposed.
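
As a rough illustration of the idea (not the paper's architecture; the layer sizes, frame dimension and PyTorch framing below are assumptions), an AEBN extractor can be sketched as an autoencoder trained to reconstruct out-of-domain acoustic frames, with the narrow middle layer reused as the feature extractor:

    import torch
    import torch.nn as nn

    class AEBN(nn.Module):
        """Autoencoder whose bottleneck layer provides the features."""
        def __init__(self, feat_dim=40, bottleneck_dim=40):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(feat_dim, 512), nn.ReLU(),
                nn.Linear(512, bottleneck_dim),       # bottleneck layer
            )
            self.decoder = nn.Sequential(
                nn.ReLU(),
                nn.Linear(bottleneck_dim, 512), nn.ReLU(),
                nn.Linear(512, feat_dim),
            )

        def forward(self, x):
            z = self.encoder(x)                       # AEBN features
            return self.decoder(z), z

    model = AEBN()
    frames = torch.randn(8, 40)                       # dummy out-of-domain frames
    recon, bottleneck = model(frames)
    loss = nn.functional.mse_loss(recon, frames)      # unsupervised reconstruction loss

Training needs only unlabelled out-of-domain audio; the multi-task optimisation named in the title presumably adds further objectives on top of this reconstruction loss and is not shown here.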

A Study into Pre-Training Strategies for Spoken Language Understanding on Dysarthric Speech

By introducing the intelligibility score as a metric of the impairment severity, this paper quantitatively analyzes the relation between generalization and pathology severity for dysarthric speech.

Determining the adaptation data saturation of ASR systems for dysarthric speakers

Two types of adaptation technique were considered, namely the individual MLLR and MAP adaptation techniques as well as their combination, to determine the saturation point of the adaptation data for dysarthric speakers; the results show that the saturation points differ.
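
To see why adaptation data can saturate, consider the standard MAP update of a Gaussian mean (Gauvain-Lee style; the NumPy framing and variable names below are illustrative, not taken from the paper): the prior mean is interpolated with the adaptation-data statistics, so once the occupation count dwarfs the prior weight, additional data barely moves the estimate.

    import numpy as np

    def map_adapt_mean(prior_mean, frames, gammas, tau=10.0):
        """MAP update of one Gaussian mean.

        prior_mean: (D,) mean from the speaker-independent model
        frames:     (T, D) adaptation-data feature frames
        gammas:     (T,) soft occupation counts of this Gaussian
        tau:        prior weight controlling how quickly the prior is overridden
        """
        occ = gammas.sum()
        weighted_sum = (gammas[:, None] * frames).sum(axis=0)
        return (tau * prior_mean + weighted_sum) / (tau + occ)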

Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition

Acoustic modelling for automatic dysarthric speech recognition (ADSR) is a challenging task: data deficiency is a major problem, and there are substantial differences between typical and dysarthric speech.

Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

Experiments suggested that systems incorporating the generated articulatory features consistently outperformed the baseline hybrid TDNN and Conformer based end-to-end systems constructed using acoustic features only, with statistically significant word error rate or character error rate reductions after data augmentation and speaker adaptation were applied.

Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition

  • Shujie Hu, Shansong Liu, H. Meng
  • Computer Science
    ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2022
A cross-domain acoustic-to-articulatory (A2A) inversion approach utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training before being cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features; systems using these features consistently outperformed the baseline hybrid DNN/TDNN, CTC and Conformer based end-to-end systems constructed using acoustic features only.
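
A minimal sketch of an acoustic-to-articulatory inversion model of this general kind (the recurrent architecture, dimensions and names below are assumptions rather than the cited system): a network regresses articulatory trajectories from acoustic frame sequences, trained with a mean-squared error on the parallel data.

    import torch
    import torch.nn as nn

    class A2AInverter(nn.Module):
        """Regresses articulatory trajectories from acoustic frame sequences."""
        def __init__(self, acoustic_dim=40, artic_dim=12, hidden=256):
            super().__init__()
            self.blstm = nn.LSTM(acoustic_dim, hidden, num_layers=2,
                                 batch_first=True, bidirectional=True)
            self.proj = nn.Linear(2 * hidden, artic_dim)

        def forward(self, x):                  # x: (batch, time, acoustic_dim)
            h, _ = self.blstm(x)
            return self.proj(h)                # (batch, time, artic_dim)

    model = A2AInverter()
    acoustic = torch.randn(4, 100, 40)         # dummy parallel acoustic frames
    targets = torch.randn(4, 100, 12)          # dummy articulatory trajectories
    loss = nn.functional.mse_loss(model(acoustic), targets)

In the cited work the inverter is trained on TORGO's parallel data and then domain-adapted before generating articulatory features for UASpeech; the adaptation step itself is beyond this sketch.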

Independent and automatic evaluation of acoustic-to-articulatory inversion models

The ABX measure is used to evaluate a Bi-LSTM based model trained on 3 datasets (14 speakers), and it is shown that it gives information complementary to the standard measures, and enables us to evaluate the effects of dataset merging, as well as the speaker independence of the model.

Independent and Automatic Evaluation of Speaker-Independent Acoustic-to-Articulatory Reconstruction

A new evaluation for articulatory reconstruction that is independent of the articulatory data set used for training is presented: the phone-discrimination ABX task. The ABX measure is used to evaluate a bi-LSTM based model trained on three data sets and is shown to give information complementary to standard measures.
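
The ABX criterion itself is simple: for a triplet where X belongs to the same phone category as A and B to a different one, the representation scores a hit if X lies closer to A than to B. A minimal, hypothetical scoring helper (the distance function and triplet format are assumptions; the full ABX measure aggregates such comparisons over category pairs):

    def abx_accuracy(distance, triplets):
        """Fraction of (A, B, X) triplets where X, drawn from A's phone
        category, is closer to A than to B under the given distance."""
        correct = sum(distance(a, x) < distance(b, x) for a, b, x in triplets)
        return correct / len(triplets)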

References

SHOWING 1-10 OF 64 REFERENCES

Articulatory Features for ASR of Pathological Speech

This work investigates the joint use of articulatory and acoustic features for automatic speech recognition (ASR) of pathological speech with a designated acoustic model, namely a fused-feature-map convolutional neural network (fCNN), which performs frequency convolution on acoustic features and time convolution on articulatory features.
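
A schematic of that fused-feature-map layout, assuming a PyTorch framing with illustrative dimensions, channel counts and output size (none of which come from the paper): one branch convolves along the frequency axis of the acoustic input, the other along the time axis of the articulatory input, and the two feature maps are fused before classification.

    import torch
    import torch.nn as nn

    class FusedFeatureMapCNN(nn.Module):
        """Frequency convolution on acoustic maps, time convolution on
        articulatory maps, concatenated before the output layer."""
        def __init__(self):
            super().__init__()
            self.freq_conv = nn.Conv2d(1, 32, kernel_size=(8, 1))   # along frequency
            self.time_conv = nn.Conv2d(1, 32, kernel_size=(1, 5))   # along time
            self.pool = nn.AdaptiveAvgPool2d((4, 4))
            self.classifier = nn.Linear(2 * 32 * 4 * 4, 2000)       # assumed target count

        def forward(self, acoustic, artic):
            # acoustic: (batch, 1, n_freq, n_frames); artic: (batch, 1, n_artic, n_frames)
            a = self.pool(torch.relu(self.freq_conv(acoustic)))
            b = self.pool(torch.relu(self.time_conv(artic)))
            fused = torch.cat([a.flatten(1), b.flatten(1)], dim=1)
            return self.classifier(fused)

    model = FusedFeatureMapCNN()
    logits = model(torch.randn(2, 1, 40, 11), torch.randn(2, 1, 12, 11))  # dummy inputs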

Multi-Stage DNN Training for Automatic Recognition of Dysarthric Speech

The results show that the system employing the proposed training scheme considerably improves the recognition of Dutch dysarthric speech compared to a baseline system with single-stage training only on a large amount of normal speech or a small amount of in-domain data.

Improving Acoustic Models in TORGO Dysarthric Speech Database

  • N. M. Joy, S. Umesh
  • Computer Science
    IEEE Transactions on Neural Systems and Rehabilitation Engineering
  • 2018
This work trains speaker-specific acoustic models by tuning various acoustic model parameters, using speaker-normalized cepstral features and building complex DNN-HMM models with dropout and sequence-discrimination strategies, and presents the best recognition accuracies on the TORGO database to date.

Dysarthric Speech Recognition Using Convolutional LSTM Neural Network

Experimental evaluation on a database collected from nine dysarthric patients showed that the proposed approach provides substantial improvement over both standard CNN and LSTM-RNN based speech recognizers.

A comparative study of adaptive, automatic recognition of disordered speech

This study investigates how far fundamental training and adaptation techniques developed in the LVCSR community can go; a variety of ASR systems using maximum likelihood and MAP adaptation strategies are established, with all speakers obtaining significant improvements over the baseline system regardless of the severity of their condition.

Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers

Two techniques are developed that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; they attempt to correct errors made at the phonetic level and use a language model to find the best estimate of the correct word sequence.

Articulatory Knowledge in the Recognition of Dysarthric Speech

  • F. Rudzicz
  • Computer Science
    IEEE Transactions on Audio, Speech, and Language Processing
  • 2011
Although the statistics of vocal tract movement do not appear to be transferable between regular and disabled speakers, transforming the space of the former given knowledge of the latter before retraining gives high accuracy.

Adapting acoustic and lexical models to dysarthric speech

  • K. T. Mengistu, F. Rudzicz
  • Psychology
    2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2011
It is shown that acoustic model adaptation yields an average relative word error rate (WER) reduction and that pronunciation lexicon adaptation (PLA) further reduces the relative WER by an average of 8.29% on a large vocabulary task of over 1500 words for six speakers with severe to moderate dysarthria.
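
For reference, the relative WER reduction quoted above is measured against the pre-adaptation WER; a one-line helper with made-up numbers (the function and values are purely illustrative):

    def relative_wer_reduction(wer_before, wer_after):
        """Relative reduction in percent, e.g. 20.0% -> 18.3% WER is about 8.3% relative."""
        return 100.0 * (wer_before - wer_after) / wer_before

    print(relative_wer_reduction(20.0, 18.34))   # ~8.3, illustrative numbers only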

Comparing Humans and Automatic Speech Recognition Systems in Recognizing Dysarthric Speech

The results show that despite the encouraging performance of ASR systems, and contrary to the claims in other studies, on average human listeners perform better in recognizing single-word dysarthric speech.
...