Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

Vikramjit Mitra, Zifang Huang, Colin S. Lea, Lauren Tooley, Sarah Wu, Darren Botten, Ashwin Palekar, Shrinath Thelapurath, Panayiotis G. Georgiou, Sachin S. Kajarekar, Jeffrey Bigham
Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice-operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and, as a consequence, do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. The focus of this work is on quantitative analysis of a…


Efficient Recognition and Classification of Stuttered Word from Speech Signal using Deep Learning Technique

A method in which a stuttered voice signal is analyzed using a convolutional neural network classifier; the approach outperforms established methods in preserving the overall intelligibility of the stuttered speech signal.

Effects of Filled Pauses on Memory Recall in Human-Robot Interaction in Mandarin Chinese

In recent years, voice-AI systems have seen significant improvements in intelligibility and naturalness, but the human experience when talking to a machine is still remarkably different from the…

Robust Stuttering Detection via Multi-task and Adversarial Learning

This is the first preliminary study in which multi-task learning (MTL) and adversarial learning (ADV) have been employed for stuttering identification (SI); the methods show promising results and outperform the baseline across various disfluency classes.

Systematic Review of Machine Learning Approaches for Detecting Developmental Stuttering

A systematic review of the literature on statistical and machine learning schemes for identifying symptoms of developmental stuttering from audio recordings is reported, and recommendations are made about how these problems can be addressed in future work on this topic.

Automatic recognition of children's read speech for stuttering application

This study investigates how automatic speech recognition could help clinicians by providing a tool that automatically recognises stuttering events and produces a useful written transcription of what was said; it also examines the effect of augmenting the language model with artificially generated data.

Improved Robustness to Disfluencies in Rnn-Transducer Based Speech Recognition

This work investigates data selection and preparation choices aimed at improving the robustness of RNN-T ASR to speech disfluencies, with a focus on partial words, and shows that after including a small amount of data with disfluencies in the training set, recognition accuracy on test sets containing disfluencies and stuttering improves.

A Lightly Supervised Approach to Detect Stuttering in Children's Speech

This work uses a lightly supervised approach with task-oriented lattices to recognise the stuttered speech of children performing a standard reading task; it proposes a training regime to address this problem and preserve a full verbatim output of the stuttered speech.

SEP-28k: A Dataset for Stuttering Event Detection from Podcasts with People Who Stutter

This work introduces Stuttering Events in Podcasts (SEP-28k), a dataset containing over 28k clips labeled with five event types, including blocks, prolongations, sound repetitions, word repetitions, and interjections, and benchmarks a set of acoustic models on SEP-28k and the public FluencyBank dataset.

Identification of Primary and Collateral Tracks in Stuttered Speech

This work introduces a novel forced-aligned disfluency dataset from a corpus of semi-directed interviews, and presents baseline results directly comparing the performance of text-based features (word and span information) and speech-based features (acoustic-prosodic information).

Detecting Multiple Speech Disfluencies Using a Deep Residual Network with Bidirectional Long Short-Term Memory

This work proposes a model that relies solely on acoustic features, allowing several types of stuttering disfluencies to be identified without the need for speech recognition, outperforming the state-of-the-art by almost 27%.

FluentNet: End-to-End Detection of Speech Disfluency with Deep Learning

An end-to-end deep neural network, FluentNet, is proposed that is capable of detecting a number of different disfluency types; it achieves state-of-the-art results, outperforming other solutions on the publicly available UCLASS dataset.

Using Clinician Annotations to Improve Automatic Speech Recognition of Stuttered Speech

A tool is built that rescores a word lattice using the clinician's annotations, and its improvement over a baseline version is described.

Stuttering Speech Disfluency Prediction using Explainable Attribution Vectors of Facial Muscle Movements

A novel explainable AI (XAI) assisted convolutional neural network (CNN) classifier is proposed to predict near-future stuttering by learning temporal facial muscle movement patterns of adults who stutter (AWS), and to explain the important facial muscles and actions involved.

The UCLASS archive of stuttered speech

Audio recordings of stuttered speech can be used for research and clinical purposes, but such recordings have not always been easy to obtain in sufficient numbers to fulfill these needs. Speech…