An Optimal Set of Flesh Points on Tongue and Lips for Speech-Movement Classification.


PURPOSE We sought to determine an optimal set of flesh points on the tongue and lips for classifying speech movements.

METHOD We used electromagnetic articulographs (Carstens AG500 and NDI Wave) to record tongue and lip movements from 13 healthy talkers who articulated 8 vowels, 11 consonants, a phonetically balanced set of words, and a set of short phrases. We used a machine-learning classifier (support-vector machine) to classify the speech stimuli on the basis of articulatory movements. We then compared the classification accuracies of the flesh-point combinations to determine an optimal set of sensors.

RESULTS When data from 4 sensors (T1: the vicinity between the tongue tip and tongue blade; T4: the tongue-body back; UL: the upper lip; and LL: the lower lip) were combined, phoneme and word classifications were most accurate and were comparable with those obtained from the full set of 6 sensors (which also includes T2: the tongue-body front; and T3: the tongue-body front).

CONCLUSION We identified a 4-sensor set (T1, T4, UL, and LL) that yielded a classification accuracy (91%-95%) equivalent to that obtained using all 6 sensors. These findings provide an empirical basis for selecting sensors and their locations for scientific and emerging clinical applications that incorporate articulatory movements.
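As a rough illustration of the sensor-subset comparison described in the METHOD, the Python sketch below enumerates every combination of a given number of sensors and ranks them by cross-validated classification accuracy. It is not the authors' implementation. The data are synthetic, each sensor is reduced to a single made-up feature, a simple nearest-centroid classifier stands in for the support-vector machine so that the example needs only the standard library, and all function names are hypothetical.

```python
import itertools
import random

SENSORS = ["T1", "T2", "T3", "T4", "UL", "LL"]


def make_synthetic_data(n_classes=4, n_per_class=20, seed=0):
    """Fabricated stand-in data: one scalar feature per sensor per sample."""
    rng = random.Random(seed)
    samples = []
    for label in range(n_classes):
        # Hypothetical class-specific mean position for each sensor.
        means = {s: rng.uniform(-1.0, 1.0) for s in SENSORS}
        for _ in range(n_per_class):
            feats = {s: means[s] + rng.gauss(0.0, 0.3) for s in SENSORS}
            samples.append((feats, label))
    rng.shuffle(samples)
    return samples


def nearest_centroid_predict(train, test_feats, subset):
    """Predict the label whose training centroid is closest (stand-in for an SVM)."""
    sums, counts = {}, {}
    for feats, label in train:
        acc = sums.setdefault(label, [0.0] * len(subset))
        for i, s in enumerate(subset):
            acc[i] += feats[s]
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum(
            (acc[i] / counts[label] - test_feats[s]) ** 2
            for i, s in enumerate(subset)
        )
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(samples, subset, k=5):
    """k-fold cross-validated accuracy using only the sensors in `subset`."""
    correct = 0
    for fold in range(k):
        test = samples[fold::k]
        train = [x for i, x in enumerate(samples) if i % k != fold]
        correct += sum(
            nearest_centroid_predict(train, feats, subset) == label
            for feats, label in test
        )
    return correct / len(samples)


def rank_sensor_subsets(samples, size):
    """Score every sensor combination of the given size, best first."""
    scores = {
        subset: cv_accuracy(samples, subset)
        for subset in itertools.combinations(SENSORS, size)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


samples = make_synthetic_data()
full_set_accuracy = cv_accuracy(samples, tuple(SENSORS))
best_subset, best_accuracy = rank_sensor_subsets(samples, 4)[0]
print("all 6 sensors:", round(full_set_accuracy, 3))
print("best 4-sensor subset:", best_subset, round(best_accuracy, 3))
```

Comparing the best reduced subset against the full six-sensor score mirrors the comparison reported in the RESULTS. Because the data here are synthetic, the subset this sketch selects carries no meaning for real articulatory recordings.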

DOI: 10.1044/2015_JSLHR-S-14-0112

Cite this paper

@article{Wang2016AnOS, title={An Optimal Set of Flesh Points on Tongue and Lips for Speech-Movement Classification.}, author={Jun Wang and Ashok Samal and Panying Rong and Jordan R. Green}, journal={Journal of speech, language, and hearing research : JSLHR}, year={2016}, volume={59}, number={1}, pages={15-26} }