Automatic measurement of vowel duration via structured prediction

Yossi Adi, Joseph Keshet, Emily Cibelli, Erin N. Gustafson, Cynthia G. Clopper, Matthew Goldrick
The Journal of the Acoustical Society of America, vol. 140, no. 6
A key barrier to making phonetic studies scalable and replicable is the need to rely on subjective, manual annotation. To help meet this challenge, a machine learning algorithm was developed for automatic measurement of a widely used phonetic measure: vowel duration. Manually annotated data were used to train a model that takes as input an arbitrary-length segment of the acoustic signal containing a single vowel that is preceded and followed by consonants, and outputs the duration of the vowel…
Automatic Measurement of Pre-Aspiration
Two machine learning methods for automatic measurement of pre-aspiration duration are proposed: a feedforward neural network, which works at the frame level, and a structured prediction model, which relies on manually designed feature functions and works at the segment level.
Machine Assisted Analysis of Vowel Length Contrasts in Wolof
This paper proposes multiple features for a fine-grained evaluation of the degree of length contrast under different factors, such as read vs. semi-spontaneous speech and standard vs. dialectal Wolof, and notably shows that the contrast is weaker in semi-spontaneous speech and in a non-standard semi-spontaneous dialect.
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation
The proposed model is a convolutional neural network that operates directly on the raw waveform and is optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle; it reaches state-of-the-art performance on both data sets.
Phoneme Boundary Detection Using Learnable Segmental Features
This work proposes a neural architecture coupled with a parameterized structured loss function to learn segmental representations for the task of phoneme boundary detection, and evaluates the model on a Hebrew corpus to demonstrate that such phonetic supervision can be beneficial in a multilingual setting.
Assessing automatic VOT annotation using unimpaired and impaired speech
This work evaluates how well one automatic annotation tool, AutoVOT, can approximate manual annotation by comparing analyses of automatically and manually annotated speech in two studies and suggests that automatic methods may be a viable way to reduce phonetic annotation costs in the right circumstances.
The Influence of Lexical Selection Disruptions on Articulation
Interactive models of language production predict that it should be possible to observe long-distance interactions: effects that arise at one level of processing influence multiple subsequent stages…
Adversarial Examples on Discrete Sequences for Beating Whole-Binary Malware Detection
This work introduces a novel approach to generating adversarial examples for attacking a whole-binary malware detector: a small section is appended to the binary file that steers the network's prediction from malicious to benign with high confidence.
Deceiving End-to-End Deep Learning Malware Detectors using Adversarial Examples
This work introduces a novel loss function for generating adversarial examples specifically tailored for discrete input sets, such as executable bytes, and modifies malicious binaries so that they are detected as benign, while preserving their original functionality, by injecting a small sequence of bytes into the binary file.
Collecter, Transcrire, Analyser : quand la machine assiste le linguiste dans son travail de terrain. (Collecting, Transcribing, Analyzing : Machine-Assisted Linguistic Fieldwork)
We developed LIG-AIKUMA, an innovative mobile application for speech collection, which makes it possible to produce recordings that are directly usable by automatic speech recognition (ASR) engines and that are of strong interest for speech technologies, notably for unsupervised learning.
Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring
This work presents an approach for watermarking Deep Neural Networks in a black-box way, and shows experimentally that such a watermark has no noticeable impact on the primary task that the model is designed for.


Vowel duration measurement using deep neural networks
This work tries two deep-network architectures, a convolutional neural network (CNN) and a deep belief network (DBN), to build an algorithm for accurate automatic measurement of vowel duration, and compares their accuracy to that of an HMM-based forced aligner.
Automatic measurement of voice onset time using discriminative structured prediction.
A discriminative large-margin algorithm for automatic measurement of voice onset time (VOT), treated as a case of predicting structured output from speech, is described; its performance is near human intertranscriber reliability and compares favorably with previous work.
Toward completely automated vowel extraction: Introducing DARLA
A fully automated program called DARLA is introduced, which automatically generates transcriptions with ASR and extracts vowels using FAVE; it is tested on a dataset of the US Southern Shift, and the results are compared with semi-automated methods.
Erratum to: Grammatical constraints on phonological encoding in speech production
An error in the statistical analyses of the acoustic data that undermines claims regarding phonetic processing is reported, which suggests that the phonetic variation in these data is related to variation in phonological planning times.
Automatic phonetic segmentation using boundary models
Results show that the combination of special one-state phone boundary models and monophone HMMs can significantly improve forced alignment accuracy, and that HMM-based forced alignment systems can benefit from using precise phonetic segmentation for training HMMs.
TIMIT Acoustic-Phonetic Continuous Speech Corpus
Speech recognition based on phones is very attractive since it is inherently free from vocabulary limitations, but the performance of large-vocabulary ASR systems depends on the quality of the phone recognizer, so research teams continue to develop phone recognizers to enhance their performance as much as possible.
A Large Margin Algorithm for Speech-to-Phoneme and Music-to-Score Alignment
A discriminative algorithm is presented for learning to align an audio signal with a given sequence of events that tag the signal; experimental results are comparable to those of state-of-the-art systems.
EasyAlign: An Automatic Phonetic Alignment Tool Under Praat
Evaluation showed that the performance of this HTK-based aligner is comparable to human alignment and to other existing alignment tools.
Prosodylab-aligner: A tool for forced alignment of laboratory speech
The Penn Forced Aligner automates the alignment process using the Hidden Markov Model Toolkit (HTK). The core of Prosodylab-Aligner is align.py, a script which performs acoustic model training and…
Acoustic characteristics of American English vowels.
Analysis of the formant data shows numerous differences between the present data and those of PB, both in terms of average frequencies of F1 and F2, and the degree of overlap among adjacent vowels.