The Speech Labeling and Modeling Toolkit (SLMTK) Version 1.0
@article{Chiang2022TheSL,
  title={The Speech Labeling and Modeling Toolkit (SLMTK) Version 1.0},
  author={Chen-Yu Chiang and Wu-Hao Li and Yen-Ting Lin and Jia-Jyu Su and Wei-Cheng Chen and Cheng-Che Kao and Shu-Lei Lin and Pin-Han Lin and Shao-Wei Hong and Guan-Ting Liou and Wen-Yang Chang and Jen-Chieh Chiang and Yen-Ting Lin and Yih-Ru Wang and Sin-Horng Chen},
  journal={2022 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)},
  year={2022},
  pages={1-5},
  url={https://api.semanticscholar.org/CorpusID:255188027}
}
The Speech Labeling and Modeling Toolkit version 1.0, which facilitates automatic labeling of text and speech for constructing text-to-speech (TTS) systems and speech analysis, has been applied to constructing personalized TTS systems for augmentative and alternative communication.
3 Citations
VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired
- 2023
Medicine, Computer Science
The corpus design, recording, data purging and correction, and evaluations of the developed personalized TTS systems for the VoiceBanking project are reported.
A Preliminary Study on Analysing Mandarin Tone Values of Romance L2 Mandarin Learners
- 2024
Linguistics
The 5-level tone value labeling system helps characterize pitch contours of syllables for L2 Mandarin learners and teachers to facilitate tone acquisition and tone error analysis, respectively. In…
Tone Value Representation for Computer-Assisted Pronunciation Training
- 2024
Computer Science, Linguistics
Experimental results show that subjects can identify the tone class by looking at the proposed representation and can visually evaluate the quality of the tones of the syllables pronounced by the speakers; the approach also offers more comprehensive feedback to learners.
39 References
Hierarchical prosody modeling of English speech and its application to TTS
- 2014
Computer Science, Linguistics
A hierarchical prosody modeling approach for English speech is proposed, an extended version of the HPM approach proposed previously for Mandarin speech that designs a syllable-based, statistical prosodic model and employs a prosody labeling and modeling algorithm to estimate the model parameters and label the prosodic tags of all training utterances simultaneously from a prosodic-unlabeled speech corpus.
Hierarchical prosody modeling for Mandarin spontaneous speech.
- 2019
Computer Science, Linguistics
An application of the HPM to assist in Mandarin spontaneous-speech recognition is discussed, with significant relative error rate reductions for base-syllable, character, tone, and word recognition.
A Prosodic Mandarin Text-to-Speech System Based on Tacotron
- 2019
Computer Science
Under subjective evaluation in terms of the prosody, results show that the synthesis system performs better by adding the prosodic system as the front-end system for Tacotron.
Implementing Prosodic Phrasing in Chinese End-to-end Speech Synthesis
- 2019
Computer Science, Linguistics
This paper proposes a solution for an end-to-end Chinese TTS system based on Tacotron 2 and a WaveNet vocoder, adding extra contextual information to improve the performance of prosodic phrasing.
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
- 2021
Computer Science
FastSpeech 2 is proposed, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by directly training the model with ground-truth target instead of the simplified output from teacher, and introducing more variation information of speech as conditional inputs.
Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi
- 2017
Computer Science, Linguistics
The Montreal Forced Aligner is an update to the Prosodylab-Aligner, and maintains its key functionality of trainability on new data, as well as incorporating improved architecture (triphone acoustic models and speaker adaptation), and other features.
A novel method for Mandarin speech synthesis by inserting prosodic structure prediction into Tacotron2
- 2021
Computer Science
This paper proposes a novel synthesis method that adds a Mandarin-to-PinYin module and a prosodic structure prediction model to Tacotron2 to help it synthesize more natural and human-like Mandarin speech.
An Exploration of Local Speaking Rate Variations in Mandarin Read Speech
- 2018
Computer Science, Linguistics
The generated prosody with local speaking rate variations is shown to be more vivid than prosody generated with a constant speaking rate, and it can be used in the prosody generation of Mandarin TTS.
Speaker Adaptation of SR-HPM for Speaking Rate-Controlled Mandarin TTS
- 2016
Computer Science
Both objective and subjective evaluations show that the proposed method not only performs better than the maximum likelihood-based method in the observed SR range of the target speaker's data, but also is much better in the unseen SR ranges.
Latent Prosody Model of Continuous Mandarin Speech
- 2007
Computer Science, Linguistics
A latent prosody model (LPM) aiming to jointly model the effects of tone and prosody state on F0 is proposed, with the main purposes of improving tone recognition accuracy and automatic prosody-state labeling.