The Speech Labeling and Modeling Toolkit (SLMTK) Version 1.0

Chen-Yu Chiang; Wu-Hao Li; Yen-Ting Lin; Jia-Jyu Su; Wei-Cheng Chen; Cheng-Che Kao; Shu-Lei Lin; Pin-Han Lin; Shao-Wei Hong; Guan-Ting Liou; Wen-Yang Chang; Jen-Chieh Chiang; Yen-Ting Lin; Yih-Ru Wang; Sin-Horng Chen

DOI:10.1109/O-COCOSDA202257103.2022.9997860
Corpus ID: 255188027

@article{Chiang2022TheSL,
  title={The Speech Labeling and Modeling Toolkit (SLMTK) Version 1.0},
  author={Chen-Yu Chiang and Wu-Hao Li and Yen-Ting Lin and Jia-Jyu Su and Wei-Cheng Chen and Cheng-Che Kao and Shu-Lei Lin and Pin-Han Lin and Shao-Wei Hong and Guan-Ting Liou and Wen-Yang Chang and Jen-Chieh Chiang and Yen-Ting Lin and Yih-Ru Wang and Sin-Horng Chen},
  journal={2022 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)},
  year={2022},
  pages={1-5},
  url={https://api.semanticscholar.org/CorpusID:255188027}
}

Published in Oriental COCOSDA… 1 November 2022
Computer Science, Linguistics

The Speech Labeling and Modeling Toolkit version 1.0, which facilitates automatic labeling of text and speech for constructing text-to-speech (TTS) systems and speech analysis, has been applied to constructing personalized TTS systems for augmentative and alternative communication.

Topics: Speech Synthesis, Acoustic Model, Linguistic Label, Text-to-Speech, Acoustic Features, Prosody Tags, Prosody Labeling

3 Citations

VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired

Jiaqi Su, Pang-Chen Liao, … Chen-Yu Chiang

Medicine, Computer Science

Oriental COCOSDA International Conference on…

2023

Reports the corpus design, recording, data purging and correction, and evaluations of the developed personalized TTS systems for the VoiceBanking project.

A Preliminary Study on Analysing Mandarin Tone Values of Romance L2 Mandarin Learners

Wu-Hao Li, Te-Hsin Liu, Chen-Yu Chiang

Linguistics

Asia-Pacific Signal and Information Processing…

2024

The 5-level tone value labeling system helps characterize pitch contours of syllables for L2 Mandarin learners and teachers to facilitate tone acquisition and tone error analysis, respectively. In…

Tone Value Representation for Computer-Assisted Pronunciation Training

Wu-Hao Li, Te-Hsin Liu, Chen-Yu Chiang

Computer Science, Linguistics

Proceedings of the International Conference on…

2024

Experimental results show that subjects can identify the tone class from the proposed representation and can visually evaluate the quality of the tones of syllables pronounced by the speakers, and that the approach offers more comprehensive feedback to learners.

39 References

Hierarchical prosody modeling of English speech and its application to TTS

Chung-Yao Tsai, Chin-Kuan Kuo, Yih-Ru Wang, Sin-Horng Chen, I-Bin Liao, Chen-Yu Chiang

Computer Science, Linguistics

2014 17th Oriental Chapter of the International…

2014

A hierarchical prosody modeling (HPM) approach for English speech is proposed, extending the HPM approach previously proposed for Mandarin speech. It designs a syllable-based statistical prosodic model and employs a prosody labeling and modeling algorithm to estimate the model parameters and label the prosodic tags of all training utterances simultaneously from a prosody-unlabeled speech corpus.

Hierarchical prosody modeling for Mandarin spontaneous speech.

C. Lin, Chung-Long You, Chen-Yu Chiang, Yih-Ru Wang, Sin-Horng Chen

Computer Science, Linguistics

Journal of the Acoustical Society of America

2019

An application of the HPM to assist Mandarin spontaneous-speech recognition is discussed, with significant relative error rate reductions for base-syllable, character, tone, and word recognition.

A Prosodic Mandarin Text-to-Speech System Based on Tacotron

Chuxiong Zhang, S. Zhang, Haibin Zhong

Computer Science

Asia-Pacific Signal and Information Processing…

2019

Subjective evaluation of prosody shows that the synthesis system performs better when the prosodic system is added as a front end for Tacotron.

Implementing Prosodic Phrasing in Chinese End-to-end Speech Synthesis

Yanfeng Lu, M. Dong, Ying Chen

Computer Science, Linguistics

IEEE International Conference on Acoustics…

2019

This paper proposes a solution for an end-to-end Chinese TTS system based on Tacotron 2 and a WaveNet vocoder, and adds extra contextual information to improve the performance of prosodic phrasing.

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Yi Ren, Chenxu Hu, … Tie-Yan Liu

Computer Science

International Conference on Learning…

2021

FastSpeech 2 is proposed, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by directly training the model with the ground-truth target instead of the simplified output from a teacher model, and by introducing more variation information of speech as conditional inputs.

Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi

Michael McAuliffe, Michaela Socolof, Sarah Mihuc, M. Wagner, Morgan Sonderegger

Computer Science, Linguistics

Interspeech

2017

The Montreal Forced Aligner is an update to the Prosodylab-Aligner, and maintains its key functionality of trainability on new data, as well as incorporating improved architecture (triphone acoustic models and speaker adaptation), and other features.

A novel method for Mandarin speech synthesis by inserting prosodic structure prediction into Tacotron2

Junmin Liu, Zhu Xie, Chunxia Zhang, Guang Shi

Computer Science

International Journal of Machine Learning and…

2021

This paper proposes a novel synthesis method that adds a Mandarin-to-Pinyin module and a prosodic structure prediction model to Tacotron2 to help it synthesize more natural and human-like Mandarin speech.

An Exploration of Local Speaking Rate Variations in Mandarin Read Speech

Guan-Ting Liou, Chen-Yu Chiang, Yih-Ru Wang, Sin-Horng Chen

Computer Science, Linguistics

Interspeech

2018

The prosody generated with local speaking rate variations is shown to be more vivid than that generated with a constant speaking rate, and it can be used in the prosody generation of Mandarin TTS.

Speaker Adaptation of SR-HPM for Speaking Rate-Controlled Mandarin TTS

I-Bin Liao, Chen-Yu Chiang, Yih-Ru Wang, Sin-Horng Chen

Computer Science

IEEE/ACM Transactions on Audio Speech and…

2016

Both objective and subjective evaluations show that the proposed method not only performs better than the maximum likelihood-based method in the observed SR range of the target speaker's data, but also is much better in the unseen SR ranges.

Latent Prosody Model of Continuous Mandarin Speech

Chen-Yu Chiang, Xiaodong Wang, Y. Liao, Yih-Ru Wang, Sin-Horng Chen, K. Hirose

Computer Science, Linguistics

IEEE International Conference on Acoustics…

2007

A latent prosody model (LPM) that jointly models the effects of tone and prosody state on F0 is proposed, with the main purposes of improving tone recognition accuracy and automatic prosody-state labeling.
