Idiolect Extraction and Generation for Personalized Speaking Style Modeling

Chung-Hsien Wu, Chung-Han Lee, Chung-Hau Liang
IEEE Transactions on Audio, Speech, and Language Processing
A person's speaking style, comprising attributes such as voice, choice of vocabulary, and the physical motions employed, not only expresses the speaker's identity but also emphasizes the content of an utterance. Speech that combines these aspects of speaking style is more vivid and expressive to listeners. Recent research on speaking style modeling has concentrated on speech signal processing; this approach instead focuses on text processing for idiolect extraction and generation to model a… 
Personalized Spontaneous Speech Synthesis Using a Small-Sized Unsegmented Semispontaneous Speech
The results of a subjective listening test show that the proposed method improves the spontaneity and speaker similarity of the synthesized speech compared with maximum-likelihood-linear-regression-based speaker adaptation.
Sentence Correction Incorporating Relative Position and Parse Template Language Models
Experimental results show that, under human evaluation, the proposed approach achieves significantly better error-correction performance than a state-of-the-art phrase-based statistical machine translation system.
A Case Study on The Two Turkish Translations of Paul Auster’s City of Glass
This study examines how the elements that pose difficulties for the translator in two different Turkish translations of Paul Auster's novel City of Glass were rendered by the translators, in terms of Venuti's concepts of domestication and foreignization.


Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis
This paper presents an expressive voice conversion model (DeBi-HMM) as the post-processing stage of a text-to-speech (TTS) system for expressive speech synthesis. DeBi-HMM is named for its duration-embedded bi-HMMs.
Speaker recognition based on idiolectal differences between speakers
These initial experiments strongly suggest that further exploration of "familiar" speaker characteristics is a promising research direction for recognizing speakers in conversational speech.
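The idiolect-based idea above can be sketched as a log-likelihood-ratio score: a word n-gram model trained on a target speaker's transcripts is compared against a background model. This is a minimal sketch under assumed toy transcripts and add-one smoothing, not the cited paper's exact setup:

```python
import math
from collections import Counter

def bigrams(words):
    # Pairs of adjacent words from a transcript.
    return list(zip(words, words[1:]))

def bigram_logprob(test_words, train_counts, vocab_size):
    # Add-one smoothed log-probability of the test bigrams
    # under counts estimated from training transcripts.
    total = sum(train_counts.values())
    lp = 0.0
    for bg in bigrams(test_words):
        lp += math.log((train_counts[bg] + 1) / (total + vocab_size))
    return lp

# Toy transcripts (illustrative only): the target speaker habitually
# says "you know"; the background population does not.
target_train = "you know i mean you know it is you know fine".split()
background   = "it is fine i mean it is good it is fine".split()

tgt = Counter(bigrams(target_train))
bkg = Counter(bigrams(background))
V = len(set(target_train) | set(background)) ** 2  # crude bigram vocabulary size

def llr(test_words):
    # Positive scores favor the target speaker's idiolect model.
    return (bigram_logprob(test_words, tgt, V)
            - bigram_logprob(test_words, bkg, V))

print(llr("you know it is fine".split()))   # positive: matches the idiolect
print(llr("it is good i mean".split()))     # negative: background-like phrasing
```

In practice such high-level scores are computed over much larger conversational corpora and fused with acoustic features rather than used alone.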
High-level speaker verification with support vector machines
A new kernel based on standard log-likelihood-ratio scoring is derived to address the limitations of text classification methods, and the resulting methods are shown to achieve significant gains over standard methods for processing high-level features.
Conversion Function Clustering and Selection Using Linguistic and Spectral Information for Emotional Voice Conversion
This study designs and accumulates a set of phonetically balanced, small-sized emotional parallel speech databases to construct conversion functions, and presents a framework that incorporates linguistic and spectral information for conversion-function clustering and selection.
Variable-Length Unit Selection in TTS Using Structural Syntactic Cost
A variable-length unit selection scheme based on structural syntactic cost is presented for selecting text-to-speech (TTS) synthesis units; it outperforms a synthesizer that does not consider syntactic properties.
Natural language spoken interface control using data-driven semantic inference
The concept of data-driven semantic inference is introduced, which in principle allows arbitrary word constructs in command/query formulation, so users no longer need to memorize the exact syntax of every command.
Meaningful term extraction and discriminative term selection in text categorization via unknown-word methodology
An approach based on unknown words is proposed for meaningful term extraction and discriminative term selection in text categorization, and a phrase-like-unit (PLU)-based likelihood ratio is proposed to estimate the likelihood that a word sequence is an unknown word.
Unknown Word Detection for Chinese by a Corpus-based Learning Method
A corpus-based learning method is proposed that derives sets of syntactic rules for distinguishing monosyllabic words from monosyllabic morphemes, which may be parts of unknown words or typographical errors.
Generation of Phonetic Units for Mixed-Language Speech Recognition Based on Acoustic and Contextual Analysis
Experimental results indicate that the generated phonetic set is compact and robust, incorporating acoustic and contextual information for mixed-language or multilingual speech recognition.