Chi-Chun Hsia

This paper presents an expressive voice conversion model (DeBi-HMM) as the post-processing stage of a text-to-speech (TTS) system for expressive speech synthesis. DeBi-HMM is named for its duration-embedded characteristic and its two HMMs, which model the source and target speech signals, respectively. Joint estimation of the source and target HMMs is exploited for …
This paper presents an approach to hierarchical prosody conversion for emotional speech synthesis. The pitch contour of the source speech is decomposed into a hierarchical prosodic structure consisting of sentence, prosodic-word, and subsyllable levels. The pitch contour at the higher levels is encoded by discrete Legendre polynomial coefficients. The …
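The Legendre encoding step described above can be sketched with NumPy's Legendre routines; the pitch values, the mapping of the unit onto the [-1, 1] domain, and the expansion order below are illustrative assumptions, not values from the paper:

```python
import numpy as np
from numpy.polynomial import legendre as L

# Hypothetical pitch contour (Hz) sampled across one prosodic unit,
# with the unit's time axis mapped onto the Legendre domain [-1, 1].
pitch = np.array([180.0, 195.0, 210.0, 205.0, 190.0, 175.0, 170.0])
x = np.linspace(-1.0, 1.0, len(pitch))

# Encode: least-squares fit of a low-order Legendre expansion
# (order 3 is an assumption; the paper's order is not given here).
coeffs = L.legfit(x, pitch, deg=3)

# Decode: reconstruct a smoothed contour from the coefficients.
recon = L.legval(x, coeffs)
```

A low-order expansion like this keeps only the coarse shape of the contour, which is what makes it a compact code for the higher prosodic levels.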
This paper proposes a method for modeling and generating pitch in hidden Markov model (HMM)-based Mandarin speech synthesis by exploiting the prosody hierarchy and dynamic pitch features. The prosodic structure of a sentence is represented by a prosody hierarchy, which is constructed from the predicted prosodic breaks using a supervised classification and …
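Dynamic pitch features are commonly realized as delta (first-difference) coefficients appended to the static log-F0 track; the contour values and the centered-difference choice below are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

# Hypothetical log-F0 trajectory for a few voiced frames.
logf0 = np.log(np.array([180.0, 195.0, 210.0, 205.0, 190.0]))

# Delta features: centered differences in the interior,
# one-sided differences at the endpoints (np.gradient's behavior).
delta = np.gradient(logf0)

# Static + dynamic features stacked per frame as HMM observation vectors.
obs = np.column_stack([logf0, delta])
```

Including the deltas in the observation vector lets the HMM constrain not just pitch values but also their local slope during generation.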
Sleeping posture reveals important information for eldercare and patient care, especially for bedridden patients. Traditionally, the problem has been addressed with either pressure sensors or video images. This paper presents a multimodal approach to sleeping posture classification. Features from the pressure-sensor map and the video image have been proposed in …
In emotional speech synthesis, a large speech database is required for high-quality speech output, whereas voice conversion needs only a compact speech database for each emotion. This study designs and collects a set of phonetically balanced, small-sized emotional parallel speech databases to construct conversion functions. The Gaussian mixture bigram …
In this study, a conversion function clustering and selection approach to conversion-based expressive speech synthesis is proposed. First, a set of small-sized emotional parallel speech databases is designed and collected to train the conversion functions. The Gaussian mixture bigram model (GMBM) is adopted as the conversion function to model the temporal and …
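The GMBM itself is specific to this work, but the underlying idea can be sketched as classic Gaussian-mixture mapping: fit a joint GMM over parallel source/target features and convert by the conditional mean E[y|x] (without the bigram extension). All data, dimensions, and model sizes below are synthetic assumptions:

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

# Synthetic parallel features with a known linear relation (y = 2x + 1
# plus noise); a real system would use aligned spectral frames instead.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(500, 1))
y = 2.0 * x + 1.0 + rng.normal(scale=0.05, size=x.shape)

# Fit a joint GMM on stacked [source, target] vectors.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(np.hstack([x, y]))

def convert(xs):
    """Map source frames to target frames via the conditional mean E[y|x]."""
    xs = np.atleast_2d(xs)
    K = gmm.n_components
    dens = np.zeros((len(xs), K))   # weighted marginal densities over x
    cond = np.zeros((len(xs), K))   # per-component conditional means of y
    for k in range(K):
        mu_x, mu_y = gmm.means_[k, 0], gmm.means_[k, 1]
        s_xx = gmm.covariances_[k, 0, 0]
        s_yx = gmm.covariances_[k, 1, 0]
        dens[:, k] = gmm.weights_[k] * norm.pdf(xs[:, 0], mu_x, np.sqrt(s_xx))
        cond[:, k] = mu_y + (s_yx / s_xx) * (xs[:, 0] - mu_x)
    resp = dens / dens.sum(axis=1, keepdims=True)   # posteriors p(k | x)
    return (resp * cond).sum(axis=1)
```

For example, `convert(np.array([[0.0]]))` should land near the true target value 2 * 0 + 1 = 1. The bigram extension in the paper additionally conditions on the preceding frame's class to capture temporal dependency.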
This paper presents a duration-embedded Bi-HMM framework for expressive voice conversion. First, Ward's minimum variance clustering method is used to cluster all the conversion units (sub-syllables) in order to reduce the number of conversion models as well as the size of the required training database. The duration-embedded Bi-HMM, trained with the EM …
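Ward's minimum variance clustering of conversion units can be sketched with SciPy's hierarchical-clustering routines; the feature vectors and the cluster count below are synthetic assumptions standing in for real sub-syllable acoustic features:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 4-D acoustic feature vectors for 30 sub-syllable
# conversion units, drawn from three well-separated synthetic groups.
rng = np.random.default_rng(0)
units = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(10, 4)),
    rng.normal(loc=3.0, scale=0.3, size=(10, 4)),
    rng.normal(loc=6.0, scale=0.3, size=(10, 4)),
])

# Ward's criterion merges, at each step, the pair of clusters whose
# union gives the smallest increase in total within-cluster variance.
Z = linkage(units, method="ward")

# Cut the dendrogram into 3 clusters: 30 units now share 3 conversion
# models, shrinking both model count and required training data.
labels = fcluster(Z, t=3, criterion="maxclust")
```

Each resulting cluster would then train one shared conversion model instead of one model per unit.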