Corpus ID: 53870734

Post-Processing Using Speech Enhancement Techniques for Unit Selection and Hidden Markov Model-based Low Resource Language

Sangramsing Kayte, Monica R. Mundada
A speech signal captured by a distant microphone is generally contaminated by background noise, which severely degrades the audible quality and intelligibility of the observed signal. To resolve this issue, speech enhancement has been intensively studied. In this paper, we consider text-informed speech enhancement, where the enhancement process is guided by the corresponding text information, i.e., a correct transcription of the target utterance. The proposed Unit Selection Synthesis (USS) and…

Hidden-Markov-Model Based Speech Enhancement
Results show an increase in speech quality and intelligibility in comparison to speech synthesized solely from text, up to the point of being nearly indistinguishable from the original.
Hidden-Markov-model-based voice activity detector with high speech detection rate for speech enhancement
The experimental results confirm the superiority of the proposed VAD over the reference methods, particularly in speech detection rate under the dominant noisy conditions.
Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
Several recently proposed algorithms for improving the voice quality of text-to-speech synthesis based on the concatenation of acoustical units are reviewed in a common framework; the algorithms rely on a pitch-synchronous overlap-add approach.
Unit selection in a concatenative speech synthesis system using a large speech database
  • Andrew J. Hunt, A. Black
  • Computer Science
  • 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings
  • 1996
It is proposed that the units in a synthesis database can be considered as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units.
Speech synthesis using HMMs with dynamic features
A new text-to-speech synthesis system based on HMMs that include dynamic features, i.e., delta and delta-delta parameters of speech; the synthetic speech becomes quite smooth and natural even if the number of clustered states is small.
An HMM-based speech synthesis system applied to English
This paper describes an HMM-based speech synthesis system (HTS), in which the speech waveform is generated from the HMMs themselves, and applies it to English speech synthesis using the general speech…
Significance of Vowel-Like Regions for Speaker Verification Under Degraded Conditions
Vowel-like regions (VLRs) in speech, which include vowels, semi-vowels, and diphthong sound units, are detected using knowledge of VLROPs during training and testing, and a significant improvement in performance is reported for speaker verification under degraded conditions.
Prosody and the Selection of Source Units for Concatenative Synthesis
This chapter describes a procedure for processing a large speech corpus to provide a reduced set of units for concatenative synthesis, and presents a method for selecting units for synthesis by optimizing a weighting between continuity distortion and unit distortion.
A useful feature-engineering approach for a LVCSR system based on CD-DNN-HMM algorithm
The results show that the proposed feature-engineering approach outperforms the traditional Mel Frequency Cepstral Coefficient (MFCCs) GMM + Mel-frequency filter-bank output DNN method.
Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
An HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified HMM framework is described.
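
The state transition network described in the Hunt and Black entry above (state occupancy cost as the distance between a database unit and a target, transition cost as an estimate of concatenation quality) can be searched with dynamic programming. The following is a minimal sketch under stated assumptions, not the method of any paper listed here: the function name `select_units`, the cost callables, and the numeric "units" are all hypothetical, and real systems use richer acoustic and prosodic features for both costs.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # target specification type (hypothetical)
U = TypeVar("U")  # database unit type (hypothetical)


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
) -> Tuple[List[U], float]:
    """Viterbi search for the unit sequence with the lowest summed cost.

    candidates[i] holds the database units that may realise targets[i].
    target_cost plays the role of the state occupancy cost, and
    join_cost plays the role of the transition (concatenation) cost.
    """
    if not targets:
        return [], 0.0

    # best[i][j]: cost of the cheapest path ending in candidates[i][j]
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor on that cheapest path
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(targets)):
        row, ptr = [], []
        for u in candidates[i]:
            costs = [
                best[i - 1][k] + join_cost(prev, u)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + target_cost(targets[i], u))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final state.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path: List[U] = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total
```

With a join cost of zero, the search reduces to picking the candidate closest to each target independently; a non-zero join cost can make a locally worse unit preferable when it concatenates more smoothly with its neighbours.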