Learn More
Current text-to-speech synthesis (TTS) systems are often perceived as lacking expressiveness, limiting the ability to fully convey information. This paper describes initial investigations into improving expressiveness for statistical speech synthesis systems. Rather than using hand-crafted definitions of expressive classes, an unsupervised clustering(More)
For many applications, it is necessary to produce speech transcriptions in a causal fashion. To produce high quality transcripts , speaker adaptation is often used. This requires online speaker clustering and incremental adaptation techniques to be developed. This paper presents an integrated approach to online speaker clustering and adaptation which allows(More)
An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize(More)
Rapidly adapting a speech recognition system to new speakers using a small amount of adaptation data is important to improve initial user experience. In this paper, a count-smoothing framework for incorporating prior information is extended to allow for the use of different forms of dynamic prior and improve the robustness of transform estimation on small(More)
This paper investigates the use of Gaussian Selection (GS) to increase the speed of a large vocabulary speech recognition system. Typically 30-70% of the computational time of a HMM-based speech recogniser is spent calculating probabilities. The aim of GS is to reduce this load by dividing the acoustic space into a set of clusters and associating a(More)
Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. In this paper, we investigate the problem of extending data fusion methodologies from Information Retrieval for Spoken Term Detection on low-resource languages in the framework of the IARPA Babel program. We describe a number of(More)
The development of high-performance speech processing systems for low-resource languages is a challenging area. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to use bottleneck features, or hybrid systems, trained on multilingual data for speech-to-text (STT) systems. This(More)
Evaluation of TTS systems is essential to assess performance. The ITU-T P.85 standard was introduced in 1994 to assess the overall quality of speech synthesis systems. However it has not been widely accepted or used. This paper compares the ITU test to more commonly used tests for intelligibility (semantically unpredictable sentences (SUS)) and naturalness(More)
There is a high demand around the world for the learning of English as a second language. Correspondingly, there is a need to assess the proficiency level of learners both during their studies and for formal qualifications. A number of automatic methods have been proposed to help meet this demand with varying degrees of success. This paper considers the(More)
Generating expressive, naturally sounding, speech from text using a speech synthesis (TTS) system is a highly challenging problem. However for tasks such as audiobooks it is essential if their use is to become widespread. Generating expressive speech from text can be divided into two parts: predicting expressive information from text; and synthesizing the(More)