Merlin: An Open Source Neural Network Speech Synthesis System
TLDR
We introduce the Merlin speech synthesis toolkit for neural network-based speech synthesis.
  • 242
  • 34
Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis
TLDR
We show that the hidden representation used within a DNN can be improved through the use of Multi-Task Learning, and that stacking multiple frames of hidden layer activations (stacked bottleneck features) also leads to improvements.
  • 229
  • 14
Sentence-level control vectors for deep neural network speech synthesis
TLDR
This paper describes the use of a low-dimensional vector representation of sentence acoustics to control the output of a feed-forward deep neural network text-to-speech system on a sentence-by-sentence basis.
  • 47
  • 5
Unsupervised learning for text-to-speech synthesis
TLDR
This thesis introduces a general method for incorporating the distributional analysis of textual objects into text-to-speech (TTS) conversion systems.
  • 50
  • 5
Peat in horticulture and conservation: the UK response to a changing world
Peat bogs are increasingly recognised as valuable habitats for wildlife and important stores of carbon. Yet the UK horticultural industry relies heavily on peat sourced from bogs in the UK and…
  • 79
  • 5
Unsupervised and lightly-supervised learning for rapid construction of TTS systems in multiple languages from 'found' data: evaluation and analysis
TLDR
This paper presents techniques for building text-to-speech frontends in a way that avoids the need for language-specific expert knowledge, but instead relies on universal resources (such as the Unicode character database) and unsupervised learning from unannotated data to ease system development.
  • 27
  • 4
Thousands of Voices for HMM-Based Speech Synthesis–Analysis and Application of TTS Systems Built on Various ASR Corpora
TLDR
In conventional speech synthesis, large amounts of phonetically balanced speech data recorded in highly controlled recording studio environments are typically required to build a voice.
  • 91
  • 3
Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
TLDR
We propose to combine our previous work on vector space representations of linguistic context, which have the added advantage of working directly from textual input, and Deep Neural Networks (DNNs), which can directly accept such continuous representations as input.
  • 58
  • 3
The CSTR/EMIME HTS system for Blizzard Challenge 2010
The European Community’s Seventh Framework Programme (FP7/2007-2013) under Grant agreement 213845 (the EMIME project)
  • 48
  • 3
HMM-based synthesis of child speech
TLDR
The synthesis of child speech presents challenges both in the collection of data and in the building of a synthesiser from that data.
  • 16
  • 3