Neural network based generation of fundamental frequency contours

  title={Neural network based generation of fundamental frequency contours},
  author={Michael S. Scordilis and John N. Gowdy},
  journal={International Conference on Acoustics, Speech, and Signal Processing},
  pages={219-222 vol.1}
  • M. Scordilis, J. Gowdy
  • Published 23 May 1989
  • International Conference on Acoustics, Speech, and Signal Processing,
Although a number of algorithms exist for the generation of the fundamental frequency contour in automatic text-to-speech conversion systems, the absence of a general theory of intonation still prevents the correct derivation of this important feature in unrestricted text applications. A parallel distributed approach is presented in which two neural networks were designed to learn the F0 values for each phoneme and the F0 fluctuations within each phoneme for words that correspond to a small… 
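The two-network arrangement described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical numeric encoding of each phoneme's context (`phoneme_feats`), a known number of frames per phoneme, and F0 values in normalized units. The one-hidden-layer network, its sizes, and the additive combination of a phoneme-level F0 value with a within-phoneme fluctuation are illustrative assumptions, not the architecture published in the paper.

```python
import numpy as np


class MLP:
    """One-hidden-layer feedforward network (tanh hidden units, linear output),
    trained by full-batch gradient descent on mean squared error."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        # Cache hidden activations for the backward pass.
        self.H = np.tanh(X @ self.W1 + self.b1)
        return self.H @ self.W2 + self.b2

    def train(self, X, Y, lr=0.05, epochs=2000):
        n = len(X)
        for _ in range(epochs):
            err = self.forward(X) - Y                     # (n, n_out) residuals
            dH = (err @ self.W2.T) * (1.0 - self.H ** 2)  # backprop through tanh
            self.W2 -= lr * (self.H.T @ err) / n
            self.b2 -= lr * err.mean(axis=0)
            self.W1 -= lr * (X.T @ dH) / n
            self.b1 -= lr * dH.mean(axis=0)
        return float(np.mean((self.forward(X) - Y) ** 2))  # final training MSE


def generate_contour(level_net, fluct_net, phoneme_feats, frames_per_phoneme):
    """Concatenate per-phoneme contours: a phoneme-level F0 value from `level_net`
    plus a within-phoneme deviation from `fluct_net` at relative positions t in [0, 1]."""
    segments = []
    for x, n in zip(phoneme_feats, frames_per_phoneme):
        base = level_net.forward(x[None, :])[0, 0]            # one F0 value per phoneme
        t = np.linspace(0.0, 1.0, n)[:, None]                  # relative position of each frame
        inp = np.hstack([np.repeat(x[None, :], n, axis=0), t])
        segments.append(base + fluct_net.forward(inp)[:, 0])   # fluctuation around base
    return np.concatenate(segments)
```

In this sketch the level network would be trained on per-phoneme target F0 values and the fluctuation network on within-phoneme deviations sampled at relative positions; the normalized output would then be mapped back to Hz.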


Fundamental Frequency Modeling for Neural-Network-Based Statistical Parametric Speech Synthesis

This thesis treats F0 modeling as a sequential conversion problem where the input linguistic feature sequence is converted by a neural F0 model into an F0 contour frame by frame.

Neural network-based F0 text-to-speech synthesiser for Mandarin

A neural-network-based approach to synthesising F0 information for Mandarin text-to-speech is discussed, using neural networks to model the relationship between linguistic features extracted from input text and parameters representing the pitch contour of syllables.

Investigation of phonemic context in speech using self-organizing feature maps

    V. Kepuska, J. Gowdy
    International Conference on Acoustics, Speech, and Signal Processing
  • 1989
The authors have shown for their database that the sequence of responding units is consistent and similar for isolated utterances of the same word and distinct for different words, and propose an algorithm for sequence smoothing.


This paper presents the initial results of an investigation into the amount of training data required to reach optimal generalization in neural speech synthesizers, through an empirical exploration of the effect of the number of training patterns on test-set error.

Vowel synthesis using feed-forward neural networks

Interestingly, neural networks with no hidden layer proved as capable of learning the mapping as those with a hidden layer, and a relationship predicting the result of a modified rhyme test is derived.

Prosody generation with a neural network: weighing the importance of input parameters

The approach presented here tries to quantify the contribution of each input parameter by comparing the mean errors of networks trained with only one parameter each and by looking at the performance of a group of networks where each lacks one parameter.
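The comparison described above can be sketched as follows. As a stand-in for trained prosody networks, the sketch fits a linear model with no hidden layer by least squares (`np.linalg.lstsq`) and scores it by mean absolute error on held-out data. The function names, the error measure, and the feature columns are hypothetical; the paper's own networks and error definition may differ.

```python
import numpy as np


def fit_and_score(X_tr, y_tr, X_te, y_te):
    """Stand-in for training a network: a linear (no-hidden-layer) model fitted
    by least squares. Returns the mean absolute error on the test set."""
    A_tr = np.hstack([X_tr, np.ones((len(X_tr), 1))])  # append bias column
    A_te = np.hstack([X_te, np.ones((len(X_te), 1))])
    w, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.mean(np.abs(A_te @ w - y_te)))


def input_ablation(X_tr, y_tr, X_te, y_te, names):
    """For each input parameter, return (a) the error of a model trained on that
    parameter alone and (b) the error of a model trained on all other parameters."""
    only, without = {}, {}
    for j, name in enumerate(names):
        keep = [k for k in range(X_tr.shape[1]) if k != j]
        only[name] = fit_and_score(X_tr[:, [j]], y_tr, X_te[:, [j]], y_te)
        without[name] = fit_and_score(X_tr[:, keep], y_tr, X_te[:, keep], y_te)
    return only, without
```

A parameter whose single-input error is low carries useful information on its own; a parameter whose removal raises the error most, relative to the full model, is the one the remaining inputs can least compensate for.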

A dynamical system model for generating fundamental frequency for speech synthesis

A new approach is presented for generating two important cues to prosodic patterns, the fundamental frequency (F0) and energy contours, from symbolic prosodic labels and text using a dynamical system model.

A Language-Independent Neural Network-Based Speech Synthesizer

An artificial speech synthesizer based on neural networks is being developed for deeply embedded systems, targeting language-independent speech commands on hands-free interfaces; initial experimental results show the expected properties of language independence and in-system learning.

Neural network control for a cascade/parallel formant synthesizer

    Michael S. Scordilis, J. Gowdy
    International Conference on Acoustics, Speech, and Signal Processing
  • 1990
Neural network control of a cascade/parallel formant text-to-speech synthesizer model is investigated and results for the generation of the fundamental frequency contour using feedforward and sequential networks are shown.

Review of text-to-speech conversion for English.

    D. Klatt
    The Journal of the Acoustical Society of America
  • 1987
This review traces the early work on the development of speech synthesizers, discovery of minimal acoustic cues for phonetic contrasts, evolution of phonemic rule programs, incorporation of prosodic rules, and formulation of techniques for text analysis.

NETtalk: a parallel network that learns to read aloud

NETtalk is an alternative approach that is based on an automated learning procedure for a parallel network of deterministic processing units that achieves good performance and generalizes to novel words.

Phonological Aspects of Speech Recognition

    Trends in Speech Recognition
  • 1980