Double Articulation Analyzer with Prosody for Unsupervised Word and Phoneme Discovery

Yasuaki Okuda, Ryo Ozaki, Tadahiro Taniguchi
Word and phoneme discovery are important tasks in language development for human infants. Infants acquire words and phonemes from unsegmented speech signals using segmentation cues, such as distributional, prosodic, and co-occurrence cues. Many pre-existing computational models that represent the process tend to focus on distributional or prosodic cues. This paper proposes a nonparametric Bayesian probabilistic generative model called the prosodic hierarchical Dirichlet process-hidden…



Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition From Continuous Speech Signals

A probabilistic generative model that integrates a language model (LM) and an acoustic model (AM), the hierarchical Dirichlet process hidden language model (HDP-HLM), is developed, and an inference procedure is derived using the blocked Gibbs sampler. The resulting NPB-DAA can discover words directly from continuous human speech signals in an unsupervised manner.
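
A blocked Gibbs sampler updates a whole block of latent variables jointly rather than one variable at a time. For hidden-Markov-style models, the standard way to do this is forward filtering, backward sampling (FFBS): compute filtered state posteriors forward in time, then sample the entire state sequence backward. The Python sketch below shows FFBS for a plain discrete-state HMM only. The HDP-HLM's sampler works over a hierarchy of latent words and phoneme-like units with durations and is considerably more involved; the function name `ffbs` and its parameters are hypothetical.

```python
import numpy as np


def ffbs(pi0: np.ndarray, trans: np.ndarray, lik: np.ndarray,
         rng: np.random.Generator) -> np.ndarray:
    """Sample a full state sequence z_{1:T} from p(z | x) for a discrete HMM.

    Hypothetical helper illustrating one blocked Gibbs step.
    pi0:   initial state distribution, shape (K,).
    trans: transition matrix, trans[i, j] = p(z_{t+1}=j | z_t=i), shape (K, K).
    lik:   per-frame likelihoods, lik[t, k] = p(x_t | z_t=k), shape (T, K).
    """
    T, K = lik.shape
    alpha = np.zeros((T, K))
    # Forward filtering: alpha[t] = p(z_t | x_{1:t}), normalised each step for stability.
    alpha[0] = pi0 * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * lik[t]
        alpha[t] /= alpha[t].sum()
    # Backward sampling: draw z_T, then each z_t given the already-sampled z_{t+1}.
    z = np.empty(T, dtype=int)
    z[T - 1] = rng.choice(K, p=alpha[T - 1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * trans[:, z[t + 1]]
        z[t] = rng.choice(K, p=w / w.sum())
    return z
```

Usage: `ffbs(pi0, trans, lik, np.random.default_rng(0))`. Normalising the forward messages at every step avoids numerical underflow on long sequences without changing the distribution being sampled.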

Comparative study of feature extraction methods for direct word discovery with NPB-DAA from natural speech signals

  • Yuki Tada, Y. Hagiwara, T. Taniguchi
  • 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)
The results showed that 1) NPB-DAA with or without a deep sparse autoencoder (DSAE) can extract words and phonemes from natural speech signals containing consonants to a certain extent, 2) naive introduction of dynamic features can even harm word discovery performance, and 3) DSAE can consistently increase the correlation between the log-likelihood and the word discovery performance measure.
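
In speech front ends, "dynamic features" usually means delta coefficients computed by linear regression over neighbouring frames. The Python sketch below shows that standard computation with NumPy. The window size, the edge padding, and the name `delta_features` are assumptions; this is not the feature pipeline used in the study above.

```python
import numpy as np


def delta_features(feats: np.ndarray, window: int = 2) -> np.ndarray:
    """Compute delta (first-order dynamic) coefficients of a feature matrix.

    Hypothetical helper. feats has shape (num_frames, num_dims), e.g. MFCC frames.
    window is the half-width N of the regression window. Implements
        d_t = sum_{n=1}^{N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1}^{N} n^2),
    repeating edge frames so the output has the same shape as the input.
    """
    num_frames = feats.shape[0]
    denom = 2 * sum(n * n for n in range(1, window + 1))
    # Pad by repeating the first and last frames along the time axis.
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    deltas = np.zeros_like(feats, dtype=float)
    for n in range(1, window + 1):
        # padded[window + t + n] holds c_{t+n}; padded[window + t - n] holds c_{t-n}.
        deltas += n * (padded[window + n: window + n + num_frames]
                       - padded[window - n: window - n + num_frames])
    return deltas / denom
```

Stacking such deltas onto the static features, for example with `np.hstack([feats, delta_features(feats)])`, is one common way dynamic features are introduced; whether this matches the study's setup is an assumption.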

Accelerated Nonparametric Bayesian Double Articulation Analyzer for Unsupervised Word Discovery

  • Ryo Ozaki, T. Taniguchi
  • 2018 Joint IEEE 8th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)
An accelerated nonparametric Bayesian double articulation analyzer is described that enables a developmental robot to acquire words and phonemes directly from speech signals, without labeled data, in a more realistic scenario than the conventional NPB-DAA.

A self-referential childlike model to acquire phones, syllables and words from acoustic speech

This work proposes a model of early infant word learning embedded in a layered architecture comprising phone, phonotactics, and syllable learning, which aims to learn the structure of speech without supervision at different levels of granularity.

Double articulation analyzer with deep sparse autoencoder for unsupervised word discovery from speech signals

The combined method, the NPB-DAA with the DSAE, outperforms pre-existing unsupervised learning methods and shows state-of-the-art performance.

Unsupervised Segmentation of Phoneme Sequences based on Pitman-Yor Semi-Markov Model using Phoneme Length Context

A phoneme-length context model for the Pitman-Yor semi-Markov model (PYSMM) is proposed to provide a helpful cue at the phoneme level and to predict succeeding segments more accurately. The peak performance with the context model exceeded that of models without it by at most 0.045 in F-measure of the estimated segmentation.
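
A common way to score unsupervised word segmentation is the F-measure over predicted word boundaries. The Python sketch below computes it, assuming one character per phoneme and excluding utterance edges from the boundary set. The cited paper's exact scoring (boundary-level versus token-level F-measure) is not stated here, and the function names are hypothetical.

```python
def boundaries(words: list[str]) -> set[int]:
    """Return symbol offsets of word-internal boundaries in a segmentation."""
    positions = set()
    offset = 0
    for w in words[:-1]:  # the utterance end is not counted as a boundary
        offset += len(w)
        positions.add(offset)
    return positions


def boundary_f_measure(gold: list[str], predicted: list[str]) -> float:
    """Boundary-level F-measure between a gold and a predicted segmentation.

    Both segmentations must cover the same symbol sequence (one char per phoneme).
    """
    if "".join(gold) != "".join(predicted):
        raise ValueError("segmentations must cover the same sequence")
    g, p = boundaries(gold), boundaries(predicted)
    if not g and not p:
        return 1.0  # both leave the utterance unsegmented: perfect agreement
    hits = len(g & p)
    if hits == 0:
        return 0.0
    precision = hits / len(p)
    recall = hits / len(g)
    return 2 * precision * recall / (precision + recall)
```

For example, segmenting gold `["ai", "ueo"]` as `["a", "i", "ueo"]` recovers the one gold boundary but adds a spurious one, giving precision 0.5, recall 1.0, and F-measure 2/3.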

Joint Learning of Phonetic Units and Word Pronunciations for ASR

An unsupervised alternative to the conventional manual approach for creating pronunciation dictionaries, requiring no language-specific knowledge, is proposed; it jointly discovers the phonetic inventory and the letter-to-sound mapping rules of a language using only transcribed data.

Signal to Syntax: Bootstrapping From Speech to Grammar in Early Acquisition

The hypothesis that young infants rely on prosodic cues in speech to bootstrap their way into syntax has received considerable attention in recent discussions of early language development (e.g.,


One of the infant’s first tasks in language acquisition is to discover the words embedded in a mostly continuous speech stream. This learning problem might be solved by using distributional cues to

Word Segmentation From Phoneme Sequences Based On Pitman-Yor Semi-Markov Model Exploiting Subword Information

The authors propose a model based on subword N-grams and subword estimation using a vocabulary set, together with posterior fusion of the results of a PYSMM and their own model to take advantage of both.
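
The PYSMM entries above rest on Pitman-Yor process language models. As background, the Python sketch below computes the standard predictive probability of a word in a single Pitman-Yor "Chinese restaurant" given its seating counts. It is a simplification: models of this family typically use hierarchical (n-gram) Pitman-Yor priors and sample segmentations, and the function and parameter names here are hypothetical, not taken from either paper.

```python
from collections.abc import Callable, Mapping


def pitman_yor_predictive(
    word: str,
    customer_counts: Mapping[str, int],
    table_counts: Mapping[str, int],
    discount: float,
    strength: float,
    base_prob: Callable[[str], float],
) -> float:
    """Predictive probability of `word` under one Pitman-Yor restaurant.

    Hypothetical helper.
    customer_counts[w]: observed tokens of w (c_w).
    table_counts[w]: tables serving w (t_w), with 1 <= t_w <= c_w when c_w > 0.
    discount d in [0, 1) and strength theta > 0 are the PY hyperparameters.
    base_prob is the base distribution G0 (e.g. a character or subword model).
    """
    total_customers = sum(customer_counts.values())
    total_tables = sum(table_counts.values())
    denom = strength + total_customers
    c_w = customer_counts.get(word, 0)
    t_w = table_counts.get(word, 0)
    # Mass from existing tables serving this word, reduced by the discount.
    existing = (c_w - discount * t_w) / denom
    # Mass reserved for opening a new table, which draws the word from G0.
    new_table = (strength + discount * total_tables) / denom
    return existing + new_table * base_prob(word)
```

Because the discount removes a fixed amount of mass from every occupied table and hands it to the base distribution, the model favours reusing frequent words while still reserving probability for unseen ones.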