Review of TDNN (time delay neural network) architectures for speech recognition

@article{Sugiyama1991ReviewOT,
  title={Review of TDNN (time delay neural network) architectures for speech recognition},
  author={Masahide Sugiyama and Hidefumi Sawai and Alexander H. Waibel},
  journal={1991 IEEE International Symposium on Circuits and Systems},
  year={1991},
  pages={582-585 vol.1}
}
The TDNN architecture for speech recognition is described, and its recognition performance for Japanese phonemes and phrases is explained. In comparative studies, it is shown that the TDNN yields superior phoneme recognition performance. The TDNN optimized for phoneme recognition, however, does not necessarily result in optimized word or phrase recognition performance, as overfitting to the specific phoneme data or recording conditions may occur. Care must therefore be taken to achieve robust… 
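
A minimal sketch of the core TDNN idea summarized above may help: each output frame is computed from a short sliding window of consecutive input frames, with the same weights reused at every time shift. This is an illustrative NumPy example only; the function name, layer sizes, and tanh activation are assumptions, not details taken from the paper.

import numpy as np

def tdnn_layer(frames, weights, bias, context=3):
    """One time-delay layer: frames is (T, D) spectral frames, weights is
    (context * D, H), bias is (H,). Returns (T - context + 1, H) activations."""
    T, D = frames.shape
    outputs = []
    for t in range(T - context + 1):
        window = frames[t:t + context].reshape(-1)        # concatenate `context` frames
        outputs.append(np.tanh(window @ weights + bias))  # same weights at every shift
    return np.stack(outputs)

# Toy usage: 16 spectral frames of 16 coefficients, 3-frame context, 8 hidden units.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
w = 0.1 * rng.standard_normal((3 * 16, 8))
b = np.zeros(8)
print(tdnn_layer(x, w, b).shape)  # (14, 8)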

Citations

Speaker-Independent Vowel Recognition for Malay Children Using Time-Delay Neural Network
TLDR
It was found that a 30 ms frame rate produced the highest vowel recognition accuracy, at 81.92%.
Modularity and Neural Integration in Large-Vocabulary Continuous Speech Recognition
TLDR
This thesis presents a modular deep neural network for acoustic unit classification that can combine multiple well trained feature extraction networks into its topology and presents a word prediction deep network that functions at the lower subword level.
Speech coding and speech recognition technologies: a review
  • A. Spanias, F. H. Wu
  • Computer Science
    1991 IEEE International Symposium on Circuits and Systems
  • 1991
TLDR
The present treatment of speech recognition techniques concentrates on the methodologies for voice recognition and the progress made in speaker-independent recognition.
Time-frequency shift-tolerance and counterpropagation network with applications to phoneme recognition
TLDR
A recognition scheme is developed for temporal-spectral alignment of nonstationary signals by performing preprocessing on the time-frequency distributions of the speech phonemes, and a modification to the counterpropagation network suitable for phoneme recognition is proposed.
Gated Module Neural Network for Multilingual Speech Recognition
  • Y. Liao, Matus Pleva, J. Juhár
  • Computer Science
    2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)
  • 2018
TLDR
A gated module neural network approach is presented that adapts a language identification component to directly assist the final multilingual LVCSR goal and increase inter-language discrimination capacity.
Spoken Arabic digits recognizer using recurrent neural networks
  • Y. Alotaibi
  • Computer Science
    Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology, 2004.
  • 2004
TLDR
Arabic digits were investigated from the speech recognition problem point of view, and a recurrent neural network based speech recognition system was designed and tested on automatic Arabic digit recognition, achieving 99.5% correct digit recognition in multispeaker mode, and 94.
Classification of Malay speech sounds based on place of articulation and voicing using neural networks
  • T. Nong, J. Yunus, S. Salleh
  • Computer Science
    Proceedings of IEEE Region 10 International Conference on Electrical and Electronic Technology. TENCON 2001 (Cat. No.01CH37239)
  • 2001
TLDR
This paper investigates the effectiveness of using neural networks in classifying Malay speech sounds according to their place of articulation and voicing, and proposes a system that classifies 16 selected Malay syllables into their groups of phonetic features.
Cross-lingual, Language-independent Phoneme Alignment
TLDR
The goal of this thesis is to apply cross-lingual, multilingual techniques to the task of phoneme alignment, i.e. the task of temporally aligning a phonetic transcript to its corresponding audio recording.
Liveness Verification Using Deep Neural Network Based Visual Speech Recognition
We present a novel approach to liveness verification based on visual speech recognition within a challenge-based framework which has the potential to be used on mobile devices to prevent replay or
The study of L2 mispronunciation detection based on Mandarin landmarks
It has been shown that acoustic landmarks based on phonetic knowledge or on data-driven methods are useful in detecting mispronunciation. The acoustic landmarks obtained by the two methods are not

References

Showing 1-10 of 38 references
TDNN-LR continuous speech recognition system using adaptive incremental TDNN training
  • H. Sawai
  • Computer Science
    [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
  • 1991
TLDR
The efficiency of adaptive incremental training using a small number of training tokens extracted from continuous speech was confirmed in the TDNN-LR system, which provides large-vocabulary, continuous speech recognition.
Time-state neural networks (TSNN) for phoneme identification by considering temporal structure of phonemic features
  • Y. Komori
  • Linguistics
    [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
  • 1991
TLDR
A new structure for phoneme identification neural networks, time-state neural networks (TSNNs), which can deal with the temporal structure of phonemic features, is reported and tested on Japanese phonemes taken from isolated word, phrase, and sentence utterances.
A pairwise discriminant approach to robust phoneme recognition by time-delay neural networks
  • Jun-ichi Takami, S. Sagayama
  • Computer Science
    [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
  • 1991
The authors propose a phoneme recognition method using pairwise discriminant time-delay neural networks (PD-TDNNs) and show its high recognition performance with experimental results for /b, d, g, m,
Phoneme recognition using time-delay neural networks
The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: (1) using a three-layer arrangement of simple computing
Frequency-time-shift-invariant time-delay neural networks for robust continuous speech recognition
  • H. Sawai
  • Computer Science
    [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
  • 1991
The authors propose neural network (NN) architectures for robust speaker-independent, continuous speech recognition. One architecture is the frequency-time-shift-invariant time-delay neural network
Integrating time alignment and neural networks for high performance continuous speech recognition
  • P. Haffner, M. Franzini, A. Waibel
  • Computer Science
    [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
  • 1991
The authors describe two systems in which neural network classifiers are merged with dynamic programming (DP) time alignment methods to produce high-performance continuous speech recognizers. One
Continuous speech recognition using multilayer perceptrons with hidden Markov models
  • N. Morgan, H. Bourlard
  • Computer Science
    International Conference on Acoustics, Speech, and Signal Processing
  • 1990
TLDR
A phoneme-based, speaker-dependent continuous-speech recognition system embedding a multilayer perceptron (MLP) into a hidden Markov model (HMM) approach is described; performance appears to be somewhat better when MLP methods are used to estimate the probabilities.
The Meta-Pi network: connectionist rapid adaptation for high-performance multi-speaker phoneme recognition
  • J. Hampshire, A. Waibel
  • Computer Science
    International Conference on Acoustics, Speech, and Signal Processing
  • 1990
TLDR
The Meta-Pi paradigm implements a dynamically adaptive Bayesian MAP classifier, and the Meta-Pi model is a viable basis for a connectionist speech recognition system that can rapidly adapt to new speakers and varying speaker dialects.
Connectionist Viterbi training: a new hybrid method for continuous speech recognition
A hybrid method for continuous-speech recognition which combines hidden Markov models (HMMs) and a connectionist technique called connectionist Viterbi training (CVT) is presented. CVT can be run
Robust connectionist parsing of spoken language
  • Ajay N. Jain, A. Waibel
  • Computer Science, Linguistics
    International Conference on Acoustics, Speech, and Signal Processing
  • 1990
TLDR
A modular, recurrent connectionist network architecture which learns to robustly perform incremental parsing of complex sentences is presented; it generalizes well and displays tolerance to input that has been corrupted in ways common in spoken language.