Review of TDNN (time delay neural network) architectures for speech recognition
@article{Sugiyama1991ReviewOT, title={Review of TDNN (time delay neural network) architectures for speech recognition}, author={Masahide Sugiyama and Hidefumi Sawai and Alexander H. Waibel}, journal={1991 IEEE International Symposium on Circuits and Systems}, year={1991}, pages={582-585 vol.1} }
The TDNN architecture for speech recognition is described, and its recognition performance for Japanese phonemes and phrases is explained. In comparative studies, it is shown that the TDNN yields superior phoneme recognition performance. The TDNN optimized for phoneme recognition, however, does not necessarily result in optimized word or phrase recognition performance, as overfitting to the specific phoneme data or recording conditions may occur. Care must therefore be taken to achieve robust…
26 Citations
Speaker-Independent Vowel Recognition for Malay Children Using Time-Delay Neural Network
- Computer Science
- 2011
It was found that the 30 ms frame rate produced the highest vowel recognition accuracy, at 81.92%.
Modularity and Neural Integration in Large-Vocabulary Continuous Speech Recognition
- Computer Science
- 2015
This thesis presents a modular deep neural network for acoustic unit classification that can combine multiple well trained feature extraction networks into its topology and presents a word prediction deep network that functions at the lower subword level.
Speech coding and speech recognition technologies: a review
- Computer Science
- 1991 IEEE International Symposium on Circuits and Systems
- 1991
The present treatment of speech recognition techniques concentrates on the methodologies for voice recognition and the progress made in speaker-independent recognition.
Time-frequency shift-tolerance and counterpropagation network with applications to phoneme recognition
- Computer Science
- 1995
A recognition scheme is developed for temporal-spectral alignment of nonstationary signals by performing preprocessing on the time-frequency distributions of the speech phonemes, and a modification to the counterpropagation network that is suitable for phoneme recognition is proposed.
Gated Module Neural Network for Multilingual Speech Recognition
- Computer Science
- 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)
- 2018
A gated module neural network approach is presented that adapts a language identification component to directly assist the final multilingual LVCSR goal by increasing inter-language discrimination capacity.
Spoken Arabic digits recognizer using recurrent neural networks
- Computer Science
- Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology, 2004.
- 2004
Arabic digits were investigated from a speech recognition standpoint, and a speech recognition system based on recurrent neural networks was designed and tested on automatic Arabic digit recognition, achieving 99.5% correct digit recognition in multispeaker mode, and 94…
Classification of Malay speech sounds based on place of articulation and voicing using neural networks
- Computer Science
- Proceedings of IEEE Region 10 International Conference on Electrical and Electronic Technology. TENCON 2001 (Cat. No.01CH37239)
- 2001
This paper investigates the effectiveness of using neural networks in classifying Malay speech sounds according to their place of articulation and voicing, and proposes a system that classifies 16 selected Malay syllables into their groups of phonetic features.
Cross-lingual, Language-independent Phoneme Alignment
- Computer Science, Linguistics
- 2021
The goal of this thesis is to apply cross-lingual, multilingual techniques to the task of phoneme alignment, i.e. the task of temporally aligning a phonetic transcript to its corresponding audio recording.
Liveness Verification Using Deep Neural Network Based Visual Speech Recognition
- Computer Science
- 2018
We present a novel approach to liveness verification based on visual speech recognition within a challenge-based framework which has the potential to be used on mobile devices to prevent replay or…
The study of L2 mispronunciation detection based on Mandarin landmarks
- Physics
- Other Conferences
- 2022
Acoustic landmarks derived either from phonetic knowledge or from data-driven methods have been proved useful in detecting mispronunciation. The acoustic landmarks obtained by the two methods are not…
38 References
TDNN-LR continuous speech recognition system using adaptive incremental TDNN training
- Computer Science
- [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
- 1991
The efficiency of adaptive incremental training using a small number of training tokens extracted from continuous speech was confirmed in the TDNN-LR system, which provides large-vocabulary continuous speech recognition.
Time-state neural networks (TSNN) for phoneme identification by considering temporal structure of phonemic features
- Linguistics
- [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
- 1991
A new structure for phoneme identification neural networks, time-state neural networks (TSNNs), which can deal with the temporal structure of phonemic features, is reported and tested on Japanese phonemes taken from isolated-word, phrase, and sentence utterances.
A pairwise discriminant approach to robust phoneme recognition by time-delay neural networks
- Computer Science
- [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
- 1991
The authors propose a phoneme recognition method using pairwise discriminant time-delay neural networks (PD-TDNNs) and show its high recognition performance with experimental results for /b, d, g, m,…
Phoneme recognition using time-delay neural networks
- Computer Science
- IEEE Trans. Acoust. Speech Signal Process.
- 1989
The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: (1) using a three-layer arrangement of simple computing…
Frequency-time-shift-invariant time-delay neural networks for robust continuous speech recognition
- Computer Science
- [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
- 1991
The authors propose neural network (NN) architectures for robust speaker-independent, continuous speech recognition. One architecture is the frequency-time-shift-invariant time-delay neural network…
Integrating time alignment and neural networks for high performance continuous speech recognition
- Computer Science
- [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
- 1991
The authors describe two systems in which neural network classifiers are merged with dynamic programming (DP) time alignment methods to produce high-performance continuous speech recognizers. One…
Continuous speech recognition using multilayer perceptrons with hidden Markov models
- Computer Science
- International Conference on Acoustics, Speech, and Signal Processing
- 1990
A phoneme-based, speaker-dependent continuous-speech recognition system that embeds a multilayer perceptron (MLP) into a hidden Markov model (HMM) approach is described; it appears to perform somewhat better when MLP methods are used to estimate the probabilities.
The Meta-Pi network: connectionist rapid adaptation for high-performance multi-speaker phoneme recognition
- Computer Science
- International Conference on Acoustics, Speech, and Signal Processing
- 1990
The Meta-Pi paradigm implements a dynamically adaptive Bayesian MAP classifier, and the Meta-Pi model is a viable basis for a connectionist speech recognition system that can rapidly adapt to new speakers and varying speaker dialects.
Connectionist Viterbi training: a new hybrid method for continuous speech recognition
- Computer Science
- International Conference on Acoustics, Speech, and Signal Processing
- 1990
A hybrid method for continuous-speech recognition which combines hidden Markov models (HMMs) and a connectionist technique called connectionist Viterbi training (CVT) is presented. CVT can be run…
Robust connectionist parsing of spoken language
- Computer Science, Linguistics
- International Conference on Acoustics, Speech, and Signal Processing
- 1990
A modular, recurrent connectionist network architecture that learns to robustly perform incremental parsing of complex sentences is presented; the networks generalize and display tolerance to input that has been corrupted in ways common in spoken language.