Developments and directions in speech recognition and understanding, Part 1 [DSP Education]

@article{Baker2009DevelopmentsAD,
  title={Developments and directions in speech recognition and understanding, Part 1 [DSP Education]},
  author={J. Baker and Li Deng and James R. Glass and Sanjeev Khudanpur and Chin-Hui Lee and Nelson Morgan and Douglas D. O'Shaughnessy},
  journal={IEEE Signal Processing Magazine},
  year={2009},
  volume={26}
}
To advance research, it is important to identify promising future research directions, especially those that have not been adequately pursued or funded in the past. The working group producing this article was charged to elicit from the human language technology (HLT) community a set of well-considered directions or rich areas for future research that could lead to major paradigm shifts in the field of automatic speech recognition (ASR) and understanding. ASR has been an area of great interest…
Deep learning: from speech recognition to language and multimodal processing
  • L. Deng
  • Computer Science
  • APSIPA Transactions on Signal and Information Processing
  • 2016
TLDR
The historical path to the transformative success of deep learning in speech recognition is reflected upon, a number of key issues in deep learning are discussed, and future directions are analyzed for perceptual tasks such as speech, image, and video, as well as for cognitive tasks involving natural language.
A comparative study of state-of-the-art speech recognition models for English and Dutch
TLDR
It can be deduced that the size of the dataset influences the accuracy of speech recognition systems, and that the Listen, Attend and Spell model outperforms the CNN-BLSTM model on both the English and Dutch datasets.
Automatic Speech Recognition using limited vocabulary: A survey
TLDR
A comprehensive view of mechanisms behind ASR systems as well as techniques, tools, projects, recent contributions, and possibly future directions in ASR using a limited vocabulary is provided.
Large-Vocabulary Continuous Speech Recognition Systems: A Look at Some Recent Advances
TLDR
The aim of this article is to describe some of the technological underpinnings of modern LVCSR systems, which are not robust to mismatched training and test conditions and cannot handle context as well as human listeners despite being trained on thousands of hours of speech and billions of words of text.
Machine Learning in Automatic Speech Recognition: A Survey
TLDR
A comprehensive review of common machine learning techniques like artificial neural networks, support vector machines, and Gaussian mixture models along with hidden Markov models employed in ASR is provided.
Machine Learning Paradigms for Speech Recognition: An Overview
  • L. Deng, Xiao Li
  • Computer Science
  • IEEE Transactions on Audio, Speech, and Language Processing
  • 2013
TLDR
This article provides readers with an overview of modern ML techniques as used in current ASR research and systems and as relevant to future ones, and presents and analyzes recent developments in deep learning and learning with sparse representations.
A HYBRID ARCHITECTURE FOR RECOGNISING SPEECH SIGNALS IN MALAYALAM
TLDR
The main objective of this thesis is to develop an efficient speech recognition system for recognising speaker-independent isolated words in Malayalam using the two feature techniques that produced the best recognition accuracy, Discrete Wavelet Transforms and Wavelet Packet Decomposition.
Spoken Language Processing: Where Do We Go from Here?
TLDR
This chapter shows how the growing evidence for an intimate relationship between sensor and motor behaviour in living organisms, the power of negative feedback control to accommodate unpredictable disturbances in real-world environments, and hierarchical models of temporal memory point towards a novel architecture for speech-based human-machine interaction.
Toward growing modular deep neural networks for continuous speech recognition
TLDR
A growing modular deep neural network for speech recognition is introduced that is pre-trained in a special manner to implement spatiotemporal information of the frame sequences at the input and their labels at the output layer at the same time.
Stacked transformations for foreign accented speech recognition
TLDR
The novelty in this work is the stack-wise combination of multiple different adaptation transformations that have a better fit for the recognition utterances, called Stacked Transformations.

References

SHOWING 1-10 OF 65 REFERENCES
Spoken Language Digital Libraries: The Million Hour Speech Project
The Center for Innovations in Speech and Language (CISL) at Carnegie Mellon University has launched a grand challenge project to collect and annotate at least one million hours of recorded speech. To…
Automatic Speech and Speaker Recognition: Advanced Topics
TLDR
Automatic Speech and Speaker Recognition: Advanced Topics groups together in a single volume a number of important topics on speech and speaker recognition, topics which are of fundamental importance, but not yet covered in detail in existing textbooks.
RASTA processing of speech
TLDR
The theoretical and experimental foundations of the RASTA method are reviewed, the relationship with human auditory perception is discussed, the original method is extended to combinations of additive noise and convolutional noise, and an application is shown to speech enhancement.
fMPE: discriminatively trained features for speech recognition
MPE (minimum phone error) is a previously introduced technique for discriminative training of HMM parameters. fMPE applies the same objective function to the features, transforming the data with a…
Two-channel speech analysis
TLDR
It is shown how the EGG can be used as a tool for validating speech processing algorithms and estimating possible lower bounds for both computation and performance of these algorithms, particularly closed-phase speech analysis.
Speech and language processing - an introduction to natural language processing, computational linguistics, and speech recognition
TLDR
This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation.
An introduction to computing with neural nets
TLDR
This paper provides an introduction to the field of artificial neural nets by reviewing six important neural net models that can be used for pattern classification and exploring how some existing classification and clustering algorithms can be performed using simple neuron-like components.
Rapid speaker adaptation in eigenvoice space
TLDR
A new model-based speaker adaptation algorithm called the eigenvoice approach is presented, which constrains the adapted model to be a linear combination of a small number of basis vectors obtained offline from a set of reference speakers, and thus greatly reduces the number of free parameters to be estimated from adaptation data.
Perceptual linear predictive (PLP) analysis of speech.
  • H. Hermansky
  • Computer Science, Medicine
  • The Journal of the Acoustical Society of America
  • 1990
TLDR
A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented; it uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum and yields a low-dimensional representation of speech.