Auditory Effects for ASR
@inproceedings{Lyon1996AuditoryEF,
  title={Auditory Effects for ASR},
  author={R. Lyon},
  year={1996}
}
Almost all ASR front ends use an amplitude-independent representation of spectral shape as the primary feature vector, obtained via some combination of normalization, logarithms, or AR modeling. They also typically represent total power or loudness as a separate feature. These ideas are fine to first order, and have gotten ASR to where it is today. But they totally punt on the issue of what is "loud enough".
2 Citations
Polyglot Speech Synthesis Based on Cross-Lingual Frame Selection Using Auditory and Articulatory Features
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2014
- 12
References
Self-normalization and noise-robustness in early auditory representations
- Computer Science
- IEEE Trans. Speech Audio Process.
- 1994
- 123
A computational model of filtering, detection, and compression in the cochlea
- Computer Science
- ICASSP
- 1982
- 397
Correlograms and the Separation of Sounds
- Computer Science
- 1990 Conference Record Twenty-Fourth Asilomar Conference on Signals, Systems and Computers, 1990.
- 1990
- 43
A theory and computational model of auditory monaural sound separation
- Psychology, Computer Science
- 1985
- 140
Experiments in isolated digit recognition with a cochlear model
- Computer Science
- ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing
- 1987
- 10