1.1 What is Multi-Band Processing?
1.2 Motivation for the Multi-Band Paradigm

    Abstract

    Multi-band approaches have recently generated a great deal of interest in the automatic speech recognition (ASR) community. In this paradigm, each sub-frequency region of the speech signal is treated as a distinct source of information, and the streams are combined after each is processed independently. Motivations for the multi-band paradigm include results from psycho-acoustic studies, robustness to noise, and potential for parallel processing. The main contribution of this dissertation is the systematic exploration of an area of great interest to many in the research community, showing that multi-band ASR is a viable option, not just for improving recognition accuracy in the presence of noise, but also for clean speech. The work focused on the design and implementation of a multi-band system, analysis of some of its characteristics, and development of extensions to the paradigm. An analysis in terms of phonetic feature transmission showed multi-band processing to be better than a comparable traditional full-band design in many cases. It was observed that some bands were more accurate in discriminating between some phonetic categories. It was hypothesized that combining the confused sub-band classes would reduce the number of input classes and improve generalization. The size of the input space was reduced by almost 30%, and yet the global frame-level phonetic discrimination improved and the word recognition error did not change (the observed improvement was not statistically significant).
The results were consistent with the original hypothesis. The analysis also showed that the phonetic transitions in the sub-bands do not necessarily occur synchronously and are affected by conditions such as speaking rate and room reverberation. Relaxing the synchrony constraints in the sub-bands during word recognition was investigated. The experimental results suggested that removing the synchrony constraints for all phone-to-phone transitions is unlikely to be advantageous and would significantly increase computational cost. The combination of the multi-band and the full-band system was also studied. This combination reduced the word recognition error rate for the experimental clean speech task by about 23-29% compared to the baseline system. The results obtained are the best that we know of on the Numbers95 experimental database.

    This report is a revised version of the author's thesis, which was submitted to the Department of Electrical Engineering and Computer Science on November 24, 1998 in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the University of California, Berkeley. This work was supervised by Professor Nelson Morgan. The thesis committee also included Professors Steven Greenberg, Jitendra Malik, and John Ohala.
