Librispeech: An ASR corpus based on public domain audio books
In this paper, we investigate the applicability and effectiveness of advanced feature compensation techniques for building a robust front-end for Automatic Speech Recognition (ASR). First, the Vector Taylor Series (VTS) equations are modified to incorporate an auditory masking factor. The resulting VTS approximation is used to compensate the parameters of a clean-speech model, and a Minimum Mean Square Error (MMSE) estimator is used to recover the clean speech features from the noisy features. Second, we apply root compression instead of the conventional log compression to the mel filter-bank energies. Third, we apply a frame selection method that discards noise-dominated frames to improve performance in high-noise scenarios. The proposed algorithms are validated on noise-corrupted versions of the Librispeech and TIMIT speech recognition databases and are shown to provide a significant gain in performance.
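As a rough illustration of the second and third steps, the sketch below contrasts log compression with root compression of mel filter-bank energies and shows one possible way to drop noise-dominated frames. It is a minimal sketch under stated assumptions, not the paper's implementation: the exponent `gamma`, the flooring constant, the function names, and the SNR-threshold criterion in `select_frames` (including the `noise_energy` and `snr_threshold_db` parameters) are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np


def log_compress(fbank_energies, floor=1e-10):
    """Conventional log compression; energies are floored to avoid log(0)."""
    return np.log(np.maximum(fbank_energies, floor))


def root_compress(fbank_energies, gamma=0.1):
    """Root (power-law) compression: E ** gamma with 0 < gamma < 1.

    The value of gamma is an assumption; the abstract does not state it.
    """
    return np.power(np.maximum(fbank_energies, 0.0), gamma)


def select_frames(fbank_energies, noise_energy, snr_threshold_db=0.0):
    """Keep frames whose estimated SNR exceeds a threshold.

    Hypothetical criterion standing in for the paper's frame selection
    method: per-frame energy (summed over filters) is compared against a
    scalar noise-energy estimate.
    """
    frame_energy = fbank_energies.sum(axis=1)
    snr_db = 10.0 * np.log10(
        np.maximum(frame_energy, 1e-10) / max(noise_energy, 1e-10)
    )
    keep = snr_db > snr_threshold_db
    return fbank_energies[keep], keep


# Toy input: 3 frames x 2 mel filters (made-up numbers for illustration).
energies = np.array([[1.0, 4.0], [0.01, 0.01], [9.0, 16.0]])
rooted = root_compress(energies, gamma=0.5)
kept, mask = select_frames(energies, noise_energy=1.0)
```

One property visible in the sketch is that root compression stays bounded as an energy approaches zero (it maps zero to zero), whereas log compression needs an explicit floor to remain finite.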