We have previously applied a deep autoencoder (DAE) to noise reduction and speech enhancement. However, the DAE was trained using only clean speech. In this study, using noisy-clean training pairs, we further introduce a denoising process into learning the DAE. In training the DAE, we still adopt the greedy layer-wise pretraining plus fine-tuning strategy. In …
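The noisy-clean training idea can be sketched as a minimal single-hidden-layer denoising autoencoder in NumPy; the data, dimensions, and learning rate below are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 clean "speech" frames (16-dim) and their noisy observations,
# forming the noisy-clean training pairs used for the denoising objective.
clean = rng.standard_normal((200, 16))
noisy = clean + 0.3 * rng.standard_normal((200, 16))

# Single hidden layer with tied weights: h = tanh(x W + b), y = h W^T + c.
W = 0.1 * rng.standard_normal((16, 32))
b, c = np.zeros(32), np.zeros(16)

def reconstruct(x):
    return np.tanh(x @ W + b) @ W.T + c

mse_start = np.mean((reconstruct(noisy) - clean) ** 2)

lr = 0.05
for _ in range(300):
    h = np.tanh(noisy @ W + b)        # encode the NOISY input
    y = h @ W.T + c                   # decode
    err = (y - clean) / len(noisy)    # target is the CLEAN frame (MSE gradient, up to a factor of 2)
    dh = (err @ W) * (1.0 - h ** 2)   # backprop through tanh
    W -= lr * (noisy.T @ dh + err.T @ h)   # tied weights: gradient terms from encoder and decoder
    b -= lr * dh.sum(axis=0)
    c -= lr * err.sum(axis=0)

mse_end = np.mean((reconstruct(noisy) - clean) ** 2)
```

The key difference from a plain autoencoder is only in the loss: the input is the noisy frame while the reconstruction target is its clean counterpart.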
We propose a rescoring framework for speech recognition that incorporates acoustic-phonetic knowledge sources. The scores corresponding to all knowledge sources are generated by a collection of neural-network-based classifiers. Rescoring is then performed by combining the different knowledge scores and using them to reorder candidate strings provided by …
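The reordering step can be sketched as a log-linear combination of a baseline recognizer score with weighted knowledge scores; the candidates, knowledge-source names, scores, and weights here are all hypothetical:

```python
# Hypothetical N-best candidates: a baseline recognizer score plus scores from
# two assumed knowledge-source classifiers ("manner", "place").
candidates = [
    {"text": "recognize speech", "base": -12.0,
     "knowledge": {"manner": -1.0, "place": -0.5}},
    {"text": "wreck a nice beach", "base": -11.5,
     "knowledge": {"manner": -4.0, "place": -3.0}},
]
weights = {"manner": 0.6, "place": 0.4}  # combination weights, assumed tuned on held-out data

def rescore(cand):
    # Log-linear combination: baseline score plus weighted knowledge scores.
    return cand["base"] + sum(weights[k] * s for k, s in cand["knowledge"].items())

reranked = sorted(candidates, key=rescore, reverse=True)
```

In this toy list the second candidate has the better baseline score, but its poor knowledge scores push it below the first after rescoring.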
To incorporate long temporal-frequency structure into acoustic event detection, we have proposed a spectral-patch-based learning and representation method. The learned spectral patches were regarded as acoustic words, which were further used in sparse encoding for acoustic feature representation and modeling. In our previous study, during feature …
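One way to sketch the sparse-encoding step is greedy matching pursuit over a codebook of "acoustic words": each patch is represented by a few codebook atoms. The codebook below is random rather than learned, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: a codebook of K acoustic words (flattened spectral patches of
# size D), each normalized to unit length. In the paper these would be learned.
K, D = 8, 20
codebook = rng.standard_normal((K, D))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

def sparse_code(patch, n_atoms=2):
    """Greedy matching pursuit: encode a patch with at most n_atoms words."""
    coef = np.zeros(K)
    residual = patch.astype(float).copy()
    for _ in range(n_atoms):
        scores = codebook @ residual            # correlation with each word
        k = int(np.argmax(np.abs(scores)))      # best-matching word
        coef[k] += scores[k]
        residual -= scores[k] * codebook[k]     # valid because atoms are unit-norm
    return coef

# Encode a bag of patches from one recording, then max-pool the sparse codes
# into a single clip-level feature vector.
patches = rng.standard_normal((30, D))
codes = np.array([sparse_code(p) for p in patches])
clip_feature = np.abs(codes).max(axis=0)
```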
We present methods of detector design in the Automatic Speech Attribute Transcription project. This paper details the results of a student-led, cross-site collaboration among the Georgia Institute of Technology, The Ohio State University, and Rutgers University. The work reported in this paper describes and evaluates the detection-based ASR paradigm and …
Acoustic event detection is an important step in audio content analysis and retrieval. Traditional detection techniques model acoustic events with frame-based spectral features. Considering that the temporal-frequency structures of acoustic events may be distributed over time scales beyond single frames, we propose to represent those structures as a bag of spectral …
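The contrast with frame-based features can be illustrated by extracting two-dimensional time-frequency patches from a spectrogram and quantizing them into a bag-of-patches histogram; spectrogram and patch sizes are arbitrary toy values, and the codebook is random rather than pre-learned:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy log-spectrogram: 12 frequency bins x 40 frames. Each patch spans several
# frames and bins, capturing structure beyond a single frame.
spec = rng.standard_normal((12, 40))
pf, pt = 4, 5                          # patch size: 4 freq bins x 5 frames

patches = []
for f in range(0, spec.shape[0] - pf + 1, pf):
    for t in range(0, spec.shape[1] - pt + 1, pt):
        patches.append(spec[f:f + pf, t:t + pt].ravel())
patches = np.array(patches)

# Quantize each patch to its nearest codeword and build a normalized
# bag-of-patches histogram as the clip-level representation.
codebook = rng.standard_normal((6, pf * pt))
dists = ((patches[:, None, :] - codebook[None]) ** 2).sum(axis=-1)
assign = np.argmin(dists, axis=1)
bag = np.bincount(assign, minlength=len(codebook)) / len(patches)
```

The histogram `bag` summarizes which time-frequency structures occur in the clip, regardless of exactly when they occur.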
A denoising autoencoder (DAE) is effective at restoring clean speech from noisy observations. In addition, it can easily be stacked into a deep denoising autoencoder (DDAE) architecture to further improve performance. Most studies assume that the DAE or DDAE can learn any complex transform function needed to approximate the transform relation between …
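The stacking itself is just function composition: several nonlinear hidden layers (each assumed to have been pretrained as a single DAE) followed by a linear output that estimates the clean frame. Layer sizes and weights below are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hidden layers, standing in for layer-wise pretrained DAEs, composed into
# a deep denoising autoencoder. All dimensions are toy values.
layers = [
    (0.1 * rng.standard_normal((16, 24)), np.zeros(24)),
    (0.1 * rng.standard_normal((24, 24)), np.zeros(24)),
]
W_out, b_out = 0.1 * rng.standard_normal((24, 16)), np.zeros(16)

def ddae(x):
    h = x
    for W, b in layers:            # stacked nonlinear hidden layers
        h = np.tanh(h @ W + b)
    return h @ W_out + b_out       # linear output: estimate of the clean frame

noisy_frame = rng.standard_normal(16)
clean_estimate = ddae(noisy_frame)
```

The depth is what lets the network approximate more complex noisy-to-clean transforms than a single-layer DAE.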
Speech recognition has become an important feature of smartphones in recent years. Unlike traditional automatic speech recognition, speech recognition on smartphones can take advantage of personalized language models to better model the linguistic patterns and wording habits of a particular smartphone owner. Owing to the popularity of social …
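A common way to personalize a language model is to linearly interpolate a background LM with one estimated from the user's own text; the words, probabilities, and interpolation weight below are purely illustrative:

```python
# Linear interpolation of a background LM with a personalized LM (probabilities
# and the weight lam are made-up illustration values).
background = {"taipei": 0.002, "pizza": 0.004}
personal = {"taipei": 0.030, "pizza": 0.001}   # e.g. estimated from the owner's own posts
lam = 0.3                                      # weight on the personal model

def p_word(w):
    # Small floor probability stands in for proper smoothing of unseen words.
    return (1 - lam) * background.get(w, 1e-6) + lam * personal.get(w, 1e-6)
```

Words the owner uses often (here "taipei") get boosted relative to the background model: `p_word("taipei")` is 0.7 * 0.002 + 0.3 * 0.030 = 0.0104.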
Received signal strength (RSS) in Wi-Fi networks is commonly employed in indoor positioning systems; however, device diversity is a fundamental problem in such RSS-based systems. Hardware variation is inevitable in the real world due to the tremendous growth in recent years of new Wi-Fi devices, such as iPhones, iPads, and Android devices, which is …
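One standard way to see the device-diversity problem: if different hardware reports RSS with a roughly constant additive offset, then subtracting each sample's mean over access points cancels that offset. This is a sketch of the general difference-based idea under that assumed offset model, not necessarily the paper's method:

```python
import numpy as np

# Assumed model: device B reports the same physical signal as device A,
# shifted by a constant hardware offset (in dBm).
rss_device_a = np.array([-50.0, -62.0, -71.0])   # RSS at three access points
rss_device_b = rss_device_a - 7.0                # same spot, different hardware

def normalize(rss):
    # Mean-difference representation: cancels any constant per-device offset.
    return rss - rss.mean()

fingerprint_a = normalize(rss_device_a)
fingerprint_b = normalize(rss_device_b)
```

After normalization the two devices yield identical fingerprints, so a positioning system matching on the normalized values is insensitive to the offset.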
We propose an ensemble modeling framework to jointly characterize speakers and speaking environments for robust speech recognition. We represent a particular environment by a super-vector formed by concatenating the entire set of mean vectors of the Gaussian mixture components in its corresponding hidden Markov model set. In the training phase we generate an …
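The super-vector construction can be sketched directly: flatten every Gaussian mean in the HMM set into one long vector, which can then be compared across environments. The HMM dimensions and the cosine-similarity comparison are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative HMM set: 3 states x 2 Gaussian mixture components x 8-dim means.
n_states, n_mix, dim = 3, 2, 8
means_env1 = rng.standard_normal((n_states, n_mix, dim))
means_env2 = rng.standard_normal((n_states, n_mix, dim))

# Super-vector: concatenation of all component mean vectors in the set.
sv1 = means_env1.reshape(-1)
sv2 = means_env2.reshape(-1)

# Environments can then be compared in super-vector space, e.g. by cosine
# similarity (one plausible choice of metric, assumed here).
sim = sv1 @ sv2 / (np.linalg.norm(sv1) * np.linalg.norm(sv2))
```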