Reconstruction techniques for improving the perceptual quality of binary masked speech.


This study proposes improving the perceptual quality of speech separated by binary masking through reconstruction in the time-frequency domain. Two approaches are investigated, non-negative matrix factorization and sparse reconstruction, both of which represent a signal as a linear combination of basis vectors. Specifically, the short-time Fourier transform (STFT) of the separated speech is represented as a linear combination of STFTs drawn from a clean speech dictionary. Binary masking for separation is performed using deep neural networks or Bayesian classifiers. Performance is evaluated with the perceptual evaluation of speech quality (PESQ), a standard objective speech quality measure. The results show that the proposed techniques improve the perceptual quality of binary masked speech and outperform traditional time-frequency reconstruction approaches.
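The idea of representing a masked spectrogram as a non-negative combination of clean-speech dictionary atoms can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the function names `estimate_activations` and `reconstruct` are hypothetical, the dictionary and spectrogram are synthetic random matrices, the binary mask is random rather than produced by a classifier, and the activations are fitted with standard multiplicative updates for a Euclidean cost, which may differ from the cost and solver the authors used.

```python
import numpy as np

def estimate_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Fit H >= 0 so that V is approximately W @ H, with the dictionary W held fixed.

    Uses multiplicative updates for the Euclidean cost (an assumed choice).
    V: (n_freq, n_frames) non-negative magnitudes.
    W: (n_freq, n_atoms) non-negative dictionary.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # strictly positive start
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)  # keeps H non-negative
    return H

def reconstruct(masked_mag, clean_dict, n_iter=200):
    """Re-express a masked magnitude spectrogram using dictionary atoms."""
    H = estimate_activations(masked_mag, clean_dict, n_iter=n_iter)
    return clean_dict @ H

# Synthetic stand-ins (assumed data, not real speech).
rng = np.random.default_rng(1)
n_freq, n_atoms, n_frames = 64, 20, 30
clean_dict = rng.random((n_freq, n_atoms))           # hypothetical clean-speech dictionary
clean_mag = clean_dict @ rng.random((n_atoms, n_frames))
binary_mask = (rng.random((n_freq, n_frames)) > 0.3).astype(float)
masked_mag = clean_mag * binary_mask                 # zeros where the mask rejects a unit

recon_mag = reconstruct(masked_mag, clean_dict)
print(recon_mag.shape)
```

Because the reconstruction is constrained to the span of clean-speech atoms, time-frequency units zeroed by the mask are filled in with values consistent with the dictionary. In practice the reconstructed magnitude would still need a phase estimate and an inverse STFT to return to a waveform, steps this sketch leaves out.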

DOI: 10.1121/1.4884759

9 Figures and Tables

Cite this paper

@article{Williamson2014ReconstructionTF,
  title={Reconstruction techniques for improving the perceptual quality of binary masked speech.},
  author={Donald S. Williamson and Yuxuan Wang and DeLiang Wang},
  journal={The Journal of the Acoustical Society of America},
  year={2014},
  volume={136},
  number={2},
  pages={892--902}
}