Sinusoidal modeling for nonstationary voiced speech based on a local vector transform.


A voiced speech signal can be expressed as a sum of sinusoidal components of which instantaneous frequency and amplitude continuously vary with time. Determining these parameters from the input, the time-varying characteristics are crucial error sources for the algorithms, which assume their stationarity within a local analysis segment. To overcome this problem, a new method is proposed, local vector transform (LVT), which can determine instantaneous frequency and amplitude for nonstationary sinusoids. The method does not assume the local stationarity. The effectiveness of LVT was examined in parameter determination for synthesized and naturally uttered speech signals. The instantaneous frequency for the first harmonic component was determined with an accuracy almost equal to that of the time-corrected instantaneous frequency method and higher accuracy than that of spectral peak-picking, autocorrelation, and cepstrum. The instantaneous amplitude was also determined accurately by LVT while considerable errors were left in the other algorithms. The signal reconstructed from the determined parameters by LVT agreed well with the corresponding component of voiced speech. These results suggest that the method is effective for analyzing time-varying voiced speech signals.

Cite this paper

@article{Ito2007SinusoidalMF, title={Sinusoidal modeling for nonstationary voiced speech based on a local vector transform.}, author={Masashi Ito and Masafumi Yano}, journal={The Journal of the Acoustical Society of America}, year={2007}, volume={121 3}, pages={1717-27} }