The paper presents a pitch estimation technique based on the robust algorithm for pitch tracking (RAPT) framework. The proposed solution estimates instantaneous pitch values and is not sensitive to rapid frequency modulations. The technique utilizes a different period candidate generating function based on instantaneous harmonic parameters. The…
This paper presents some improvements to the known structure of the synthesis part of a near-PR non-uniform all-pass transformed DFT filter bank. The resulting systems are functionally equivalent to the base scheme, but their complexity is considerably reduced in terms of computation and storage. Moreover, their mathematical descriptions lead to alternative,…
This paper addresses the problem of noise estimation for Karhunen-Loeve transform (KLT) based speech enhancement. The eigenvalues and eigenvectors of the noise covariance matrix are tracked using a recursive averaging algorithm. This process is controlled by the noise power minima obtained from the noisy signal, even during periods of speech activity. The…
This paper introduces a framework for parametric speech modeling that can be used in various speech applications such as text-to-speech synthesis, voice conversion, etc. To reduce the impact of pitch variations, the harmonic analysis is done on a warped time scale that is aligned with instantaneous pitch values. It is assumed that each harmonic has its…
A novel approach to the design and implementation of four-channel paraunitary filter banks is presented. It utilizes hypercomplex number theory, which has not yet been employed in these areas. Namely, quaternion multipliers are presented as alternative paraunitary building blocks, which can be regarded as generalizations of Givens (planar) rotations. The…
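To illustrate why a quaternion multiplier can act as a four-channel paraunitary (here simply orthogonal) building block, the following minimal sketch builds the 4x4 matrix of left multiplication by a unit quaternion and checks that it preserves signal energy. The function names and the sample values are hypothetical and chosen only for illustration; this is not the filter bank structure from the paper.

```python
import math

def normalize(q):
    """Scale a quaternion (a, b, c, d) to unit norm."""
    n = math.sqrt(sum(v * v for v in q))
    return tuple(v / n for v in q)

def left_mult_matrix(q):
    """4x4 real matrix M such that M @ p equals the quaternion product q * p."""
    a, b, c, d = q
    return [
        [a, -b, -c, -d],
        [b,  a, -d,  c],
        [c,  d,  a, -b],
        [d, -c,  b,  a],
    ]

def matvec(m, x):
    """Plain matrix-vector product for small nested-list matrices."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def gram(m):
    """Return M^T M, which is the identity when M is orthogonal."""
    n = len(m)
    return [[sum(m[k][i] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Arbitrary illustrative quaternion, normalized to unit length.
q = normalize((1.0, 2.0, -0.5, 0.25))
M = left_mult_matrix(q)

# One four-channel sample vector (illustrative values).
x = [0.3, -1.2, 0.7, 2.0]
y = matvec(M, x)
energy_in = sum(v * v for v in x)
energy_out = sum(v * v for v in y)

# Special case: with c = d = 0 the matrix reduces to two identical
# 2x2 planar (Givens) rotations on channel pairs (0, 1) and (2, 3).
theta = 0.4
G = left_mult_matrix((math.cos(theta), math.sin(theta), 0.0, 0.0))
```

Because the matrix depends on a single unit quaternion, orthogonality holds by construction, whereas a cascade of planar rotations would need several angles to span comparable 4x4 transforms.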
We consider the warped DFT as an alternative basis for psychoacoustic models. We address the construction of the transform with the aim of precise critical-band power analysis. It is shown that sufficient spectral resolution can be obtained for sample block lengths several times shorter than the 1024 or 2048 commonly used in the FFT-based ear…
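As a rough illustration of the warped DFT idea, the sketch below evaluates the z-transform of a block at points obtained by passing the uniform DFT points through a first-order all-pass mapping. The function names, the warping parameter `a`, and the sign convention of the all-pass section are assumptions made for this example; the abstract does not specify the construction actually used.

```python
import cmath
import math

def allpass(z_inv, a):
    """First-order all-pass mapping applied to z^-1 (assumed convention)."""
    return (z_inv - a) / (1.0 - a * z_inv)

def warped_dft(x, a):
    """Evaluate sum_n x[n] * A(z_k)^n at the uniform points z_k = exp(j*2*pi*k/N).

    With a = 0 the all-pass mapping is the identity and this is the plain DFT.
    """
    n_pts = len(x)
    out = []
    for k in range(n_pts):
        z_inv = cmath.exp(-2j * math.pi * k / n_pts)
        w = allpass(z_inv, a)
        out.append(sum(xn * w ** n for n, xn in enumerate(x)))
    return out

def warped_frequencies(n_pts, a):
    """Frequencies (radians, in [0, 2*pi)) at which the spectrum is sampled."""
    freqs = []
    for k in range(n_pts):
        z_inv = cmath.exp(-2j * math.pi * k / n_pts)
        freqs.append((-cmath.phase(allpass(z_inv, a))) % (2.0 * math.pi))
    return freqs

def plain_dft(x):
    """Direct O(N^2) DFT used only as a reference."""
    n_pts = len(x)
    return [sum(xn * cmath.exp(-2j * math.pi * k * n / n_pts)
                for n, xn in enumerate(x)) for k in range(n_pts)]

# Illustrative 8-sample block.
block = [1.0, 0.5, -0.25, 0.0, 2.0, -1.0, 0.75, 0.1]
unwarped = warped_dft(block, 0.0)
reference = plain_dft(block)
low_freq_grid = warped_frequencies(8, -0.5)
```

In this convention a negative `a` pulls the sampling points toward low frequencies, which is the direction a critical-band analysis would want; other texts define the all-pass section with the opposite sign.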
Speech recognition engines should remain reasonably accurate in adverse environments in order to find their way from laboratories to applications. However, the human auditory system has been proven to be a versatile tool, which is capable of outperforming the known artificial algorithms in their target environments. Recent advances in psychoacoustics…