Accurate voice activity detection (VAD) is important for robust automatic speech recognition (ASR) systems. This paper proposes a statistical-model-based noise-robust VAD algorithm using long-term temporal information and harmonic-structure-based features in speech. Long-term temporal information has recently become an ASR focus, but has not yet been deeply...
Techniques for estimating gaze without restricting user movements are highly desired for their potential applications. Although commercial gaze estimation systems achieve high accuracy using infrared light, gaze estimation systems with webcams have become indispensable because of their low price. The problem with webcams is that their resolution is too low...
A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (Zero-Crossings Peak-Amplitudes) was proposed in [1] and was shown to be more robust than the conventional ZCPA [2]. In this paper, we examine the effect of incorporating auditory masking, both simultaneous and temporal, into the proposed PS-ZCPA method. We also observe the effect of varying the number of...
Acoustic models (AMs) of an HMM-based classifier include various types of hidden variables such as gender, speaking rate, and acoustic environment. If there exists a canonicalization process that reduces the influence of the hidden variables on the AMs, a robust automatic speech recognition (ASR) system can be realized. In this paper, we describe the...