The importance of hands-free speech interfaces has been increasingly recognized in recent years. In real environments, however, ambient noise and room reverberation seriously degrade the performance of hands-free speech recognition. Reliable sound source localization is necessary to maximize the effect of noise reduction. This paper proposes a new ...
Ambient noise and room reverberation seriously degrade the accuracy of speaker localization in real environments. In this paper we describe the localization of a speaker under the assumption that the speaker talks in the presence of a large-amplitude disturbance noise. Our algorithm localizes a sound source by finding the position that maximizes the accumulated correlation ...
In real environments, ambient noise and room reverberation seriously degrade the accuracy of sound source localization. In addition, conventional sound source localization methods cannot localize multiple sound sources accurately in real noisy environments. This paper proposes a new method of multiple sound source localization using a ...
This paper describes an improvement to the STD method based on vector quantization (VQ). Spoken documents are represented as sequences of VQ codes and are matched against a text query using the V-P score, which measures the relationship between a VQ code and a phoneme. The matching score between VQ codes and phonemes is ...
This paper describes a novel method for multiple sound source localization and its performance evaluation in actual room environments. The proposed method localizes a sound source by finding the position that maximizes the accumulated correlation coefficient between multiple channel pairs. After the estimation of the first sound source, a typical pattern of ...
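The localization step described above (pick the position that maximizes the correlation coefficient accumulated over channel pairs) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a plain grid search over candidate positions, integer-sample delays, and a fixed speed of sound, and the names `pair_correlation`, `localize`, and `SPEED_OF_SOUND` are hypothetical. It covers only a single source; the multiple-source procedure is not reproduced because the abstract is truncated.

```python
import itertools

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the abstract


def pair_correlation(x, y, lag):
    """Correlation coefficient between x[n] and y[n + lag] on their overlap."""
    n = len(x)
    if lag >= 0:
        a, b = x[: n - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: n + lag]
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return float(a @ b / denom) if denom > 0 else 0.0


def localize(signals, mic_positions, candidates, fs):
    """Return the candidate position with the largest accumulated correlation.

    signals:       array (num_mics, num_samples), one row per microphone
    mic_positions: array (num_mics, dim), microphone coordinates in metres
    candidates:    array (num_candidates, dim), positions to test (assumed grid)
    fs:            sampling rate in Hz
    """
    pairs = list(itertools.combinations(range(len(mic_positions)), 2))
    best_pos, best_score = None, -np.inf
    for pos in candidates:
        # Propagation distance from the hypothesised source to every microphone.
        dists = np.linalg.norm(mic_positions - pos, axis=1)
        score = 0.0
        for i, j in pairs:
            # Expected arrival-time difference between channels i and j, in samples.
            lag = int(round((dists[j] - dists[i]) / SPEED_OF_SOUND * fs))
            score += pair_correlation(signals[i], signals[j], lag)
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score
```

For each candidate, the sketch aligns every channel pair by the delay that the candidate would imply and sums the resulting correlation coefficients, so a candidate consistent with all pairs at once scores highest. A practical system would likely need sub-sample delays and a finer or hierarchical search, which are omitted here.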
Speech recognition requires methods for calculating the acoustic likelihood of a speech sample using acoustic models that represent the spectral diversity caused by inter-speaker variation. The acoustic features of a phoneme may vary over a large range between speakers; however, the relative relationships between phonemes are known to exhibit strong ...
This paper describes a method for automatically labeling prosodic information in Japanese spoken sentences, focusing on accent types and accent phrase boundaries. Both are predicted with conditional random fields (CRFs) using linguistic information and F0 contour information. In the prediction of the accent type, we propose a method that uses a provisional ...