This paper presents a voice conversion technique using Deep Belief Nets (DBNs) to build high-order eigen spaces of the source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. DBNs have a deep architecture that automatically discovers abstractions to maximally express the original …
We are studying the automatic production of easy-to-understand soccer videos by applying digital camera work to video from fixed cameras. Digital camera work is a technique that produces virtual panning and zooming by clipping frames from high-resolution images and controlling the frame size and position. So far, we have studied digital panning. In …
This paper describes a hands-free speech recognition technique based on acoustic model adaptation to reverberant speech. In hands-free speech recognition, the recognition accuracy is degraded by reverberation, since each segment of speech is affected by the reflection energy of the preceding segment. To compensate for the reflection signal, we introduce a …
This paper proposes a method to automatically extract highlight scenes from live sports (baseball) video in real time and to allow users to retrieve them. For this purpose, sophisticated speech recognition is employed to convert the speech signal into text and to extract a group of keywords in real time. Image processing detects, also in real time, the …
We investigated a robust speech feature extraction method using kernel PCA (Principal Component Analysis) for distorted speech recognition. Kernel PCA has been suggested for various image processing tasks requiring an image model, such as denoising, where a noise-free image is constructed from a noisy input image [1]. Much research on robust speech feature …
We investigated the speech recognition of a person with articulation disorders resulting from athetoid cerebral palsy. The articulation of the first words spoken tends to be unstable due to the strain placed on the speech-related muscles, and this causes degradation of speech recognition. Therefore, we proposed a robust feature extraction method based on …
This paper proposes a novel feature extraction method for speech recognition based on gradient features on a 2-D time-frequency matrix. Widely used MFCC features lack temporal dynamics. In addition, ΔMFCC is an indirect expression of temporal frequency changes. To extract the temporal dynamics more directly, we propose local gradient features in an …
We investigate a robust speech feature extraction method using kernel PCA (principal component analysis). Kernel PCA has been suggested for various image processing tasks requiring an image model, such as denoising, where a noise-free image is constructed from a noisy input image. Much research on robust speech feature extraction has been done, but …
This paper proposes a method to automatically classify TV sports news articles using image processing and classification techniques. The classification algorithm for TV sports news articles is based on a multiple subspace method that provides each sports category with more than one subspace, corresponding to its typical scenes. The classification is performed …
In this paper, a new decoding method for unsupervised acoustic model adaptation is presented. In an unsupervised adaptation framework, the effectiveness of the adaptation process is greatly affected by misrecognized labels. Therefore, selecting the adaptation data guided by confidence measures is effective in unsupervised adaptation. We propose phoneme …