Kiyohiro Shikano

In this paper, we present a Time-Delay Neural Network (TDNN) approach to phoneme recognition that is characterized by two important properties. 1) Using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces. The TDNN learns these decision surfaces automatically ...
Julius is a high-performance, two-pass LVCSR decoder for researchers and developers. Based on word 3-gram and context-dependent HMM, it can perform almost real-time decoding on most current PCs in a 20k-word dictation task. Major search techniques are fully incorporated, such as tree lexicon, N-gram factoring, cross-word context dependency handling, enveloped ...
We propose a new Single-Input Multiple-Output (SIMO)-model-based ICA with an information-geometric learning algorithm for high-fidelity blind source separation. The SIMO-ICA consists of multiple ICAs and a fidelity controller, and each ICA runs in parallel under the fidelity control of the entire separation system. The SIMO-ICA can separate the mixed signals, not ...
We describe a new method of blind source separation (BSS) on a microphone array combining subband independent component analysis (ICA) and beamforming. The proposed array system consists of the following three sections: (1) subband ICA-based BSS section with estimation of the direction of arrival (DOA) of the sound source, (2) null beamforming section based ...
In the voice conversion algorithm based on the Gaussian Mixture Model (GMM) applied to STRAIGHT, the quality of converted speech is degraded because the converted spectrum is excessively smoothed. In this paper, we propose a GMM-based algorithm with dynamic frequency warping to avoid the over-smoothing. We also propose an addition of the weighted residual ...
Speech intelligibility can be improved by adding lip images to the speech signal. Thus, lip movement synthesis plays an important role in realizing a natural, human-like face for computer agents. This paper proposes a novel lip movement synthesis method from speech input based on Hidden Markov Models (HMMs). The difficulties of lip movement synthesis are ...
We propose a new blind spatial subtraction array (BSSA) consisting of a noise estimator based on independent component analysis (ICA) for efficient speech enhancement. In this paper, first, we theoretically and experimentally point out that ICA is proficient in noise estimation under a non-point-source noise condition rather than in speech estimation.
Voice conversion is a technique for producing utterances in any target speaker’s voice from a single source speaker’s utterance. In this paper, we apply cross-language voice conversion between Japanese and English to a system based on a Gaussian Mixture Model (GMM) method and STRAIGHT, a high-quality vocoder. To investigate the effects of this conversion ...
We address a method to efficiently select Gaussian mixtures for fast acoustic likelihood computation. It makes use of context-independent models for selection and back-off of corresponding triphone models. Specifically, for the k-best phone models by the preliminary evaluation, triphone models of higher resolution are applied, and others are assigned ...