Sayaka Shiota

In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). An important application(More)
This paper proposes a cross-lingual speaker adaptation (CLSA) method based on perceptual characteristics (PCs). A state mapping (SM) based method was recently proposed for building CLSA systems. This method extracts speaker characteristics directly from source acoustic features as linear transforms and applies them to target models. However, it is(More)
This paper provides an overview of speaker adaptation research carried out in the EMIME speech-to-speech translation (S2ST) project. We focus on how speaker adaptation transforms can be learned from speech in one language and applied to the acoustic models of another language. The adaptation is transferred across languages and/or from recognition models to(More)
This paper proposes a novel countermeasure framework that detects spoofing attacks in order to reduce the vulnerability of automatic speaker verification (ASV) systems. Recently, ASV systems have reached performance equivalent to that of other biometric modalities. However, spoofing techniques against these systems have also progressed drastically.(More)
This paper proposes an HMM training technique using multiple phonetic decision trees and evaluates it in speech recognition. When context-dependent models are used, decision-tree-based context clustering is applied to find a parameter-tying structure. However, the clustering is usually performed based on statistics of HMM state sequences which are(More)
This paper proposes a deterministic annealing based training algorithm for Bayesian speech recognition. The Bayesian method is a statistical technique for estimating reliable predictive distributions by marginalizing model parameters. However, the local maxima problem in the Bayesian method is more serious than in the ML-based approach, because the Bayesian(More)
This paper presents an algorithm for detecting spoofing attacks against automatic speaker verification (ASV) systems. While such systems now have performances comparable to those of other biometric modalities, spoofing techniques used against them have progressed drastically. Several techniques can be used to generate spoofing materials (e.g., speech(More)