Viet-Anh Tran

Exploiting a tissue-conductive sensor (a stethoscopic microphone), the system developed at NAIST, which converts Non-Audible Murmur (NAM) to audible speech by GMM-based statistical mapping, is a very promising technique. The quality of the converted speech is, however, still insufficient for computer-mediated communication, notably because of the poor…
Acoustic speaker diarization is investigated for situations where a collection of shows from the same source needs to be processed. In this case, the same speaker should receive the same label across all shows. We compare different architectures for cross-show speaker diarization: the obvious concatenation of all shows, a hybrid system combining first a…
The NAM-to-speech conversion proposed by Toda and colleagues, which converts Non-Audible Murmur (NAM) to audible speech by statistical mapping trained using aligned corpora, is a very promising technique, but its performance is still insufficient, mainly due to the difficulty of estimating the F0 of the transformed voice from unvoiced speech. In this paper, we…
Although the segmental intelligibility of speech converted from silent speech using the direct signal-to-signal mapping proposed by Toda et al. [1] is quite acceptable, listeners sometimes have difficulty chunking the speech continuum into meaningful words because the output signals provide incomplete phonetic cues. This paper studies another approach…
Non-audible murmur (NAM) is unvoiced speech received through body tissue using special acoustic sensors (i.e., NAM microphones) attached behind the talker's ear. Although NAM has different frequency characteristics from normal speech, it is possible to perform automatic speech recognition (ASR) using conventional methods. In using a NAM microphone,…
Two speech inversion methods are implemented and compared. In the first, multistream Hidden Markov Models (HMMs) of phonemes are jointly trained from synchronous streams of articulatory data acquired by EMA and speech spectral parameters; an acoustic recognition system uses the acoustic part of the HMMs to deliver a phoneme chain and the state durations;…
ABSTRACT To recover the movements of articulators such as the lips, jaw, or tongue from the speech sound, we developed and compared two inversion methods, one based on hidden Markov models (HMMs) and the other on Gaussian mixture models (GMMs). The movements of the articulators are characterized by…