Thomas Hueber

The possibility of speech processing in the absence of an intelligible acoustic signal has given rise to the idea of a ‘silent speech’ interface, to be used as an aid for the speech-handicapped, or as part of a communications system operating in silence-required or high-background-noise environments. The article first outlines the emergence of the silent…
This article addresses synchronous acquisition of high-speed multimodal speech data, composed of ultrasound and optical images of the vocal tract together with the acoustic speech signal, for a silent speech interface. Built around a laptop-based portable ultrasound machine (Terason T3000) and an industrial camera, an acquisition setup is described together…
The article describes a video-only speech recognition system for a “silent speech interface” application, using ultrasound and optical images of the vocal tract. A one-hour audiovisual speech corpus was phonetically labeled using an automatic speech alignment procedure and robust visual feature extraction techniques. HMM-based stochastic models were…
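The abstract does not say how the HMMs were decoded. As a generic illustration only, the sketch below shows Viterbi decoding, the standard way to recover the most likely state sequence from an HMM. It assumes discrete toy observations and hypothetical state names; the article's models worked on continuous visual features extracted from ultrasound and optical images, which this sketch does not reproduce.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most likely state path and its log-probability.

    Toy discrete-observation HMM (an assumption for illustration);
    `obs` must be non-empty. Missing probabilities are treated as 0.
    """

    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # delta[s]: best log-probability of any path ending in state s at the current frame
    delta = {s: log(start_p[s]) + log(emit_p[s].get(obs[0], 0.0)) for s in states}
    backptrs: List[Dict[str, str]] = []
    for o in obs[1:]:
        new_delta: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in states:
            # Best predecessor r for state s, including the transition r -> s
            prev, score = max(
                ((r, delta[r] + log(trans_p[r].get(s, 0.0))) for r in states),
                key=lambda x: x[1],
            )
            new_delta[s] = score + log(emit_p[s].get(o, 0.0))
            ptr[s] = prev
        delta = new_delta
        backptrs.append(ptr)

    # Trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]
```

With two left-to-right states, where `p1` favours a "lo" observation and `p2` a "hi" one, decoding `["lo", "lo", "hi", "hi"]` yields the path `["p1", "p1", "p2", "p2"]`.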
The article presents the results of tests of a portable post-laryngectomy voice replacement system that allows a silently articulating speaker to select and play back short phrases contained in a 60-phrase phrasebook. Such a system could be a useful communication tool for post-laryngectomy patients unable to use tracheo-oesophageal speech. Experiments on…
The article compares two approaches to the description of ultrasound vocal tract images for application in a “silent speech interface,” one based on tongue contour modeling, and a second, global coding approach in which images are projected onto a feature space of Eigentongues. A curvature-based lip profile feature extraction method is also presented.
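Eigentongues are, in essence, principal components of a set of vectorized ultrasound tongue images, in the spirit of eigenfaces. The minimal numpy sketch below assumes same-sized grayscale frames; the function names and the number of components are illustrative choices, and any preprocessing used in the article (cropping, resizing, normalization) is not reproduced.

```python
import numpy as np


def fit_eigentongues(images: np.ndarray, n_components: int):
    """Compute a PCA basis from training images of shape (n_frames, height, width).

    Returns the mean image vector and a (n_components, height * width) basis
    whose rows are orthonormal principal directions ("Eigentongues"),
    ordered by decreasing variance.
    """
    X = images.reshape(len(images), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # Rows of Vt are the right singular vectors of the centered data matrix
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def project(images: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project images onto the Eigentongue space: one feature vector per frame."""
    X = images.reshape(len(images), -1).astype(np.float64)
    return (X - mean) @ basis.T
```

Each frame is thus coded by a short vector of projection coefficients, a global description that, unlike contour modeling, needs no explicit tongue-edge extraction.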
Expressive speech is a useful tool in cinema, theater and contemporary music. In this paper we present a study on the influence of expressivity on the speech rate of a French actor. It involves a relational database containing expressive and neutral spoken French. We first describe the analysis, which is partly based on a unit-selection text-to-speech (TTS) system. The…
Orofacial clones can display speech articulation in an augmented mode, i.e., display all major speech articulators, including those usually hidden, such as the tongue or the velum. Moreover, a number of studies suggest that the visual articulatory feedback provided by electropalatography (EPG) or ultrasound echography is useful for speech therapy. This paper…
In this paper, we present recent developments in the HMM-based acoustic-to-articulatory inversion approach that we are developing for a “visual articulatory feedback” system. In this approach, multi-stream phoneme HMMs are trained jointly on synchronous streams of acoustic and articulatory data, acquired by electromagnetic articulography (EMA).
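In a multi-stream HMM of the kind mentioned above, each state's output density is conventionally a weighted product of per-stream likelihoods (this is, for example, how HTK defines multi-stream models). The formula below uses that generic form with an acoustic stream and an articulatory (EMA) stream; the notation and the choice of Gaussian-mixture densities are assumptions, not details given in the abstract.

$$
b_j(\mathbf{o}_t) = \prod_{s \in \{\mathrm{ac},\,\mathrm{art}\}} \left[ \sum_{m=1}^{M_s} c_{jsm}\, \mathcal{N}\!\left(\mathbf{o}_{st};\, \boldsymbol{\mu}_{jsm},\, \boldsymbol{\Sigma}_{jsm}\right) \right]^{\gamma_s}
$$

Here $j$ indexes the HMM state, $\mathbf{o}_{st}$ is the stream-$s$ part of the observation at frame $t$, $c_{jsm}$ are the mixture weights of stream $s$ in state $j$, and $\gamma_s$ are the stream weights. Setting $\gamma_{\mathrm{art}} = 0$ makes the state likelihood depend on the acoustic stream alone, which is one way such jointly trained models can be decoded when only audio is available.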
This paper presents recent developments on our “silent speech interface” that converts tongue and lip motions, captured by ultrasound and video imaging, into audible speech. In our previous studies, the mapping between the observed articulatory movements and the resulting speech sound was achieved using a unit selection approach. Here we investigate the use…
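As a generic illustration of unit selection (not the article's implementation), the sketch below picks one candidate unit per target position by dynamic programming, minimizing the sum of target costs and concatenation costs. Targets, candidate units and both cost functions are hypothetical placeholders supplied by the caller; in a silent speech interface the targets would presumably come from observed articulatory features and the units from recorded speech segments, but the excerpt does not specify the costs actually used.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # target description (hypothetical)
U = TypeVar("U")  # candidate unit (hypothetical)


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    concat_cost: Callable[[U, U], float],
) -> Tuple[List[U], float]:
    """Choose one candidate per target, minimizing total target + concatenation cost.

    `candidates[i]` lists the units available for `targets[i]`; every list
    must be non-empty.
    """
    # cost[i][k]: best total cost of a sequence ending with candidates[i][k]
    cost = [[target_cost(targets[0], u) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row: List[float] = []
        ptr: List[int] = []
        for u in candidates[i]:
            # Best previous unit to join to u
            best_k = min(
                range(len(candidates[i - 1])),
                key=lambda k: cost[i - 1][k] + concat_cost(candidates[i - 1][k], u),
            )
            join = cost[i - 1][best_k] + concat_cost(candidates[i - 1][best_k], u)
            row.append(join + target_cost(targets[i], u))
            ptr.append(best_k)
        cost.append(row)
        back.append(ptr)

    # Trace back from the cheapest final unit
    k = min(range(len(cost[-1])), key=lambda j: cost[-1][j])
    total = cost[-1][k]
    path: List[U] = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][k])
        k = back[i][k]
    path.reverse()
    return path, total
```

The concatenation cost is what distinguishes this from picking the nearest unit independently at each position: a slightly worse-matching unit can win if it joins more smoothly to its neighbours.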