Bertil Lyberg

Today synthetic speech is often based on concatenation of natural speech, i.e. units such as diphones or polyphones are taken from natural speech and are then put together to form any word or sentence [5]. So far there have mainly been two ways of adding a visual modality to such a synthesis: morphing between single images or concatenating video sequences.
Most spoken language translation systems developed to date rely on a pipelined architecture, in which the main stages are speech recognition, linguistic analysis, transfer, generation and speech synthesis. When making projections of error rates for systems of this kind, it is natural to assume that the error rates for the individual components are ...
This paper describes a speech-to-speech translation system using standard components and a suite of generalizable customization techniques. The system currently translates air travel planning queries from English to Swedish. The modular architecture is designed to be easy to port to new domains and languages, and consists of a pipelined series of ...
We describe the architecture of the Spoken Language Translator (SLT), a prototype speech translation system which can translate queries from spoken English to spoken Swedish in the domain of air travel information systems. Though the performance given the level of effort so far has been extremely encouraging, more work is needed to provide a technology that ...
In acoustic and visual synthesis based on concatenation of speech units such as demisyllables, the recording of these units is normally taken from nonsense utterances where the demisyllable in question is pronounced in a non-focal position. In the present investigation, the relation between the lip movements in focal and non-focal position is studied and a ...
Speech is normally accompanied or supplemented by different gestures such as eyebrow movements and head movements. These movements seem to be of great importance in face-to-face communication. In this study we examined the visual correlates of focal accent in read speech. We were especially interested in the timing of the non-verbal events in ...