Gerasimos Xydas

Electronic texts carry important meta-information (such as tags in HTML) that most current Text-to-Speech (TtS) systems ignore during speech production. We propose an approach that exploits this meta-information to achieve a detailed auditory representation of an e-text. The e-Text to Speech and Audio (e-TSA) Composer has been…
Text documents usually embody visually oriented meta-information in the form of complex visual structures, such as tables. The semantics involved in such objects result in poor and ambiguous text-to-speech synthesis. Although most speech synthesis frameworks allow the consistent control of an abundance of parameters, such as prosodic cues, through…
Two issues are challenging in the life cycle of Digital Talking Books (DTBs): the automatic labeling of text-formatting metadata in documents and the multimodal representation of text-formatting semantics. We propose an augmented design-for-all approach for both the production and the reading processes of DAISY-compliant DTBs. This…
In this paper we present a novel approach, called “Text to Pronunciation (TtP)”, for the proper normalization of Non-Standard Words (NSWs) in unrestricted texts. The methodology handles inflection so that the NSWs agree with the syntactic structure of the utterances they belong to. Moreover, to achieve an augmented auditory…
The prosodic specification of an utterance to be spoken by a Text-to-Speech synthesis system can be expressed in terms of break indices, pitch accents and boundary tones. In particular, identifying the break indices determines the intonational phrase breaks, which affect all subsequent prosody-related procedures. In the present paper we use tree-structured…
In this paper we present the design and development of a modular and scalable speech composer named DEMOSTHeNES. It has been designed to convert plain or formatted text (e.g. HTML) into a combination of speech and audio signals. The DEMOSTHeNES architecture extends the structure of current Text-to-Speech systems and enables an open set of…
The auditory formation of visually oriented documents is a process that enables the delivery of a more representative acoustic image of documents via speech interfaces. We have set up an experimental environment to conduct a series of complex psycho-acoustic experiments that evaluate users’ performance in recognizing synthesized auditory components that…
The acoustic representation of complex visual structures involves both synthesized speech and non-speech audio signals. Although progress in speech synthesis allows the consistent control of an abundance of parameters, such as prosody through appropriate mark-up, there is not enough experimentally proven specification data to drive a Voice Browser for such…
Emerging electronic text formats include hierarchical structure and visualization-related information that current Text-to-Speech (TtS) systems ignore. In this paper we present a novel approach for composing detailed auditory representations of e-texts using speech and audio. Furthermore, we provide a scripting language (CAD scripts) for defining specific…
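Several of the abstracts above describe exploiting document mark-up (such as HTML tags) to drive prosodic and audio cues. As a minimal sketch of that general idea, and not the rules used by the e-TSA Composer or DEMOSTHeNES, the following Python uses the standard-library `html.parser` to translate a few formatting tags into SSML. The tag-to-SSML mapping and the names `TAG_TO_SSML`, `HtmlToSsml` and `html_to_ssml` are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical mapping from HTML formatting tags to SSML open/close wrappers.
# These choices are illustrative only, not any published system's rules.
TAG_TO_SSML = {
    "b": ('<emphasis level="strong">', "</emphasis>"),
    "strong": ('<emphasis level="strong">', "</emphasis>"),
    "i": ('<emphasis level="moderate">', "</emphasis>"),
    "em": ('<emphasis level="moderate">', "</emphasis>"),
    # Headings: slower, higher-pitched speech followed by a strong pause.
    "h1": ('<prosody rate="slow" pitch="+10%">', '</prosody><break strength="strong"/>'),
}


class HtmlToSsml(HTMLParser):
    """Walks an HTML fragment and emits SSML, translating known tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Unmapped tags (e.g. <p>) are dropped; only their text survives.
        if tag in TAG_TO_SSML:
            self.parts.append(TAG_TO_SSML[tag][0])

    def handle_endtag(self, tag):
        if tag in TAG_TO_SSML:
            self.parts.append(TAG_TO_SSML[tag][1])

    def handle_data(self, data):
        # Re-escape &, <, > so the text is valid inside the SSML document.
        self.parts.append(escape(data))


def html_to_ssml(html: str) -> str:
    parser = HtmlToSsml()
    parser.feed(html)
    parser.close()  # flushes any buffered trailing text
    return "<speak>" + "".join(parser.parts) + "</speak>"
```

For example, `html_to_ssml("<p>Read <b>this</b> now</p>")` yields `<speak>Read <emphasis level="strong">this</emphasis> now</speak>`. A richer mapping (tables, lists, non-speech earcons) would need structural context rather than a flat tag lookup, which is the kind of problem the abstracts above address.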