Mircea Giurgiu

The success of the collaborative, web-based MediaWiki platform, widely used in several projects to exchange knowledge, prompted the idea of using this system as a low-tech interoperability and repository layer for data providers, end users, developers, and project partners. Facilitating the acquisition of knowledge for multimedia digital resources is a task …
Simple4All Tundra (version 1.0) is the first release of a standardised multilingual corpus designed for text-to-speech research with imperfect or found data. The corpus consists of approximately 60 hours of speech data from audiobooks in 14 languages, as well as utterance-level alignments obtained with a lightly-supervised process. Future versions of the …
This paper presents techniques for building text-to-speech front-ends in a way that avoids the need for language-specific expert knowledge, but instead relies on universal resources (such as the Unicode character database) and unsupervised learning from unannotated data to ease system development. The acquisition of expert language-specific knowledge and …
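As a rough illustration of how a universal resource like the Unicode character database can replace hand-written, language-specific rules, the Python sketch below splits text into tokens purely from each character's Unicode general category (via the standard `unicodedata` module). It assumes a simple scheme in which a token is a maximal run of characters sharing a coarse class; the helper names `char_class` and `tokenize` are hypothetical, and this is not the front-end described in the paper.

```python
import unicodedata


def char_class(ch: str) -> str:
    """Map a character to a coarse class using its Unicode general category."""
    cat = unicodedata.category(ch)  # e.g. 'Lu', 'Ll', 'Mn', 'Nd', 'Po', 'Zs'
    if cat.startswith("L") or cat.startswith("M"):
        return "letter"  # letters plus combining marks such as diacritics
    if cat.startswith("N"):
        return "number"
    if cat.startswith("P"):
        return "punct"
    if cat.startswith("Z") or ch.isspace():
        return "space"  # separators, plus control whitespace like '\t' and '\n'
    return "other"


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into runs of same-class characters, dropping whitespace.

    Punctuation characters are never merged, so each one becomes its own token.
    """
    tokens = []
    current, current_cls = "", None
    for ch in text:
        cls = char_class(ch)
        if cls == current_cls and cls != "punct":
            current += ch  # extend the current run
        else:
            if current and current_cls != "space":
                tokens.append((current, current_cls))
            current, current_cls = ch, cls  # start a new run
    if current and current_cls != "space":
        tokens.append((current, current_cls))
    return tokens
```

For example, `tokenize("Bună ziua, 2024!")` yields `[("Bună", "letter"), ("ziua", "letter"), (",", "punct"), ("2024", "number"), ("!", "punct")]` without any Romanian-specific rules. Because combining marks are grouped with letters, decomposed accents (such as `e` followed by U+0301) stay inside their word.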
This research assesses the ability of a Hidden Markov Model (HMM) based method to generate accurate and reliable automatic phone-level transcriptions for a small-vocabulary speech corpus. In particular, we are interested in a system that requires only an orthographic transcription of the target corpus and can be bootstrapped from models trained on an …
This paper reports on a multilingual investigation into the effects of different masker types on native and non-native perception in a VCV consonant recognition task. Native listeners outperformed listeners from 7 other language groups, but all groups showed a similar ranking of maskers. Strong first language (L1) interference was observed, both from the sound system and …
The use of shared projection neural nets of the sort used in language modelling is proposed as a way of sharing parameters between multiple text-to-speech system components. We experiment with pretraining the weights of such a shared projection on an auxiliary language modelling task and then apply the resulting word representations to the task of …
A Romanian speech corpus is available for use as common material in speech perception and automatic speech recognition research. It consists of high-quality audio of 400 sentences spoken by each of 12 speakers. Utterances are simple, syntactically identical phrases such as "muta bronz cu p 2 agale." Preliminary intelligibility tests using the audio signals …
In this paper we evaluate two approaches for predicting the sentiment polarity of an utterance. The first method is based on a three-dimensional model that takes into account text expressiveness in terms of valence, arousal, and dominance. The second determines each word's semantic orientation using the chi-square and relevance factor statistics. …
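To make the chi-square idea concrete, here is a minimal Python sketch that scores each word's association with positive versus negative utterances from a 2×2 contingency table, χ² = N(AD − BC)² / ((A+B)(C+D)(A+C)(B+D)). It assumes a labelled corpus of tokenised utterances and assigns the sign of (AD − BC) as the orientation; the function names and the signing convention are illustrative assumptions, the relevance factor metric is not covered, and none of this should be read as the paper's exact formulation.

```python
from collections import Counter


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 word/class contingency table.

    a: positive utterances containing the word
    b: negative utterances containing the word
    c: positive utterances without the word
    d: negative utterances without the word
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table (e.g. word appears everywhere): no evidence
    return n * (a * d - b * c) ** 2 / denom


def semantic_orientation(corpus):
    """Signed chi-square per word (assumed convention): > 0 leans positive, < 0 negative.

    corpus: list of (tokens, label) pairs, label in {"pos", "neg"}.
    """
    n_pos = sum(1 for _, label in corpus if label == "pos")
    n_neg = len(corpus) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in corpus:
        target = pos_df if label == "pos" else neg_df
        target.update(set(tokens))  # count each word at most once per utterance
    scores = {}
    for word in set(pos_df) | set(neg_df):
        a, b = pos_df[word], neg_df[word]
        c, d = n_pos - a, n_neg - b
        # AD > BC means the word's odds of appearing are higher in positive utterances
        sign = 1.0 if a * d - b * c >= 0 else -1.0
        scores[word] = sign * chi_square(a, b, c, d)
    return scores
```

On a toy corpus of two positive utterances containing "great" and two negative ones containing "boring", "great" scores 4.0, "boring" scores -4.0, and a word spread evenly across both classes scores 0.0.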