Stefan Steidl

Most paralinguistic analysis tasks lack agreed-upon evaluation procedures, and their results are hard to compare, in contrast to more ‘traditional’ disciplines in speech analysis. The INTERSPEECH 2010 Paralinguistic Challenge shall help overcome the usually low compatibility of results by addressing three selected sub-challenges. In the Age Sub-Challenge, the age of …
The INTERSPEECH 2012 Speaker Trait Challenge provides for the first time a unified test-bed for ‘perceived’ speaker traits: Personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In this paper, we describe these three Sub-Challenges, Challenge conditions, baselines, and a new feature set by …
More than a decade has passed since research on automatic recognition of emotion from speech became a field of research in its own right, alongside its ‘big brothers’ speech and speaker recognition. This article attempts to provide a short overview of where we are today, how we got there, and what this can tell us about where to go next and how we could arrive …
The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as a new task and deals with autism and its manifestations in speech. Finally, emotion is revisited as a task, albeit with a broader range of overall …
While the first open comparative challenges in the field of paralinguistics targeted more ‘conventional’ phenomena such as emotion, age, and gender, there still exists a multiplicity of not yet covered, but highly relevant speaker states and traits. The INTERSPEECH 2011 Speaker State Challenge thus addresses two new sub-challenges to overcome the usually …
The INTERSPEECH 2015 Computational Paralinguistics Challenge addresses three different problems for the first time in a research competition under well-defined conditions: the estimation of the degree of nativeness, the neurological state of patients with Parkinson’s condition, and the eating condition of speakers, i.e., whether and which food type they are …
In traditional classification problems, the reference needed for training a classifier is given and considered to be absolutely correct. However, this does not apply to all tasks. In emotion recognition in non-acted speech, for instance, one often does not know which emotion was really intended by the speaker. Hence, the data is annotated by a group of …
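When several annotators label the same speech turn, a common way to derive a single reference label is majority voting over their votes. The sketch below is illustrative only: the emotion labels and the tie-handling convention are assumptions for this example, not details taken from the abstract above.

```python
from collections import Counter


def majority_label(annotations):
    """Return the most frequent label among annotator votes,
    or None when there is no unique majority (a tie)."""
    counts = Counter(annotations).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: no agreed-upon reference label
    return counts[0][0]


# Hypothetical votes of four annotators for two speech turns
print(majority_label(["anger", "anger", "neutral", "anger"]))  # -> anger
print(majority_label(["anger", "neutral"]))                    # -> None (tie)
```

Turns without a unique majority are often either discarded or treated as a separate ambiguous class, since keeping a coin-flip label would inject noise into the training reference.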
Classification performance of emotional user states found in realistic, spontaneous speech is not very high, compared to the performance reported for acted speech in the literature. This might be partly due to the difficulty of providing reliable annotations, partly due to suboptimal feature vectors used for classification, and partly due to the difficulty …