Models of dialog state are important, both scientifically and practically, but today's best build strongly on tradition. This paper presents a new way to identify the important dimensions of dialog state, more bottom-up and empirical than previous approaches. Specifically, we applied Principal Component Analysis to a large number of low-level prosodic …
People in dialog use a rich set of nonverbal behaviors, including variations in the prosody of their utterances. Such behaviors, often emotion-related, call for appropriate responses, but today's spoken dialog systems lack the ability to do this. Recent work has shown how to recognize user emotions from prosody and how to express system-side emotions with …
In spoken dialog, speakers are simultaneously engaged in various mental processes, and it seems likely that the word that will be said next depends, to some extent, on the states of these mental processes. Further, these states can be inferred, to some extent, from properties of the speaker's voice as they change from moment to moment. As an illustration of …
As a priority-setting exercise, we compared interactions between users and a simple spoken dialog system to interactions between users and a human operator. We observed usability events, places in which system behavior differed from human behavior, and for each we noted the impact, root causes, and prospects for improvement. We suggest some priority issues …
Today there are solutions for some specific turn-taking problems, but no general model. We show how turn-taking can be reduced to two more general problems, prediction and selection. We also discuss the value of predicting not only future speech/silence but also prosodic features, thereby handling not only turn-taking but "turn-shaping". To illustrate how …
Discovering and quantifying the prosodic signals that help manage turn-taking is difficult, in part because of the limitations of commonly used methods. This paper presents an integrated method that uses both perceptually-based analysis and quantitative analysis. The eight activities involved in the method, clarification of aims, problem formulation, …
If we can model the cognitive and communicative processes underlying speech, we should be able to better predict what a speaker will do. With this idea as inspiration, we examine a number of prosodic and timing features as potential sources of information on what words the speaker is likely to say next. In spontaneous dialog we find that word probabilities …
Inspired by the goal of modeling the dialog state and the speaker's mental state, moment by moment, we apply Principal Component Analysis to a vector of 76 prosodic features spanning 6 seconds of context. This gives a multidimensional representation of the current state. We find that word probabilities vary strongly with several of these dimensions, that …
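The core step described in this abstract, projecting a 76-dimensional prosodic feature vector onto its principal components, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the feature matrix is synthetic random data, the z-normalization step is an assumption, and the function names `pca_fit` and `pca_transform` are hypothetical.

```python
import numpy as np


def pca_fit(features, n_components):
    """Fit PCA on a (n_frames, n_features) matrix via SVD.

    Standardizing each feature first is an assumption of this sketch;
    the abstract does not say how the features were normalized.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    z = (features - mean) / std
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    variance = s ** 2 / (len(z) - 1)
    explained = variance / variance.sum()
    return mean, std, vt[:n_components], explained[:n_components]


def pca_transform(features, mean, std, components):
    """Project feature vectors onto the fitted principal directions."""
    return ((features - mean) / std) @ components.T


# Synthetic stand-in for per-moment prosodic feature vectors (76 features each).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 76))

mean, std, comps, explained = pca_fit(X, n_components=5)
state = pca_transform(X, mean, std, comps)  # one 5-dimensional "state" per moment
```

Each row of `state` is then a low-dimensional description of one moment in the dialog; how many dimensions to keep is a modeling choice that this sketch does not settle.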
Most language models treat speech as simply sequences of words, ignoring the fact that words are also events in time. This paper reports an initial exploration of how word probabilities vary with time-into-utterance, and proposes a method for using this information to improve a language model. This is done by computing the ratio of the probability of the …
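The abstract is cut off before it says which two probabilities form the ratio, so the following is only one plausible reading, assumed here for illustration: the probability of a word within a given time-into-utterance bin, divided by its overall probability. The function name `time_ratios`, the one-second bin width, and the add-one smoothing are all hypothetical choices.

```python
from collections import Counter, defaultdict


def time_ratios(tokens, bin_width=1.0, smoothing=1.0):
    """Return {time_bin: {word: P(word | bin) / P(word)}}.

    tokens: iterable of (word, seconds_into_utterance) pairs.
    The in-bin probability is add-k smoothed (an assumption of this sketch)
    so that words unseen in a bin still get a nonzero ratio.
    """
    tokens = list(tokens)
    overall = Counter(word for word, _ in tokens)
    by_bin = defaultdict(Counter)
    for word, t in tokens:
        by_bin[int(t // bin_width)][word] += 1

    total = len(tokens)
    vocab_size = len(overall)
    ratios = {}
    for b, counts in by_bin.items():
        n = sum(counts.values())
        ratios[b] = {
            word: ((counts[word] + smoothing) / (n + smoothing * vocab_size))
            / (overall[word] / total)
            for word in overall
        }
    return ratios


# Toy data: "um" occurs early in utterances, "dog" occurs later.
toy = [("um", 0.2), ("um", 0.4), ("the", 0.5),
       ("the", 1.5), ("dog", 1.7), ("dog", 1.2)]
ratios = time_ratios(toy)
```

A ratio above 1 means the word is more likely than usual at that point in the utterance. Such a ratio could in principle be used to rescale a baseline language model's estimate, though the details of how the paper does this are not recoverable from the truncated text.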
Spoken dialog systems today do not vary the prosody of their utterances, although prosody is known to have many useful expressive functions. In a corpus of memory quizzes, we identify eleven dimensions of prosodic variation, each with its own expressive function. We identified the situations in which each was used, and developed rules for detecting these …