This paper describes a strategy that enables a conversation system to take part in human-to-human group conversation. One notable characteristic of the group conversation system is that it can choose whether to observe the conversation or to take a turn. We implement a computational model, combined with speech and gaze recognizers, to follow turn-taking rules …
In this paper we present a presentation training system that observes a presentation rehearsal and gives the speaker recommendations for improving the delivery, such as speaking more slowly and looking at the audience. Our system, "Presentation Sensei," is equipped with a microphone and a camera to analyze a presentation by combining …
This paper describes a robot that converses with multiple people using its multimodal interface. Multi-person conversation raises many new problems that do not arise in conventional one-to-one conversation, such as information-flow problems (recognizing who is speaking and to whom, and making clear to whom the system is speaking) and space …
In this paper, we describe three singing information processing systems, VocaListener, VocaListener2, and VocaWatcher, that imitate the singing expressions of a human singer's voice and face. VocaListener can synthesize natural singing voices by analyzing and imitating the pitch and dynamics of human singing. VocaListener2 imitates temporal timbre …
In this paper, we describe VocaWatcher, a novel robot motion generator that enables a humanoid robot to sing with realistic facial expressions and naturally synthesized singing voices. This robot singer is an important and attractive humanoid robot application for entertainment; moreover, it promotes state-of-the-art integration of robot …
This paper reports on the automatic prediction of dialog acts and address types in three-party conversations. In multi-party interaction, the dialog structure becomes more complex than in the one-to-one case, because there is more than one hearer for each utterance. To cope with this problem, we predict dialog acts and address types simultaneously in our framework. …