In this paper, unsupervised object categorization by robots is examined. We propose an unsupervised multimodal categorization method based on audio-visual and haptic information. The robot uses its physical embodiment to grasp and observe an object from various viewpoints as well as to listen to the sound during the observation. The proposed categorization method is …
This paper describes an algorithm for spoken language acquisition through a human-robot interface based on speech, vision, and behavior. In this algorithm, the grounded language knowledge is represented by graphical statistical models consisting of hidden Markov models and a stochastic context-free grammar. The learning of the lexicon is based on the …
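The abstract is cut off before the learning procedure is described, but the HMM side of such a grounded lexicon can be pictured with a minimal sketch: each lexical item is a discrete HMM over quantized acoustic symbols, and a new utterance is assigned to the word model that scores it highest under the forward algorithm. The word list, state counts, and random parameters below are invented for illustration; this is not the paper's algorithm.

```python
import numpy as np

def forward_log_prob(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm in log space."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

def random_hmm(n_states, n_symbols, rng):
    """A toy HMM with random row-normalized parameters, stored in log space."""
    pi = rng.dirichlet(np.ones(n_states))
    A = rng.dirichlet(np.ones(n_states), size=n_states)
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)
    return np.log(pi), np.log(A), np.log(B)

rng = np.random.default_rng(0)
# Hypothetical lexicon: one HMM per word.
lexicon = {w: random_hmm(3, 8, rng) for w in ["box", "ball", "cup"]}

utterance = np.array([1, 4, 4, 2, 7])      # quantized acoustic features
best = max(lexicon, key=lambda w: forward_log_prob(utterance, *lexicon[w]))
print("recognized word:", best)
```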
This paper proposes a robot that acquires multimodal information, i.e., auditory, visual, and haptic information, in a fully autonomous way using its embodiment. We also propose an online algorithm for multimodal categorization based on the acquired multimodal information and words, which are partially given by human users. The proposed framework makes it possible …
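The abstract breaks off before stating what the framework enables, but the online flavor can be sketched. Below, a stream of grasped objects is folded into scikit-learn's online variational LDA via partial_fit, with user-supplied words treated as one more count vector. The feature dimensions, the Poisson-sampled counts, and the use of plain LDA on concatenated histograms (rather than the paper's multimodal formulation) are all simplifying assumptions.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical per-modality bag-of-features sizes, plus a word vocabulary.
D_VIS, D_AUD, D_HAP, D_WORD = 50, 30, 20, 100
lda = LatentDirichletAllocation(n_components=8, random_state=0)

rng = np.random.default_rng(0)
for _ in range(100):                       # stream of grasped objects
    vis = rng.poisson(2.0, D_VIS)
    aud = rng.poisson(2.0, D_AUD)
    hap = rng.poisson(2.0, D_HAP)
    word = rng.poisson(0.2, D_WORD)        # sparse, partially given word labels
    x = np.hstack([vis, aud, hap, word])[None, :]
    lda.partial_fit(x)                     # incremental (online) update

theta = lda.transform(x)                   # category mixture of the last object
print("category:", theta.argmax())
```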
In this paper, a novel framework for multimodal categorization using a bag of multimodal LDA models is proposed. The main issue tackled in this paper is the granularity of categories: categories are not fixed but vary according to context. Selective attention is the key to modeling this granularity of categories. This fact motivates us to introduce …
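One way to read "bag of multimodal LDA models" is as a collection of LDA models, each attending to a different subset of modalities, so that different members of the bag carve out categories at different granularities. The sketch below implements that reading with scikit-learn's LDA on concatenated count histograms; the modality dimensions, synthetic counts, and topic number are invented, and the paper's actual model is a multimodal extension of LDA rather than vanilla LDA.

```python
import numpy as np
from itertools import combinations
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical per-modality bag-of-features counts for N observed objects.
rng = np.random.default_rng(0)
N = 40
modalities = {
    "visual": rng.poisson(2.0, size=(N, 50)),
    "audio":  rng.poisson(2.0, size=(N, 30)),
    "haptic": rng.poisson(2.0, size=(N, 20)),
}

# "Bag" of LDA models: one per non-empty modality subset, so each model
# reflects a different attentional focus and category granularity.
bag = {}
names = list(modalities)
for r in range(1, len(names) + 1):
    for subset in combinations(names, r):
        X = np.hstack([modalities[m] for m in subset])
        lda = LatentDirichletAllocation(n_components=4, random_state=0)
        theta = lda.fit_transform(X)                # per-object topic mixtures
        bag[subset] = (lda, theta.argmax(axis=1))   # category = dominant topic

print(bag[("visual", "haptic")][1])  # categories when attending to vision+touch
```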
This paper presents a new method for statistical modelling of prosody control in speech synthesis. The proposed method, which is referred to as Constrained Tree Regression (CTR), can suitably represent the complex effects of control factors on prosody with a moderate amount of learning data. It is based on recursive splits of predictor …
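CTR itself is specific to the paper, but the "recursive splits of predictor variables" it builds on are the ordinary regression-tree recursion. The following is a generic regression-tree sketch, not the paper's constrained variant: each node greedily picks the feature-threshold split that minimizes summed squared error, and leaves predict the mean target.

```python
import numpy as np

def best_split(X, y):
    """Greedy search for the (feature, threshold) split minimizing total SSE."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:           # thresholds between observed values
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t)
    return best

def grow(X, y, depth=0, max_depth=3, min_leaf=5):
    """Recursively split predictor variables; leaves predict the mean target."""
    split = best_split(X, y)
    if depth == max_depth or len(y) < 2 * min_leaf or split is None:
        return ("leaf", float(y.mean()))
    _, j, t = split
    m = X[:, j] <= t
    return ("node", j, t,
            grow(X[m], y[m], depth + 1, max_depth, min_leaf),
            grow(X[~m], y[~m], depth + 1, max_depth, min_leaf))

def predict(tree, x):
    while tree[0] == "node":
        _, j, t, lo, hi = tree
        tree = lo if x[j] <= t else hi
    return tree[1]
```

CTR, as the abstract describes it, constrains this recursion so that the complex interactions among control factors can still be represented when learning data are moderate; the unconstrained version above only shows the split machinery.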
In this paper, we propose a nonparametric Bayesian framework for categorizing multimodal sensory signals, such as audio, visual, and haptic information, by robots. The robot uses its physical embodiment to grasp and observe an object from various viewpoints as well as to listen to the sound during the observation. The multimodal information enables the robot to …
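Nonparametric Bayesian categorization is commonly built on Dirichlet process mixtures, in which the number of categories is inferred from the data rather than fixed in advance. Purely as an illustration of that idea (not the paper's model, which handles multimodal features), here is a collapsed Gibbs sampler for a 1-D DP Gaussian mixture with known observation variance; alpha, sigma2, and tau2 are made-up hyperparameters.

```python
import numpy as np
from scipy.stats import norm

def crp_gibbs(x, alpha=1.0, sigma2=0.5, tau2=4.0, iters=50, seed=0):
    """Collapsed Gibbs for a DP mixture of 1-D Gaussians: known observation
    variance sigma2, N(0, tau2) prior on cluster means, concentration alpha."""
    rng = np.random.default_rng(seed)
    z = np.zeros(len(x), dtype=int)              # all points start in one cluster
    for _ in range(iters):
        for i in range(len(x)):
            z[i] = -1                            # remove point i from its cluster
            labels, counts = np.unique(z[z >= 0], return_counts=True)
            logp = []
            for k, n in zip(labels, counts):     # option: join an existing cluster
                s = x[z == k].sum()
                prec = 1.0 / tau2 + n / sigma2   # posterior precision of the mean
                mu, var = (s / sigma2) / prec, 1.0 / prec
                logp.append(np.log(n) + norm.logpdf(x[i], mu, np.sqrt(var + sigma2)))
            # ... or open a new cluster (CRP prior term)
            logp.append(np.log(alpha) + norm.logpdf(x[i], 0.0, np.sqrt(tau2 + sigma2)))
            logp = np.asarray(logp)
            p = np.exp(logp - logp.max()); p /= p.sum()
            pick = rng.choice(len(p), p=p)
            z[i] = labels[pick] if pick < len(labels) else z.max() + 1
        _, z = np.unique(z, return_inverse=True) # compact relabeling
    return z
```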
This paper proposes a method that generates motions and utterances in an object manipulation dialogue task. The proposed method integrates belief modules for speech, vision, and motion into a probabilistic framework so that a user’s utterances can be understood based on multimodal information. Responses to the utterances are optimized based on an …
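The integration step can be pictured as combining per-module beliefs over candidate interpretations in log space and picking the best-supported one. The toy below is only a sketch of that idea: the candidate phrases, probabilities, and weights are invented, and the paper's framework additionally optimizes the responding motion and utterance, which is omitted here.

```python
import numpy as np

# Hypothetical per-module log-likelihoods for each candidate interpretation
# of the user's utterance (one entry per candidate).
candidates = ["place red box on tray", "place blue box on tray"]
log_speech = np.log([0.6, 0.4])    # speech-recognition belief
log_vision = np.log([0.2, 0.8])    # scene-consistency belief
log_motion = np.log([0.7, 0.3])    # feasibility of the implied motion

# Multimodal understanding: sum weighted log-scores across belief modules
# (the weights are illustrative tuning knobs, not the paper's values).
w = {"speech": 1.0, "vision": 1.0, "motion": 0.5}
score = w["speech"] * log_speech + w["vision"] * log_vision + w["motion"] * log_motion
print("interpretation:", candidates[int(np.argmax(score))])
```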