We present a discriminative latent variable model for classification problems in structured domains where inputs can be represented by a graph of local observations. A hidden-state conditional random field framework learns a set of latent variables conditioned on local features. Observations need not be independent and may overlap in space and time.
We introduce a discriminative hidden-state approach for the recognition of human gestures. Gesture sequences often have a complex underlying structure, and models that can incorporate hidden structures have proven to be advantageous for recognition tasks. Most existing approaches to gesture recognition with hidden states employ a Hidden Markov Model or …
This paper presents NetICE (Networked Intelligent Collaborative Environment), a prototype designed to provide an immersive environment that allows people to communicate from anywhere at any time. The NetICE system consists of two components: an intelligent network and smart terminals. The intelligent network allows multiple users to be connected to …
Speech recognition systems are now used in a wide variety of domains. They have recently been introduced in cars for hands-free control of radio, cell phone, and navigation applications. However, due to the ambient noise in the car, recognition errors are relatively frequent. This paper tackles the problem of detecting when such recognition errors occur from …
Untethered multimodal interfaces are more attractive than tethered ones because they are more natural and expressive for interaction. Such interfaces usually require robust vision-based body pose estimation and gesture recognition. In interfaces where a user is interacting with a computer using speech and arm gestures, the user's spoken keywords can be …
This paper presents a novel method for learning classes of temporal sequences using a bag-of-features approach. We define a temporal sequence as a bag of temporal features and show how this representation can be used for the recognition and segmentation of temporal events. A codebook of temporal descriptors, representing the local temporal texture, is …
Automatic detection of communication errors in conversational systems has been explored extensively in the speech community. However, most previous studies have used only acoustic cues. Visual information has also been used by the speech community to improve speech recognition in dialogue systems, but this visual information is only used when the speaker is …