Perceptual user interfaces: multimodal interfaces that process what comes naturally

@article{Oviatt2000PerceptualUI,
  title={Perceptual user interfaces: multimodal interfaces that process what comes naturally},
  author={Sharon L. Oviatt and Philip R. Cohen},
  journal={Commun. ACM},
  year={2000},
  volume={43},
  pages={45-53}
}
… more transparent experience than ever before. Our voice, hands, and entire body, once augmented by sensors such as microphones and cameras, are becoming the ultimate transparent and mobile multimodal input devices. The area of multimodal systems has expanded rapidly during the past five years. Since Bolt’s [1] original “Put That There” concept demonstration, which processed speech and manual pointing during object manipulation, significant achievements have been made in developing more general…

Citations

Multimodal Interfaces

  • S. Oviatt
  • Computer Science
    Encyclopedia of Multimedia
  • 2008
TLDR
This chapter will review the main types of multimodal interfaces, their advantages and cognitive science underpinnings, primary features and architectural characteristics, and general research in the field of multimodal interaction and interface design.

Designing the User Interface for Multimodal Speech and Pen-Based Gesture Applications: State-of-the-Art Systems and Future Research Directions

TLDR
The emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner are summarized, including early and late fusion approaches and the new hybrid symbolic-statistical approach.
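
To make the early/late fusion distinction concrete, here is a minimal sketch, not the architecture from the paper above: early fusion concatenates features from both modalities before a single classifier, while late fusion combines per-modality classifier outputs. The feature vectors, stand-in classifier, and reliability weights are all invented for illustration.

    import numpy as np

    # Hypothetical per-modality features (illustrative only).
    speech_feats = np.array([0.2, 0.7, 0.1])   # e.g. acoustic/lexical features
    gesture_feats = np.array([0.9, 0.1])       # e.g. pen-stroke shape features

    def classify(features, n_classes=3):
        """Stand-in recognizer: returns a posterior over command classes."""
        rng = np.random.default_rng(len(features))  # deterministic dummy weights
        logits = rng.normal(size=(n_classes, len(features))) @ features
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()

    # Early fusion: concatenate raw features, then classify once.
    early_posterior = classify(np.concatenate([speech_feats, gesture_feats]))

    # Late fusion: classify each modality separately, then combine posteriors.
    w_speech, w_gesture = 0.6, 0.4              # assumed reliability weights
    late_posterior = (w_speech * classify(speech_feats)
                      + w_gesture * classify(gesture_feats))

    print("early fusion picks class", early_posterior.argmax())
    print("late fusion picks class", late_posterior.argmax())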

Multimodal interaction: A review

  • M. Turk
  • Psychology
    Pattern Recognit. Lett.
  • 2014

Combining Voice and Gesture for Human Computer Interaction

TLDR
The results obtained from the qualitative questionnaire show that the multimodal set is slightly favored by users over its unimodal counterpart due to better overall performance as well as lower cognitive load and effort.

Direct Touch, Gaze Input, Mid-Air Gestures, Proxemics, Wearable, Speech Input

TLDR
This paper isolates the low-level interaction tasks performed by a user based on a usage scenario for visual exploration, and introduces input modalities that enable interaction directly between the human body and objects of interest on the interface, without an intervening mediator.

Framing the Design Space of Multimodal Mid-Air Gesture and Speech-Based Interaction With Mobile Devices for Older People

TLDR
The aim of this work is to promote the usefulness and potential of multimodal technologies based on mid-air gestures and voice input for making older adults' interaction with mobile devices more accessible and inclusive.

Natural User Interfaces

TLDR
In this project, a short history of user interfaces and then natural user interfaces will be discussed, and vision-based NUIs and the main issues in this approach will be explained.

Human-Computer Interaction. Multimodal and Natural Interaction: Thematic Area, HCI 2020, Held as Part of the 22nd International Conference, HCII 2020, Copenhagen, Denmark, July 19–24, 2020, Proceedings, Part II

TLDR
A user-centered methodology for determining natural gestures for a set of common computer actions and a vocabulary of gestures for system designers to use when developing gesture-based NUIs are provided.

Learning to Interpret and Apply Multimodal Descriptions

  • Ting Han
  • Psychology, Computer Science
  • 2018
TLDR
This dissertation concerns the task of learning to interpret multimodal descriptions composed of verbal utterances and hand gestures/sketches, and to apply the corresponding interpretations to tasks such as image retrieval.
...

References

Showing 1-10 of 23 references

Ten myths of multimodal interaction

TLDR
Well-designed multimodal systems integrate complementary modalities to yield a highly synergistic blend in which the strengths of each mode are capitalized upon and used to overcome weaknesses in the other.

Designing the User Interface for Multimodal Speech and Pen-Based Gesture Applications: State-of-the-Art Systems and Future Research Directions

TLDR
The emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner are summarized, including early and late fusion approaches and the new hybrid symbolic-statistical approach.

Unification-based Multimodal Integration

TLDR
A multimodal language processing architecture is described that supports interfaces allowing simultaneous input from speech and gesture recognition, and that allows the component modalities to mutually compensate for each other's errors.
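
As a rough illustration of the unification idea, assume each modality contributes a partial meaning fragment and integration merges fragments only when they do not conflict. The frame fields and values below are hypothetical, not the paper's actual typed feature structures.

    def unify(a, b):
        """Merge two partial feature structures (dicts); fail on conflicting values."""
        merged = dict(a)
        for key, value in b.items():
            if key in merged and merged[key] != value:
                return None  # inconsistent hypotheses cannot be unified
            merged[key] = value
        return merged

    # Speech supplies the action and object type; gesture supplies the location.
    speech_frame = {"action": "create", "object": "barbed_wire"}  # hypothetical
    gesture_frame = {"location": (120, 45)}                       # hypothetical

    print(unify(speech_frame, gesture_frame))
    # {'action': 'create', 'object': 'barbed_wire', 'location': (120, 45)}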

Multimodal interactive maps: designing for human performance

TLDR
In this research, interfaces supporting spoken, pen-based, and multimodal input were analyzed for their effectiveness in interacting with map systems; the results indicated that map displays can be structured to effectively minimize performance errors and disfluencies.

“Put-that-there”: Voice and gesture at the graphics interface

  • R. Bolt
  • Art, Computer Science
    SIGGRAPH '80
  • 1980
TLDR
The work described herein involves the user commanding simple shapes about a large-screen graphics display surface, and because voice can be augmented with simultaneous pointing, the free usage of pronouns becomes possible, with a corresponding gain in naturalness and economy of expression.
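
A minimal sketch of the deictic resolution behind “Put-that-there”: each spoken pronoun is bound to the pointing event closest to it in time. The event format, timestamps, and nearest-in-time rule are assumptions made for illustration, not Bolt's implementation.

    # Pointing events observed while the user speaks (hypothetical data).
    pointing_events = [
        {"time": 0.8, "target": "blue_square"},  # pointing at an object
        {"time": 1.9, "target": (400, 220)},     # pointing at an empty spot
    ]

    def resolve(pronoun_time, events):
        """Resolve a deictic word to the pointing event nearest in time."""
        return min(events, key=lambda e: abs(e["time"] - pronoun_time))["target"]

    # "Put that there": 'that' spoken at ~0.7 s, 'there' at ~2.0 s (assumed times).
    obj = resolve(0.7, pointing_events)    # -> "blue_square"
    dest = resolve(2.0, pointing_events)   # -> (400, 220)
    print(f"move {obj} to {dest}")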

Mutual disambiguation of recognition errors in a multimodal architecture

TLDR
Although stand-alone speech recognition performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers, and implications are discussed for the development of future multimodal architectures that can perform in a more robust and stable manner.
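
Mutual disambiguation can be pictured as joint re-ranking of the per-modality n-best lists, keeping only semantically compatible pairs, so a lower-ranked hypothesis in one modality can win when the other modality supports it. The hypotheses, scores, and compatibility table below are made up for illustration.

    # Hypothetical n-best lists with recognizer scores.
    speech_nbest = [("open ditch", 0.55), ("open dish", 0.45)]
    gesture_nbest = [("area", 0.6), ("line", 0.4)]

    # Assumed semantic compatibility between spoken commands and drawn shapes.
    compatible = {("open ditch", "line"), ("open dish", "area")}

    def best_joint(speech, gesture):
        """Pick the highest-scoring jointly compatible (speech, gesture) pair."""
        candidates = [
            (s_score * g_score, s, g)
            for s, s_score in speech
            for g, g_score in gesture
            if (s, g) in compatible
        ]
        return max(candidates) if candidates else None

    print(best_joint(speech_nbest, gesture_nbest))
    # The top speech hypothesis loses to a lower-ranked one that the gesture supports.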

Multimodal Integration - A Statistical View

TLDR
This work develops two techniques, an estimate approach and a learning approach, designed to optimize accurate recognition during the multimodal integration process, and evaluates these methods using QuickSet, a speech/gesture multimodal system, reporting evaluation results based on an empirical corpus collected with QuickSet.
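
One common statistical reading of this kind of integration is a weighted log-linear combination of per-modality posteriors, with the weights either set from confidence estimates or learned from a corpus; the sketch below uses invented numbers and is not the paper's exact formulation.

    import math

    # Hypothetical per-modality posteriors over the same candidate commands.
    speech_p = {"create_line": 0.5, "create_area": 0.3, "delete": 0.2}
    gesture_p = {"create_line": 0.2, "create_area": 0.7, "delete": 0.1}

    def combine(p1, p2, w1=0.5, w2=0.5):
        """Weighted log-linear combination, renormalized over shared candidates."""
        scores = {c: w1 * math.log(p1[c]) + w2 * math.log(p2[c]) for c in p1}
        z = sum(math.exp(s) for s in scores.values())
        return {c: math.exp(s) / z for c, s in scores.items()}

    # w1/w2 could be tuned on data (a learning approach) or derived from
    # recognizer confidence estimates (an estimation approach).
    print(combine(speech_p, gesture_p))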

Manual and gaze input cascaded (MAGIC) pointing

TLDR
This work explores a new direction in utilizing eye gaze for computer input by proposing an alternative approach, dubbed MAGIC (Manual And Gaze Input Cascaded) pointing, which might offer many advantages, including reduced physical effort and fatigue as compared to traditional manual pointing, greater accuracy and naturalness than traditional gaze pointing, and possibly faster speed than manual pointing.
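
The cascade can be sketched as follows: the cursor warps to the neighborhood of the current gaze fixation when it is far away, and small manual movements then finish the selection. The threshold and update rule are assumptions, not the paper's parameters.

    WARP_THRESHOLD = 120  # px; only warp when the cursor is far from the gaze point (assumed)

    def magic_cursor(cursor, gaze, mouse_delta):
        """Cascade gaze (coarse warp) and manual input (fine adjustment)."""
        cx, cy = cursor
        gx, gy = gaze
        # Coarse stage: if the gaze fixation is far away, warp the cursor to it.
        if abs(gx - cx) + abs(gy - cy) > WARP_THRESHOLD:
            cx, cy = gx, gy
        # Fine stage: small manual movements do the final precise positioning.
        dx, dy = mouse_delta
        return (cx + dx, cy + dy)

    # Example: cursor at (0, 0), user looks near a target at (500, 300),
    # then nudges the mouse 4 px right and 2 px down to land on it.
    print(magic_cursor((0, 0), (500, 300), (4, 2)))  # -> (504, 302)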

The efficiency of multimodal interaction: a case study

TLDR
A case study comparison of a direct-manipulation-based graphical user interface (GUI) with the QuickSet pen/voice multimodal interface for the task of military force “laydown” suggests that there may be substantial efficiency advantages to multimodal interaction over GUIs for map-based tasks.