ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users

Dhruv Jain, Khoa Nguyen, Steven M. Goodman, Rachel Grossman-Kahn, Hung Ngo, Aditya Kusupati, Ruofei Du, Alex Olwal, Leah Findlater, Jon E. Froehlich
Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
Recent advances have enabled automatic sound recognition systems for deaf and hard of hearing (DHH) users on mobile devices. However, these tools use pre-trained, generic sound recognition models, which do not meet the diverse needs of DHH users. We introduce ProtoSound, an interactive system for customizing sound recognition models by recording a few examples, thereby enabling personalized and fine-grained categories. ProtoSound is motivated by prior work examining sound awareness needs of DHH… 
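As the name suggests, this style of few-shot personalization is in the spirit of prototypical networks: embed the user's few recordings, average each class's embeddings into a prototype, and classify new sounds by nearest prototype. A minimal sketch with toy 2-D embeddings follows; the embedding model and the class names are illustrative assumptions, not details from the paper:

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Average the embeddings of each class's few recorded examples
    to form one prototype vector per class."""
    protos = {}
    for c in sorted(set(labels)):
        protos[c] = np.mean([e for e, l in zip(embeddings, labels) if l == c], axis=0)
    return protos

def classify(query, protos):
    """Assign a query embedding to the class whose prototype is
    nearest in embedding space (squared Euclidean distance)."""
    return min(protos, key=lambda c: np.sum((query - protos[c]) ** 2))

# Toy 2-D embeddings: two user-recorded examples per (hypothetical) class.
emb = [np.array([0.0, 1.0]), np.array([0.2, 0.8]),   # "doorbell"
       np.array([1.0, 0.0]), np.array([0.9, 0.1])]   # "microwave"
labels = ["doorbell", "doorbell", "microwave", "microwave"]
protos = class_prototypes(emb, labels)
print(classify(np.array([0.1, 0.9]), protos))  # → doorbell
```

Because prototypes are just per-class means, adding a new personalized sound category requires only a handful of recordings and no retraining of the embedding model.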


LipLearner: Customizable Silent Speech Interactions on Mobile Devices

LipLearner leverages contrastive learning to learn efficient lipreading representations, enabling few-shot command customization with minimal user effort, and exhibits high robustness to different lighting, posture, and gesture conditions on an in-the-wild dataset.

HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones

HiSSNet is a joint sound event detection and speaker identification (SEID) model that uses a hierarchical prototypical network to detect both general and specific sounds of interest and to characterize both alarm-like and speech sounds.

“Easier or Harder, Depending on Who the Hearing Person Is”: Codesigning Videoconferencing Tools for Small Groups with Mixed Hearing Status

It is found that established groups crafted social accessibility norms that met their relational contexts. Promising directions for future captioning design are identified, including the need to standardize speaker identification and customization, opportunities to provide behavioral feedback during a conversation, and ways that videoconferencing platforms could enable groups to set and share norms.

SoundVizVR: Sound Indicators for Accessible Sounds in Virtual Reality for Deaf or Hard-of-Hearing Users

Sounds provide vital information such as spatial and interaction cues in virtual reality (VR) applications to convey more immersive experiences to VR users. However, it may be a challenge for deaf or…

CB-Conformer: Contextual biasing Conformer for biased word recognition

  • Yaoxun Xu, Baiji Liu, H. Meng
  • ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2023
CB-Conformer is proposed to improve biased word recognition by introducing the Contextual Biasing Module and the Self-Adaptive Language Model to vanilla Conformer, resulting in a greater focus on biased words and more successful integration with the automatic speech recognition model than the standard fixed language model.

Exploring the Use of the SoundVizVR Plugin with Game Developers in the Development of Sound-Accessible Virtual Reality Games

This study worked with two VR game developers to explore the usability of the SoundVizVR Plugin and how it impacted the view on accessibility from the game developers’ perspectives.

Toward User-Driven Sound Recognizer Personalization with People Who Are d/Deaf or Hard of Hearing

To better understand how DHH users can drive personalization of their own assistive sound recognition tools, a three-part study with 14 DHH participants highlights a positive subjective experience when recording and interpreting training data in situ, but uncovers several key pitfalls unique to DHH users.

SoundWatch: Exploring Smartwatch-based Deep Learning Approaches to Support Sound Awareness for Deaf and Hard of Hearing Users

A performance evaluation of four low-resource deep learning sound classification models (MobileNet, Inception, ResNet-lite, and VGG-lite) across four device architectures (watch-only, watch+phone, watch+phone+cloud, and watch+cloud) finds that the watch+phone architecture provided the best balance between CPU, memory, network usage, and classification latency.

A Personalizable Mobile Sound Detector App Design for Deaf and Hard-of-Hearing Users

A mobile phone app that alerts deaf and hard-of-hearing people to sounds they care about is designed, and the viability of a basic machine learning algorithm for sound detection is explored.

SoundSense: scalable sound sensing for people-centric applications on mobile phones

This paper proposes SoundSense, a scalable framework for modeling sound events on mobile phones; it represents the first general-purpose sound sensing system specifically designed to work on resource-limited phones, and demonstrates that SoundSense can recognize meaningful sound events that occur in users' everyday lives.

Scribe4Me: Evaluating a Mobile Sound Transcription Tool for the Deaf

A 2-week field study of an exploratory prototype of a mobile sound transcription tool for the deaf and hard-of-hearing shows that the approach is feasible, highlights particular contexts in which it is useful, and provides information about what should be contained in transcriptions.

Deaf and Hard-of-hearing Individuals' Preferences for Wearable and Mobile Sound Awareness Technologies

Findings related to sound type, full captions vs. keywords, sound filtering, notification styles, and social context provide direct guidance for the design of future mobile and wearable sound awareness systems.

HomeSound: An Iterative Field Deployment of an In-Home Sound Awareness System for Deaf or Hard of Hearing Users

HomeSound, an in-home sound awareness system for Deaf and hard of hearing (DHH) users, consists of a microphone and display, and uses multiple devices installed in each home, similar to the Echo Show or Nest Hub.

Automated Class Discovery and One-Shot Interactions for Acoustic Activity Recognition

This work built an end-to-end system for self-supervised learning of events labelled through one-shot interaction, and shows that the system can accurately and automatically learn acoustic events across environments, while adhering to users' preferences for non-intrusive interactive behavior.

Environmental Sound Recognition With Time–Frequency Audio Features

An empirical feature analysis for audio environment characterization is performed, and a matching pursuit algorithm is proposed to obtain effective time-frequency features that yield higher recognition accuracy for environmental sounds.
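The matching pursuit step mentioned above can be sketched as a greedy sparse decomposition: repeatedly pick the dictionary atom most correlated with the residual and subtract its contribution. The sketch below uses a generic unit-norm dictionary as a stand-in; the cited work builds its dictionary from Gabor time-frequency atoms, which are not reproduced here:

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedily decompose `signal` over the unit-norm atoms in
    `dictionary` (one atom per row): at each step, select the atom
    most correlated with the residual, record its (index, coefficient)
    pair, and subtract its contribution from the residual."""
    residual = np.asarray(signal, dtype=float).copy()
    decomposition = []
    for _ in range(n_atoms):
        correlations = dictionary @ residual
        k = int(np.argmax(np.abs(correlations)))
        decomposition.append((k, correlations[k]))
        residual = residual - correlations[k] * dictionary[k]
    return decomposition, residual

# Toy orthonormal dictionary (standard basis) and a 2-sparse signal.
atoms = np.eye(4)
signal = np.array([3.0, 0.0, -2.0, 0.0])
decomp, residual = matching_pursuit(signal, atoms, n_atoms=2)
print(decomp)                    # picks atoms 0 and 2, coefficients 3.0 and -2.0
print(np.allclose(residual, 0))  # True: two atoms reconstruct the signal exactly
```

The recorded (index, coefficient) pairs serve as the sparse time-frequency features; in the recognition setting, these pairs summarize which atoms dominate a sound's structure.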

Evaluating non-speech sound visualizations for the deaf

An iterative investigation of peripheral visual displays of ambient sounds provides valuable information about the sound awareness needs of the deaf and can help inform further design of such applications.