"Play PRBLMS": Identifying and Correcting Less Accessible Content in Voice Interfaces

@article{Springer2018PlayPI,
  title={"Play PRBLMS": Identifying and Correcting Less Accessible Content in Voice Interfaces},
  author={Aaron Springer and Henriette Cramer},
  journal={Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems},
  year={2018}
}
Voice interfaces often struggle with specific types of named content. Domain-specific terminology and naming may push the bounds of standard language, especially in domains like music where artistic creativity extends beyond the music itself. Artists may name themselves with symbols (e.g. M S C RA) that most standard automatic speech recognition (ASR) systems cannot transcribe. Voice interfaces also experience difficulty surfacing content whose titles include non-standard spellings, symbols or…
Play Music: User Motivations and Expectations for Non-Specific Voice Queries
TLDR
This work studies an example of ambiguous requests, such as “play music,” where users ask to stream content using a single utterance that does not specify what content they want to hear, and observes four themes that structure user perceptions of the benefits and shortcomings of making non-specific voice queries (NSQs).
Exploring Interactions with Voice-Controlled TV
TLDR
Research through design methods is used to explore an early prototype movie recommendation system where the only input modality is voice, mitigating the drawbacks of voice-only interactions, navigating the tension between expressiveness and efficiency, and building voice-driven recommendation interfaces that facilitate exploration.
Differences between smart speakers and graphical user interfaces for music search considering gender effects
TLDR
The analysis of how users naturally interact with smart speakers revealed that the VUI provides significantly lower usability because it lacks features, requires higher mental effort, and provides confusing answers.
"All Rise for the AI Director": Eliciting Possible Futures of Voice Technology through Story Completion
TLDR
Through a thematic analysis, these stories reveal the extremes of the capabilities and concerns of today's voice assistants and artificial intelligence, such as improving efficiency and offering instantaneous support, but also replacing human jobs, eroding human agency, and causing harm through malfunction.
Voice as a Design Material: Sociophonetic Inspired Design Strategies in Human-Computer Interaction
TLDR
This work argues that current VUIs do not adequately consider the diversity of people's speech, how that diversity represents sociocultural identities, and how voices have the potential to shape user perceptions and experiences. It poses three design strategies for VUI voice output design: individualisation, context awareness, and diversification.
One Voice Fits All?
TLDR
A simple research framework is introduced for understanding how voice affects how people perceive and interact with smart devices, and how voice design depends on a complex interplay between characteristics of the user, device, and context.
One Voice Fits All? Social Implications and Research Challenges of Designing Voices for Smart Devices
Vitro: Designing a Voice Assistant for the Scientific Lab Workplace
TLDR
This paper investigates whether voice assistants can play a useful role in the specialized work-life of the knowledge worker (in a biology lab) and contributes implications for the design of voice-enabled systems in workplace settings.
Firefox Voice: An Open and Extensible Voice Assistant Built Upon the Web
TLDR
Firefox Voice is introduced, a novel voice assistant built on the open web ecosystem with an aim to expand access to information available via voice, and the way Firefox Voice enables the development of novel, open web-powered voice-driven experiences is described.
"I had a solid theory before but it's falling apart": Polarizing Effects of Algorithmic Transparency
TLDR
The effects of transparency on user perceptions of a working intelligent system for emotion detection are explored, the notion of transparency is revisited, and design considerations for building safe and successful machine learning systems are suggested.

References

“Don’t Touch My Moustache”: Language Blending and Code Ambiguation by Two J-Pop Artists
“Code Ambiguation” is a form of language blending similar to code mixing or code switching, but, unlike these other kinds of blending, it produces an utterance that has potential meaning in…
Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates
TLDR
It is proposed that doubly confusable pairs, rather than high neighborhood density, may better explain phonetic neighborhood errors in human speech processing.
A Multimodal Crowdsourcing Framework for Transcribing Historical Handwritten Documents
TLDR
The experiments explore how an initial handwritten text recognition hypothesis can be improved by using the contribution of speech recognition from several speakers, providing as a final result a better hypothesis to be amended by a professional transcriber with less effort.
Effect of pronounciations on OOV queries in spoken term detection
TLDR
This paper investigates the inclusion of n-best pronunciation variants for OOV terms (obtained from letter-to-sound rules) into the search and presents the results obtained by indexing confusion networks as well as lattices.
A spoken term detection framework for recovering out-of-vocabulary words using the web
TLDR
This work proposes a novel approach to OOV recovery that uses a spoken term detection (STD) framework. Recovered words are integrated into system output, recovering up to 40% of OOVs and resulting in a reduction in system error.
Speech-Recognition Interfaces for Music Information Retrieval: 'Speech Completion' and 'Speech Spotter'
TLDR
Two MIR-based hands-free jukebox systems that enable a user to retrieve and play back a musical piece by saying its title or the artist’s name are described, and the effectiveness of the speech-completion and speech-spotter interfaces is demonstrated.
Multi-reference WER for evaluating ASR for languages with no orthographic rules
TLDR
This work proposes an innovative approach for evaluating speech recognition using Multi-References, which favors accepting a recognized word if any of the references typed it in the same form.
Improving recognition of proper nouns in ASR through generating and filtering phonetic transcriptions
TLDR
This work proposes a method that allows the extraction of phonetic transcriptions of proper nouns using actual utterances of those proper nouns, thus yielding transcriptions based on practical use instead of mere pronunciation rules.
Searching by Talking: Analysis of Voice Queries on Mobile Web Search
TLDR
This paper examines the logs of a commercial search engine's mobile interface, and compares the spoken queries to the typed-in queries, placing special emphasis on the semantic and syntactic characteristics of the two types of queries.
Recognizing words across regional accents: the role of perceptual assimilation in lexical competition
TLDR
It is concluded that perceptual assimilation plays a key role in cross-accent word recognition; lexical competition involves not only onsets but also later aspects of words; vowel and consonant variations affect lexical competition similarly.