VizWiz: nearly real-time answers to visual questions

@inproceedings{Bigham2010VizWizNR,
  title={VizWiz: nearly real-time answers to visual questions},
  author={Jeffrey P. Bigham and Chandrika Jayant and Hanjie Ji and Greg Little and Andrew Miller and Rob Miller and Aubrey Tatarowicz and Brandyn Allen White and Samuel White and Tom Yeh},
  booktitle={W4A},
  year={2010}
}
Visual information pervades our environment. Vision is used to decide everything from what we want to eat at a restaurant and which bus route to take to whether our clothes match and how long until the milk expires. Individually, problems interpreting such visual information are a nuisance for blind people, who often have effective, if inefficient, work-arounds to overcome them. Collectively, however, they can make blind people less independent. Specialized technology addresses some problems…

Citations

Using real-time feedback to improve visual question answering
TLDR
Legion:View is introduced, a system that enables users to interact with the crowd via a real-time feedback loop for visual questions between users and crowd workers.
Answering visual questions with conversational crowd assistants
TLDR
This paper introduces Chorus:View, a system that assists users over the course of longer interactions by engaging workers in a continuous conversation with the user about a video stream from the user's mobile device, and demonstrates the benefit of using multiple crowd workers instead of just one in terms of both latency and accuracy.
Getting fast, free, and anonymous answers to questions asked by people with visual impairments
TLDR
The long-term public deployment and lessons learned from VizWiz Social, a human-powered access tool that connects people with visual impairments to sighted workers or friends and family members who can answer their visual questions, are explored.
Vocally Specified Text Recognition in Natural Scenes for the Blind and Visually Impaired
TLDR
This research investigates how accurately a developed application can find vocally specified text in natural scenes, and how speech and text recognition modules can be combined to validate acquired data and measure the accuracy of locating vocally specified items.
In-context Q&A to Support Blind People Using Smartphones
TLDR
Hint Me!, a human-powered service that allows blind users to get in-app assistance by posing questions or browsing previously answered questions on a shared knowledge base, is proposed, revealing the benefits of a hybrid approach.
FootNotes
TLDR
FootNotes is presented, a system that embeds rich textual descriptions of objects and locations in OpenStreetMap, a popular geowiki, that helps people thoroughly explore a new location or serendipitously discover previously unknown features of familiar environments.
Facilitating independence for photo taking and browsing for blind persons
TLDR
This dissertation research aims to facilitate independence for blind persons to locate and browse photos in a sequential manner, as opposed to global, through user-centered development of a smartphone application that can be used without sight.
Using social microvolunteering to answer visual questions from blind users
TLDR
VizWiz, a smartphone application that connects blind people with visual questions to sighted workers who can provide answers, is developed, and social microvolunteering is proposed, which allows people to donate not only their own time as answerers but also access to a larger pool of answerers through their friends on social networking sites.
Vision Skills Needed to Answer Visual Questions
TLDR
This work identifies the vision skills commonly needed (object recognition, text recognition, color recognition, and counting) across over 27,000 visual questions from two datasets representing both scenarios, and proposes a novel task of predicting what vision skills are needed to answer a question about an image.
Evaluating and Complementing Vision-to-Language Technology for People who are Blind with Conversational Crowdsourcing
TLDR
It is shown that the shortcomings of existing AI image captioning systems frequently hinder a user's understanding of an image they cannot see to a degree that even clarifying conversations with sighted assistants cannot correct.

References

Showing 1-10 of 14 references
Photo-based question answering
TLDR
This work develops a three-layer system architecture for photo-based QA that brings together recent technical achievements in question answering and image matching and leverages community experts to handle the most difficult cases.
Labeling images with a computer game
TLDR
A new interactive system is presented: a game that is fun to play and can be used to create valuable output, addressing the image-labeling problem by encouraging people to do the work through their desire to be entertained.
Introduction to the talking points project
TLDR
The Talking Points project aims to create a system for attaching information to places and objects, in a format that can be easily converted to speech, that allows blind and visually impaired users to get information about the places they are walking by, and can also be used to provide digital information to other passersby.
Scribe4Me: Evaluating a Mobile Sound Transcription Tool for the Deaf
TLDR
A 2-week field study of an exploratory prototype of a mobile sound transcription tool for the deaf and hard-of-hearing shows that the approach is feasible, highlights particular contexts in which it is useful, and provides information about what should be contained in transcriptions.
A camera phone based currency reader for the visually impaired (X. Liu, Assets '08, 2008)
TLDR
A camera phone-based currency reader for the visually impaired is presented that can identify the value of U.S. paper currency, using efficient background subtraction and perspective correction algorithms, with recognition trained in an efficient AdaBoost framework.
Crowdsourcing user studies with Mechanical Turk
TLDR
Although micro-task markets have great potential for rapidly collecting user measurements at low costs, it is found that special care is needed in formulating tasks in order to harness the capabilities of the approach.
Slide rule: making mobile touch screens accessible to blind people using multi-touch interaction techniques
TLDR
Slide Rule, a set of audio-based multi-touch interaction techniques that enable blind users to access touch screen applications, is introduced; a user study shows that Slide Rule was significantly faster than a button-based system and was preferred by 7 of 10 users.
Freedom to roam: a study of mobile device adoption and accessibility for people with visual and motor disabilities
TLDR
This formative study examines how people with visual and motor disabilities select, adapt, and use mobile devices in their daily lives, and provides guidelines for more accessible and empowering mobile device design.
Data Quality from Crowdsourcing: A Study of Annotation Selection Criteria
TLDR
An empirical study is conducted to examine the effect of noisy annotations on the performance of sentiment classification models and to evaluate the utility of annotation selection for classification accuracy and efficiency.
What do people ask their social networks, and why?: a survey study of status message q&a behavior
TLDR
This paper explores the phenomenon of using social network status messages to ask questions, and presents detailed data on the frequency of this type of question asking, the types of questions asked, and respondents' motivations for asking their social networks rather than using more traditional search tools like Web search engines.