Help, It Looks Confusing: GUI Task Automation Through Demonstration and Follow-up Questions

Thanapong Intharah, Daniyar Turmukhambetov, Gabriel J. Brostow
Proceedings of the 22nd International Conference on Intelligent User Interfaces
Non-programming users should be able to create their own customized scripts to perform computer-based tasks for them, just by demonstrating to the machine how it's done. To that end, we develop a prototype system called HILC (Help, It Looks Confusing) that learns by demonstration. Users train HILC to synthesize a task script by demonstrating the task, which produces the needed screenshots and their corresponding mouse-keyboard signals. After the demonstration, the user answers follow-up…


A user-in-the-loop framework that learns to generate scripts of actions performed on visible elements of graphical applications. Quantitative and qualitative experiments show that non-programming users are willing and effective at answering follow-up queries posed by the system.

RecurBot: Learn to Auto-complete GUI Tasks From Human Demonstrations

This work proposes a method that learns from a few user-performed demonstrations, then predicts and finally performs the remaining actions in the task. The approach is validated on a new database of GUI tasks, showing that it usually gleans what it needs from short user demonstrations and autocompletes tasks in diverse GUI situations.

GUI Interaction on Autopilot

This work proposes Autopilot, a context-aware system that detects and automates repetitive behavior by leveraging the programming by demonstration (PBD) paradigm, and aims to enrich user experience by lessening the tedium of performing repetitive tasks.

APPINITE: A Multi-Modal Interface for Specifying Data Descriptions in Programming by Demonstration Using Natural Language Instructions

The evaluation showed that APPINITE is easy to use and effective in creating scripts for tasks that would otherwise be difficult to create with prior PBD systems, due to ambiguous data descriptions in demonstrations on GUIs.

User-in-the-loop adaptive intent detection for instructable digital assistant

A user-in-the-loop adaptive intent detection framework that allows the assistant to adapt to its user by learning the user's intents as their interaction progresses, and which addresses two major issues for instructable digital assistants: intent learning and user adaptation.

A Multi-Modal Intelligent Agent that Learns from Demonstrations and Natural Language Instructions

The preliminary lab usability evaluation results showed that the prototype of SUGILITE allowed users with little or no programming expertise to successfully teach the agent common smartphone tasks, as well as the appropriate conditionals for triggering these actions and the relevant concepts for determining these conditions.

VASTA: a vision and language-assisted smartphone task automation system

An initial user study is run that demonstrates the effectiveness of VASTA at clustering user utterances, understanding changes in the automation parameters, detecting desired UI elements, and, most importantly, automating various tasks.

Help through demonstration and automation for interactive computing systems: A survey of recent works

This paper presents a general survey of recent works related to improving applications' help through demonstration and automation, and identifies which technologies are acting as enablers.

Describing UI Screenshots in Natural Language

XUI is introduced, a novel method inspired by the global precedence effect to create informative descriptions of UIs, starting with an overview and then providing fine-grained descriptions of the most salient elements. The descriptions were found to be highly readable, were perceived to accurately describe the UI, and scored similarly to human-generated UI descriptions.

Towards human-guided machine learning

This paper proposes human-guided machine learning (HGML) as a hybrid approach where a user interacts with an AutoML system and tasks it to explore different problem settings that reflect the user's knowledge about the data available.



Sheepdog: learning procedures for technical support

Sheepdog is presented, an implemented system for capturing, learning, and playing back technical support procedures on the Windows desktop using Input/Output Hidden Markov Models, along with the results of a user study that examines how users follow printed directions.

Programming by Examples - and its applications in Data Wrangling

Sumit Gulwani. Computer Science. Dependable Software Systems Engineering, 2016.
The notion of Ispec is formalized, and some principles behind designing useful DSLs for synthesis are discussed. Some user interaction models are also presented, including program navigation and active-learning based conversational clarification, which communicate actionable information to the user to help resolve ambiguity in the Ispec.

GUI testing using computer vision

This paper presents a new approach to GUI testing using computer vision for testers to automate their tasks and shows how this approach can facilitate good testing practices such as unit testing, regression testing, and test-driven development.

Creating contextual help for GUIs using screenshots

A creation tool for contextual help that allows users to apply common computer skills (taking screenshots and writing simple scripts) and performs pixel analysis on screenshots, making the tool applicable to a wide range of applications and platforms without source code access.

Sikuli: using GUI screenshots for search and automation

Sikuli allows users to take a screenshot of a GUI element and query a help system using the screenshot instead of the element's name, and provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events.

Pause-and-play: automatically linking screencast video tutorials with applications

Pause-and-Play is presented, a system that helps users work along with existing video tutorials by using computer vision to detect events in existing videos and by leveraging application scripting APIs to obtain real-time usage traces.

Generating photo manipulation tutorials by demonstration

A demonstration-based system for automatically generating succinct step-by-step visual tutorials of photo manipulations that leverages automated image labeling to generate more precise text descriptions of many of the steps in the tutorials.

Sheepdog, parallel collaborative programming-by-demonstration

Watch what I do: programming by demonstration

Part 1 Systems: Pygmalion; Tinker; A Predictive Calculator; Rehearsal World; SmallStar; Peridot; Metamouse; TELS; Eager; Garnet; The Turvy Experience; Chimera; The Geometer's Sketchpad; Tourmaline; a history-based

EverTutor: automatically creating interactive guided tutorials on smartphones by user demonstration

Study results show that creating tutorials with EverTutor is simpler and faster than producing static and video tutorials, and that task completion with interactive tutorials was 3-6 times faster than with static and video tutorials, regardless of age group.