Wrex: A Unified Programming-by-Example Interaction for Synthesizing Readable Code for Data Scientists

@article{Drosos2020WrexAU,
  title={Wrex: A Unified Programming-by-Example Interaction for Synthesizing Readable Code for Data Scientists},
  author={Ian Drosos and Titus Barik and Philip J. Guo and Robert DeLine and Sumit Gulwani},
  journal={Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems},
  year={2020}
}
Data wrangling is a difficult and time-consuming activity in computational notebooks, and existing wrangling tools do not fit the exploratory workflow for data scientists in these environments. We propose a unified interaction model based on programming-by-example that generates readable code for a variety of useful data transformations, implemented as a Jupyter notebook extension called Wrex. User study results demonstrate that data scientists are significantly more effective and efficient at… 
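
As a rough illustration of the programming-by-example idea named in the abstract, the sketch below searches a tiny, hand-written space of string transformations for one that agrees with every user-supplied input/output pair, then prints it as a readable line of code. It is not Wrex's synthesizer: the candidate table, the function names `synthesize` and `to_readable_code`, and the `df[...]` output format are all hypothetical choices made for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical candidate space (an assumption for this sketch): each key is a
# readable Python expression over a variable `s`, each value implements it.
CANDIDATES: Dict[str, Callable[[str], str]] = {
    "s.upper()": lambda s: s.upper(),
    "s.lower()": lambda s: s.lower(),
    "s.strip()": lambda s: s.strip(),
    "s.split(' ')[0]": lambda s: s.split(" ")[0],
    "s.split(' ')[-1]": lambda s: s.split(" ")[-1],
}


def synthesize(examples: List[Tuple[str, str]]) -> Optional[str]:
    """Return the first candidate expression consistent with all examples."""
    for expr, fn in CANDIDATES.items():
        if all(fn(inp) == out for inp, out in examples):
            return expr
    return None  # no candidate in this small space explains the examples


def to_readable_code(expr: str, column: str, new_column: str) -> str:
    """Render the chosen expression as one line of dataframe-style code."""
    return f"df['{new_column}'] = df['{column}'].apply(lambda s: {expr})"


# Two examples given by the user: full name in, last name out.
examples = [("Ada Lovelace", "Lovelace"), ("Alan Turing", "Turing")]
expr = synthesize(examples)
if expr is not None:
    print(to_readable_code(expr, "name", "last_name"))
```

A real synthesizer explores a far larger, compositional program space and ranks the consistent candidates; the fixed lookup table here only shows the shape of the interaction, from examples to a line of code the user can read and edit.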

Citations

Glinda: Supporting Data Science with Live Programming, GUIs and a Domain-specific Language
TLDR
Glinda is introduced, a tool that offers a live programming experience with interactive results for a domain-specific language for data science, and that uses an open-ended set of “recipes” to execute steps in the user’s data science workflow.
Unravel: A Fluent Code Explorer for Data Wrangling
TLDR
A tool called Unravel is presented that enables structural edits via drag-and-drop and toggle-switch interactions to help data scientists explore and understand fluent code; it facilitated diverse activities such as validating assumptions about the code or data, exploring alternatives, and revealing function behavior.
mage: Fluid Moves Between Code and Graphical Work in Computational Notebooks
TLDR
This work extends computational notebooks with a new API, mage, which supports tools that can represent themselves as both code and GUI as needed, and implements six client tools for mage that illustrate the main themes of the study findings.
B2: Bridging Code and Interactive Visualization in Computational Notebooks
TLDR
B2, a set of techniques grounded in treating data queries as a shared representation between the code and interactive visualizations, is presented and found that B2 promotes a tighter feedback loop between coding and interacting with visualizations.
Interactive Program Synthesis by Augmented Examples
TLDR
An interaction model to disambiguate user intent and reduce the cognitive load of understanding and validating synthesized programs is presented and implemented in the domain of regular expressions, which is a popular mechanism for text processing and data wrangling.
Digging for fold: synthesis-aided API discovery for Haskell
TLDR
The study shows that programmers equipped with Hoogle+ generally solved tasks faster and solved 50% more tasks overall; the work also shows how to extend this elimination technique to automatically generate informative inputs that can be used to demonstrate program behavior to the user.
TweakIt: Supporting End-User Programmers Who Transmogrify Code
TLDR
A prototype tool that provides users with a familiar live interaction to help them understand, introspect, and reify how different code snippets would transform their data.
LooPy: interactive program synthesis with control structures
TLDR
LooPy is presented, a synthesizer integrated into a live programming environment, which extends Small-Step Live PBE to work inside loops and scales it up to synthesize larger code snippets, while remaining fast enough for interactive use.
Gauss: program synthesis by reasoning over graphs
TLDR
Gauss is presented, a synthesis algorithm for table transformations that accepts partial input-output examples, along with user intent graphs, and is able to reduce the search space by 56×, 73× and 664× on average, resulting in 7×, 26× and 7× speedups in synthesis times on average.

References

SHOWING 1-10 OF 31 REFERENCES
Wrangler: interactive visual specification of data transformation scripts
TLDR
Wrangler combines direct manipulation of visualized data with automatic inference of relevant transforms, enabling analysts to iteratively explore the space of applicable operations and preview their effects.
Foofah: Transforming Data By Example
TLDR
This paper develops a technique to synthesize data transformation programs by example, reducing this burden by allowing the analyst to describe the transformation with a small input-output example pair, without being concerned with the transformation steps required to get there.
Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel
TLDR
An extensible data transformation system called Transform-Data-by-Example (TDE) that can leverage rich transformation logic in source code, DLLs, web services and mapping tables, so that end-users only need to provide a few input/output examples, and TDE can synthesize desired programs using relevant transformation logic from these sources.
Proactive wrangling: mixed-initiative end-user programming of data transformation scripts
TLDR
A model to proactively suggest data transforms which map input data to a relational format expected by analysis tools is presented, and a metric that scores tables according to type homogeneity, sparsity and the presence of delimiters is proposed.
Research directions in data wrangling: Visualizations and transformations for usable and credible data
TLDR
It is argued that analysts might more effectively wrangle data through new interactive systems that integrate data verification, transformation, and visualization.
DS.js: Turn Any Webpage into an Example-Centric Live Programming Environment for Learning Data Science
TLDR
DS.js is a bookmarklet that embeds a data science programming environment directly into any existing webpage and turns the entire web into a rich substrate for learning data science.
NLyze: interactive programming by natural language for spreadsheet data analysis and manipulation
TLDR
The design and implementation of a robust natural language based interface to spreadsheet programming that supports a rich user interaction model including annotating the user's natural language specification and explaining the synthesized DSL programs by paraphrasing them into structured English is described.
Spreadsheet data manipulation using examples
TLDR
This work presents a programming by example methodology that allows end users to automate such repetitive tasks over large spreadsheet data by designing a domain-specific language and developing a synthesis algorithm that can learn programs in that language from user-provided examples.
Spreadsheet table transformations from examples
TLDR
An automatic technique is presented that takes from a user an example of how a table of data should be transformed and returns a program that implements the transformation described by the example; a language of programs, TableProg, is also presented that can describe transformations that real users require.
Synthesizing Number Transformations from Input-Output Examples
TLDR
A framework that can learn number transformations from very few input-output examples is presented, and an inductive synthesis algorithm for manipulating data types that have numbers as a constituent sub-type such as date, unit, and time is obtained.