Managing Messes in Computational Notebooks

@article{Head2019ManagingMI,
  title={Managing Messes in Computational Notebooks},
  author={Andrew Head and Fred Hohman and Titus Barik and Steven Mark Drucker and Robert DeLine},
  journal={Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems},
  year={2019}
}
  • Andrew Head, Fred Hohman, R. DeLine
  • Published 2 May 2019
  • Computer Science
  • Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems
Data analysts use computational notebooks to write code for analyzing and visualizing data. Notebooks help analysts iteratively write analysis code by letting them interleave code with output, and selectively execute cells. However, as analysis progresses, analysts leave behind old code and outputs, and overwrite important code, producing cluttered and inconsistent notebooks. This paper introduces code gathering tools, extensions to computational notebooks that help analysts find, clean… 

Code Duplication and Reuse in Jupyter Notebooks
TLDR
This work explores how much, how, and from where code duplication occurs in computational notebooks, as well as how much reuse occurs and which barriers users face when reusing code.
Jupyter’s Archive: Searchable Output Histories for Computational Notebooks
TLDR
The Output Archive is introduced, a thumbnail-based output history built into Jupyter Lab that automatically records all outputs produced over the lifetime of a notebook and makes the code that produced them available, and a new class of grouping filters which allows users to navigate large output histories by clustering outputs based on similarities in their underlying code.
ToonNote: Improving Communication in Computational Notebooks Using Interactive Data Comics
TLDR
ToonNote, a JupyterLab extension that enables the conversion of notebooks into “data comics,” is introduced, and how the findings inform the future design of interfaces for computational notebooks and features to support diverse collaborators is discussed.
Fork It: Supporting Stateful Alternatives in Computational Notebooks
TLDR
This work introduces forking — creating a new interpreter session — and backtracking — navigating through previous states of a notebook to help data scientists more directly express and navigate through decision points in a single notebook.
Casual Notebooks and Rigid Scripts: Understanding Data Science Programming
TLDR
A tension between scripts and computational notebooks is shown, which leads to several issues that affect data workers’ workflows, and implications for the design of programming IDEs are discussed.
StickyLand: Breaking the Linear Presentation of Computational Notebooks
TLDR
StickyLand is a notebook extension for empowering users to freely organize their code in non-linear ways, with sticky cells that are always shown on the screen, so users can quickly access their notes, instantly observe experiment results, and easily build interactive dashboards that support complex visual analytics.
TRACTUS: Understanding and Supporting Source Code Experimentation in Hypothesis-Driven Data Science
TLDR
TRACTUS, a system extending the popular RStudio IDE, detects, tracks, and visualizes code experiments in hypothesis-driven data science tasks, and helps recall decisions and insights by grouping code experiments into hypotheses and structuring information like code execution output and documentation.
TweakIt: Supporting End-User Programmers Who Transmogrify Code
TLDR
A prototype tool that provides users with a familiar live interaction to help them understand, introspect, and reify how different code snippets would transform their data.
B2: Bridging Code and Interactive Visualization in Computational Notebooks
TLDR
B2, a set of techniques grounded in treating data queries as a shared representation between the code and interactive visualizations, is presented and is found to promote a tighter feedback loop between coding and interacting with visualizations.
Albireo: An Interactive Tool for Visually Summarizing Computational Notebook Structure
TLDR
Albireo is designed, implemented, and evaluated, with the goal of supporting more effective exploration and communication by displaying the dependencies and relationships between the cells of a notebook using a dynamic graph structure.

References

SHOWING 1-10 OF 46 REFERENCES
Variolite: Supporting Exploratory Programming by Data Scientists
TLDR
The needs for improving version control tools for exploratory tasks are explored, and a tool for lightweight local versioning, called Variolite, is demonstrated, which programmers found usable and desirable in a preliminary usability study.
Design and Use of Computational Notebooks
TLDR
This dissertation demonstrates that tracking and sharing of complex analyses is hindered by a tension between exploration and explanation, but that computational notebooks and other media can reduce this tension by supporting not only the combination of, but also flexible organization and navigation of analytical steps, explanatory text, and computed results.
Exploration and Explanation in Computational Notebooks
TLDR
Three studies of how academic data analysts are using notebooks to document and share exploratory data analyses demonstrate a tension between exploration and explanation in constructing and sharing computational notebooks.
Interactions for Untangling Messy History in a Computational Notebook
  • Mary Beth Kery, B. Myers
  • Computer Science
    2018 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)
  • 2018
Experimentation through code is central to data scientists' work. Prior work has identified the need for interaction techniques for quickly exploring multiple versions of the code and the associated
The Story in the Notebook: Exploratory Data Science using a Literate Programming Tool
TLDR
Design guidance is given for future literate programming tools, such as providing history search based on how programmers recall their explorations, through contextual details including images and parameters.
Interactive Extraction of Examples from Existing Code
TLDR
A mixed-initiative tool to help programmers extract executable, simplified code from existing code, CodeScoop enables programmers to "scoop" out a relevant subset of code.
BURRITO: Wrapping Your Lab Notebook in Computational Infrastructure
Researchers in fields such as bioinformatics, CS, finance, and applied math have trouble managing the numerous code and data files generated by their computational experiments, comparing the results
Aiding Collaborative Reuse of Computational Notebooks with Annotated Cell Folding
TLDR
Through a lab study and multi-week deployment, cell folding aids notebook navigation and comprehension, not only by the original author, but also by collaborators viewing the notebook in a meeting or revising it on their own.
Rehearse: Helping Programmers Adapt Examples by Visualizing Execution and Highlighting Related Code
TLDR
It is proposed that effective use of examples hinges on the programmer's ability to quickly identify a small number of relevant lines interleaved among a larger body of boilerplate code.
Tempura: Temporal Dimension for IDEs
TLDR
This paper proposes a novel approach of adding a temporal dimension to IDEs, enabling code completion and navigation to operate on multiple revisions of code at a time, and implements and evaluates a prototype tool called Tempura.