• Corpus ID: 207798031

Goals, Process, and Challenges of Exploratory Data Analysis: An Interview Study

Kanit Wongsuphasawat, Yang Liu, Jeffrey Heer
How do analysis goals and context affect exploratory data analysis (EDA)? To investigate this question, we conducted semi-structured interviews with 18 data analysts. We characterize common exploration goals: profiling (assessing data quality) and discovery (gaining new insights). Though the EDA literature primarily emphasizes discovery, we observe that discovery only reliably occurs in the context of open-ended analyses, whereas all participants engage in profiling across all of their analyses… 

Hypothesis Formalization: Empirical Findings, Software Limitations, and Design Implications

This work presents a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process referred to as hypothesis formalization.

Initial Insights into Exploratory Process Mining Practices

Process mining enables organizations to streamline and automate their business processes. The initial phases of process mining projects often include exploration activities aimed to familiarize

Towards a Conceptual Model for Data Narratives

This work proposes a conceptual model of data narrative for exploratory data analysis, based on four layers that reflect the transition from raw data to the visual rendering of the data story: factual, intentional, structural, and presentational.

EDA and its Impact in Dataset Discover Patterns in the Service Sector

Data analysis with EDA is effective in the service sector for discovering missing links and patterns before the data is used in any ML model, and this study develops a flow chart of the exploratory data analysis process to identify the required steps.

A Proposal of an Interactive Web Application Tool QuickViz: To Automate Exploratory Data Analysis

  • Harshee Pitroda
  • Computer Science
    2022 IEEE 7th International conference for Convergence in Technology (I2CT)
  • 2022
This work develops QuickViz, an interactive, user-friendly web application tool that simplifies and automates the exploratory data analysis stage.

Code Code Evolution: Understanding How People Change Data Science Notebooks Over Time

This work conducts a quantitative study of Jupyter notebooks mined from GitHub and presents regression models that automatically characterize sensemaking activity within individual notebooks by assigning them a score representing their position within the sensemaking spectrum.

DSWorkFlow: A Framework for Capturing Data Scientists’ Workflows

DSWorkFlow is a data collection framework that provides researchers with the ability to observe and analyze data scientists’ cognitive workflows as they develop predictive models and test three machine learning models to inform the extraction algorithms.

Case Study Comparison of Computational Notebook Platforms for Interactive Visual Analytics

  • Han Liu, C. North
  • Computer Science
    2022 IEEE Visualization in Data Science (VDS)
  • 2022
This work investigated the problem using an example called “Andromeda,” which is an interactive dimension reduction algorithm, and implemented it using three different notebook platforms: 1) Python code in a Jupyter Notebook, 2) JavaScript code in an Observable Notebook, and 3) embedding both Python (data science use) and JavaScript (visual analytics use) in a TSP.

Passing the Data Baton: A Retrospective Analysis on Data Science Work and Workers

A retrospective analysis of data science work and workers, as described within the data visualization, human-computer interaction, and data science literature, is conducted to synthesize a comprehensive model that describes data science work and breaks data scientists down into nine distinct roles.

Data Curation: Towards a Tool for All

This work presents a new data science platform, termed DS4All, that attempts to reduce the knowledge necessary to perform data science tasks, in particular data cleaning and curation, by combining HCI concepts.



Futzing and Moseying: Interviews with Professional Data Analysts on Exploration Practices

We report the results of interviewing thirty professional data analysts working in a range of industrial, academic, and regulatory environments. This study focuses on participants' descriptions of

Enterprise Data Analysis and Visualization: An Interview Study

This work characterizes the process of industrial data analysis, documents how organizational features of an enterprise impact it, and describes recurring pain points, outstanding challenges, and barriers to adoption for visual analytic tools.

Exploring the analytical processes of intelligence analysts

This study investigates and analyzes the analytical processes of intelligence analysts through a combination of scenario-based analysis, artifact analysis, role-playing, interviews, and participant observations, and explores the space and boundaries in which intelligence analysts work and operate.

Visualizing Dimension Coverage to Support Exploratory Analysis

Results of the empirical study showed that participants with access to embedded dimension coverage information relied on this information when formulating questions, asked more questions about the data, generated more top-level findings, and showed greater breadth of their analysis without sacrificing depth.

Exploratory Data Analysis

The philosophical justification for EDA is presented in terms of C.S. Peirce's concept of abduction and the recognition of a broad range of analytic needs that arise throughout the research process.

The Story in the Notebook: Exploratory Data Science using a Literate Programming Tool

Design guidance is given for future literate programming tools, such as providing history search based on how programmers recall their explorations, through contextual details including images and parameters.

Principles and procedures of exploratory data analysis.

The central heuristics and computational tools of EDA are introduced and it is shown how these tools complement the use of significance and hypothesis tests used in confirmatory data analysis (CDA).

Exploration and Explanation in Computational Notebooks

Three studies of how academic data analysts are using notebooks to document and share exploratory data analyses demonstrate a tension between exploration and explanation in constructing and sharing computational notebooks.

Voyager: Exploratory Analysis via Faceted Browsing of Visualization Recommendations

Voyager is found to facilitate exploration of previously unseen data and to lead to increased data variable coverage, and the study distills the need for visualization tools to balance rapid exploration with targeted question-answering.

Exploratory analysis of spatial and temporal data - a systematic approach

The authors describe in detail and systemize approaches, techniques, and methods for exploring spatial and temporal data in particular, developing a general view of data structures and characteristics and building on top of this a general task typology.