Goals, Process, and Challenges of Exploratory Data Analysis: An Interview Study
@article{Wongsuphasawat2019GoalsPA, title={Goals, Process, and Challenges of Exploratory Data Analysis: An Interview Study}, author={Kanit Wongsuphasawat and Yang Liu and Jeffrey Heer}, journal={ArXiv}, year={2019}, volume={abs/1911.00568} }
How do analysis goals and context affect exploratory data analysis (EDA)? To investigate this question, we conducted semi-structured interviews with 18 data analysts. We characterize common exploration goals: profiling (assessing data quality) and discovery (gaining new insights). Though the EDA literature primarily emphasizes discovery, we observe that discovery only reliably occurs in the context of open-ended analyses, whereas all participants engage in profiling across all of their analyses…
27 Citations
Hypothesis Formalization: Empirical Findings, Software Limitations, and Design Implications
- Computer Science
- ACM Trans. Comput. Hum. Interact.
- 2022
A mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process referred to as hypothesis formalization, is presented.
Initial Insights into Exploratory Process Mining Practices
- Business
- BPM
- 2021
Process mining enables organizations to streamline and automate their business processes. The initial phases of process mining projects often include exploration activities aimed to familiarize…
Towards a Conceptual Model for Data Narratives
- Computer Science
- ER
- 2020
A conceptual model of data narrative for exploratory data analysis based on four layers that reflect the transition from raw data to the visual rendering of the data story: factual, intentional, structural and presentational is proposed.
EDA and its Impact in Dataset Discover Patterns in the Service Sector
- Computer Science
- 2022 4th International Conference on Inventive Research in Computing Applications (ICIRCA)
- 2022
Data analysis with EDA is effective in the service sector for discovering missing links and patterns before the data is fed to any ML model, and a flow-chart of the exploratory data analysis process has been developed in this study to identify the required steps under this model.
A Proposal of an Interactive Web Application Tool QuickViz: To Automate Exploratory Data Analysis
- Computer Science
- 2022 IEEE 7th International conference for Convergence in Technology (I2CT)
- 2022
QuickViz, an interactive web application tool that simplifies and automates the exploratory data analysis stage, is developed; it is described as user-friendly and easy to use.
Code Code Evolution: Understanding How People Change Data Science Notebooks Over Time
- Computer Science
- ArXiv
- 2022
This work conducts a quantitative study of Jupyter notebooks mined from GitHub and presents regression models that automatically characterize sensemaking activity within individual notebooks by assigning them a score representing their position within the sensemaking spectrum.
DSWorkFlow: A Framework for Capturing Data Scientists’ Workflows
- Computer Science
- CHI Extended Abstracts
- 2021
DSWorkFlow is a data collection framework that provides researchers with the ability to observe and analyze data scientists’ cognitive workflows as they develop predictive models and test three machine learning models to inform the extraction algorithms.
Case Study Comparison of Computational Notebook Platforms for Interactive Visual Analytics
- Computer Science
- 2022 IEEE Visualization in Data Science (VDS)
- 2022
This work investigated the problem using an example called “Andromeda,” which is an interactive dimension reduction algorithm, and implemented it using three different notebook platforms: 1) Python code in a Jupyter Notebook, 2) JavaScript code in an Observable Notebook, and 3) embedding both Python (data science use) and JavaScript (visual analytics use) in a TSP.
Passing the Data Baton: A Retrospective Analysis on Data Science Work and Workers
- Computer Science
- IEEE Transactions on Visualization and Computer Graphics
- 2021
A retrospective analysis of data science work and workers as described within the data visualization, human-computer interaction, and data science literature is conducted to synthesize a comprehensive model that describes data science work and breaks data scientists down into nine distinct roles.
Data Curation: Towards a Tool for All
- Computer Science
- HCI
- 2020
A new data science platform, termed DS4All, is presented that attempts to reduce the knowledge necessary to perform data science tasks, in particular data cleaning and curation, by combining HCI concepts.
References
Futzing and Moseying: Interviews with Professional Data Analysts on Exploration Practices
- Psychology
- IEEE Transactions on Visualization and Computer Graphics
- 2019
We report the results of interviewing thirty professional data analysts working in a range of industrial, academic, and regulatory environments. This study focuses on participants' descriptions of…
Enterprise Data Analysis and Visualization: An Interview Study
- Business
- IEEE Transactions on Visualization and Computer Graphics
- 2012
This work characterizes the process of industrial data analysis and documents how organizational features of an enterprise impact it, and describes recurring pain points, outstanding challenges, and barriers to adoption for visual analytic tools.
Exploring the analytical processes of intelligence analysts
- Psychology
- CHI
- 2009
This study investigates and analyzes the analytical processes of intelligence analysts through a combination of scenario-based analysis, artifact analysis, role-playing, interviews, and participant observations, and explores the space and boundaries in which intelligence analysts work and operate.
Visualizing Dimension Coverage to Support Exploratory Analysis
- Computer Science
- IEEE Transactions on Visualization and Computer Graphics
- 2017
Results of the empirical study showed that participants with access to embedded dimension coverage information relied on this information when formulating questions, asked more questions about the data, generated more top-level findings, and showed greater breadth of their analysis without sacrificing depth.
Exploratory Data Analysis
- Education
- 2012
The philosophical justification for EDA is presented in terms of C.S. Peirce's concept of abduction and the recognition of a broad range of analytic needs that arise throughout the research process.
The Story in the Notebook: Exploratory Data Science using a Literate Programming Tool
- Computer Science
- CHI
- 2018
Design guidance is given for future literate programming tools, such as providing history search based on how programmers recall their explorations, through contextual details including images and parameters.
Principles and procedures of exploratory data analysis.
- Sociology
- 1997
The central heuristics and computational tools of EDA are introduced and it is shown how these tools complement the use of significance and hypothesis tests used in confirmatory data analysis (CDA).
Exploration and Explanation in Computational Notebooks
- Psychology
- CHI
- 2018
Three studies of how academic data analysts are using notebooks to document and share exploratory data analyses demonstrate a tension between exploration and explanation in constructing and sharing computational notebooks.
Voyager: Exploratory Analysis via Faceted Browsing of Visualization Recommendations
- Computer Science
- IEEE Transactions on Visualization and Computer Graphics
- 2016
It is found that Voyager facilitates exploration of previously unseen data and leads to increased data variable coverage, and the need to balance rapid exploration and targeted question-answering for visualization tools is distilled.
Exploratory analysis of spatial and temporal data - a systematic approach
- Computer Science
- 2005
The authors describe in detail and systemize approaches, techniques, and methods for exploring spatial and temporal data in particular, developing a general view of data structures and characteristics and building on top of this a general task typology.