Facilitating Knowledge Sharing from Domain Experts to Data Scientists for Building NLP Models
@inproceedings{Park2021FacilitatingKS, title={Facilitating Knowledge Sharing from Domain Experts to Data Scientists for Building NLP Models}, author={Soya Park and April Yi Wang and Ban Kawas and Qingzi Vera Liao and David Piorkowski and Marina Danilevsky}, booktitle={26th International Conference on Intelligent User Interfaces}, year={2021} }
Data scientists face a steep learning curve in understanding a new domain for which they want to build machine learning (ML) models. While input from domain experts could offer valuable help, such input is often limited, expensive, and generally not in a form readily consumable by a model development pipeline. In this paper, we propose Ziva, a framework to guide domain experts in sharing essential domain knowledge to data scientists for building NLP models. With Ziva, experts are able to…
10 Citations
Bridging Multi-disciplinary Collaboration Challenges in ML Development via Domain Knowledge Elicitation
- Computer Science · DASH
- 2021
Ziva is introduced, an interface that supports the transfer of domain knowledge from domain experts to data scientists in two ways: a concept creation interface, in which domain experts extract the important concepts of the domain, and five kinds of justification elicitation interfaces, which elicit how the domain concepts are expressed in data instances.
“It’s Like the Value System in the Loop”: Domain Experts’ Values Expectations for NLP Automation
- Computer Science · Conference on Designing Interactive Systems
- 2022
The study findings provide groundwork for including the values of domain experts, whose expertise lies outside the field of computing, in the design of automated NLP systems.
Collaboration Challenges in Building ML-Enabled Systems: Communication, Documentation, Engineering, and Process
- Computer Science · 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)
- 2022
This work identifies key collaboration challenges that teams face when building and deploying ML systems into production, and finds that most of these challenges center around communication, documentation, engineering, and process, and collects recommendations to address these challenges.
Crystalline: Lowering the Cost for Developers to Collect and Organize Information for Decision Making
- Computer Science · CHI
- 2022
A new system called Crystalline is introduced that automatically collects and organizes information into tabular structures as the user searches and browses the web, and uses passive behavioral signals to infer what information to collect and how to visualize and prioritize it.
More Engineering, No Silos: Rethinking Processes and Interfaces in Collaboration between Interdisciplinary Teams for Machine Learning Projects
- Computer Science · ArXiv
- 2021
Key collaboration challenges that teams face when building and deploying ML systems into production are identified; most of these challenges center around communication, documentation, engineering, and process, and recommendations to address them are collected.
Trade-offs in Sampling and Search for Early-stage Interactive Text Classification
- Computer Science · IUI
- 2022
It is shown that supplementing early-stage sampling with user-guided text search can effectively “seed” a classifier with positive documents without compromising generalization performance—particularly for imbalanced tasks where positive documents are rare.
How AI Developers Overcome Communication Challenges in a Multidisciplinary Team
- Computer Science · Proc. ACM Hum. Comput. Interact.
- 2021
Using the analytic lens of shared mental models, this paper reports on the types of communication gaps that AI developers face, how AI developers communicate across disciplinary and organizational boundaries, and how they simultaneously manage issues regarding trust and expectations.
Empathosphere: Promoting Constructive Communication in Ad-hoc Virtual Teams through Perspective-taking Spaces
- Business · Proceedings of the ACM on Human-Computer Interaction
- 2022
This paper introduces Empathosphere, a chat-embedded intervention that mitigates social barriers and fosters constructive communication in teams, and demonstrates that “experimental spaces,” particularly those that integrate methods of encouraging perspective-taking, can be a powerful means of improving communication in virtual teams.
How Stimulating Is a Green Stimulus? The Economic Attributes of Green Fiscal Spending
- Economics · Annual Review of Environment and Resources
- 2022
When deep recessions hit, some governments spend to rescue and recover their economies. Key economic objectives of such countercyclical spending include protecting and creating jobs while…
How Domain Experts Work with Data: Situating Data Science in the Practices and Settings of Craftwork
- Computer Science · Proceedings of the ACM on Human-Computer Interaction
- 2022
Drawing on an ethnographic study of a craft brewery in Korea, it is shown how craft brewers worked with data by situating otherwise abstract data within their brewing practices and settings.
References
Showing 1-10 of 94 references
Explainable Active Learning (XAL): An Empirical Study of How Local Explanations Impact Annotator Experience
- Computer Science · ArXiv
- 2020
This study shows the benefits of AI explanations as interfaces for machine teaching (supporting trust calibration and enabling rich forms of teaching feedback) as well as their potential drawbacks (an anchoring effect toward the model's judgment, and added cognitive workload).
Gamut: A Design Probe to Understand How Data Scientists Understand Machine Learning Models
- Computer Science · CHI
- 2019
This work investigated why and how professional data scientists interpret models and how interface affordances can support them in answering questions about model interpretability, showing that interpretability is not a monolithic concept.
How do Data Science Workers Collaborate? Roles, Workflows, and Tools
- Computer Science · Proc. ACM Hum. Comput. Interact.
- 2020
It is found that data science teams are extremely collaborative and work with a variety of stakeholders and tools during the six common steps of a data science workflow (e.g., clean data and train model).
Snorkel: Rapid Training Data Creation with Weak Supervision
- Computer Science · Proc. VLDB Endow.
- 2017
Snorkel is a first-of-its-kind system that enables users to train state-of-the-art models without hand-labeling any training data; the paper also proposes an optimizer for automating tradeoff decisions that gives up to a 1.8× speedup per pipeline execution.
Machine Teaching by Domain Experts: Towards More Humane, Inclusive, and Intelligent Machine Learning Systems
- Computer Science · ArXiv
- 2019
This paper argues that a possible way to escape from the limitations of current machine learning (ML) systems is to allow their development directly by domain experts without the mediation of ML…
Neural Ranking Models with Weak Supervision
- Computer Science · SIGIR
- 2017
This paper proposes to train a neural ranking model using weak supervision, where labels are obtained automatically without human annotators or any external resources, and suggests that supervised neural ranking models can greatly benefit from pre-training on large amounts of weakly labeled data that can be easily obtained from unsupervised IR models.
The Emerging Role of Data Scientists on Software Development Teams
- Computer Science · 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)
- 2016
Five distinct working styles of data scientists are identified: Insight Providers, who work with engineers to collect the data needed to inform decisions that managers make; Modeling Specialists, who use their machine learning expertise to build predictive models; Platform Builders, who create data platforms, balancing both engineering and data analysis concerns; Polymaths, who do all data science activities themselves; and Team Leaders, who run teams of data scientists and spread best practices.
How Data Scientists Use Computational Notebooks for Real-Time Collaboration
- Computer Science · Proc. ACM Hum. Comput. Interact.
- 2019
The paper reports how synchronous editing in computational notebooks changes the way data scientists work together compared with working on individual notebooks, and proposes several design implications for better supporting collaborative editing in synchronous notebooks, with the aim of improving the efficiency of teamwork among data scientists.
Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models
- Computer Science · IEEE Transactions on Visualization and Computer Graphics
- 2019
This paper presents Manifold, a generic, model-agnostic framework that uses visual analysis techniques to support the interpretation, debugging, and comparison of machine learning models in a more transparent and interactive manner.
Structured Labeling to Facilitate Concept Evolution in Machine Learning
- Computer Science
- 2014
The paper introduces the notion of concept evolution: the changing nature of a person’s underlying concept, which can result in inconsistent labels and thus be detrimental to machine learning.