Advancing Human-AI Complementarity: The Impact of User Expertise and Algorithmic Tuning on Joint Decision Making

Authors: Kori M. Inkpen, Shreya Chappidi, Keri Mallari, Besmira Nushi, Divya Ramesh, Pietro Michelucci, Vani Mandava, Libuše Hannah Vepřek, Gabrielle Quinn
Abstract: Human-AI collaboration for decision-making strives to achieve team performance that exceeds the performance of humans or AI alone. However, many factors can affect the success of human-AI teams, including a user’s domain expertise, mental models of an AI system, trust in recommendations, and more. This paper reports on a study that examines users’ interactions with three simulated algorithmic models, all with equivalent accuracy rates but each tuned differently in terms of true positive…
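To make the idea of "equivalent accuracy but different tuning" concrete, the following is a minimal sketch showing that three classifiers can share one accuracy while differing in true positive rate. The counts, the 100/100 class split, and the names `model_a`, `model_b`, and `model_c` are hypothetical and chosen only for illustration; they are not the study's models or data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Confusion-matrix counts for a binary classifier."""
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total

    def true_positive_rate(self) -> float:
        return self.tp / (self.tp + self.fn)


# Hypothetical counts on 100 positive and 100 negative cases (assumption).
model_a = ConfusionCounts(tp=90, fn=10, tn=70, fp=30)  # tuned toward catching positives
model_b = ConfusionCounts(tp=80, fn=20, tn=80, fp=20)  # balanced
model_c = ConfusionCounts(tp=70, fn=30, tn=90, fp=10)  # tuned toward avoiding false alarms

for name, model in [("A", model_a), ("B", model_b), ("C", model_c)]:
    print(f"model {name}: accuracy={model.accuracy():.2f}, "
          f"TPR={model.true_positive_rate():.2f}")
```

All three hypothetical models are correct on 160 of 200 cases, yet they miss very different numbers of positive cases, so a user who sees only an accuracy figure cannot tell them apart.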



Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

It is shown that updates that increase AI performance may actually hurt team performance, and a re-training objective is proposed to improve the compatibility of an update by penalizing new errors.
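As a rough illustration of "penalizing new errors", the sketch below measures how often an updated model is wrong on cases the previous model got right, and adds that rate to the ordinary error rate. This is an evaluation-side sketch under stated assumptions, not the paper's re-training objective: the function names, the `penalty_weight` parameter, and the toy labels are all hypothetical.

```python
def error_rate(y_true, y_pred):
    """Fraction of examples predicted incorrectly."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def new_error_rate(y_true, old_pred, new_pred):
    """Fraction of examples the old model got right that the new model gets wrong."""
    old_correct = [i for i, (t, o) in enumerate(zip(y_true, old_pred)) if t == o]
    if not old_correct:
        return 0.0
    new_errors = sum(1 for i in old_correct if new_pred[i] != y_true[i])
    return new_errors / len(old_correct)


def penalized_score(y_true, old_pred, new_pred, penalty_weight=1.0):
    """Error rate plus a weighted penalty on newly introduced errors (lower is better)."""
    return error_rate(y_true, new_pred) + penalty_weight * new_error_rate(
        y_true, old_pred, new_pred
    )


# Toy labels and predictions (hypothetical, for illustration only).
y_true   = [1, 0, 1, 1, 0, 0, 1, 0]
old_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # wrong on indices 3 and 5
new_pred = [1, 0, 0, 1, 0, 0, 1, 0]  # fixes both, but is newly wrong on index 2

print("old error rate:", error_rate(y_true, old_pred))
print("new error rate:", error_rate(y_true, new_pred))
print("new-error rate:", new_error_rate(y_true, old_pred, new_pred))
print("penalized (update):", penalized_score(y_true, old_pred, new_pred))
print("penalized (no update):", penalized_score(y_true, old_pred, old_pred))
```

In this toy case the update lowers the plain error rate but scores worse once the new error is penalized, which is the kind of tension between performance and compatibility that the summary describes.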

Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making

It is shown that confidence scores can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making, which may also depend on whether the human can bring in enough unique knowledge to complement the AI's errors.

Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance

This work conducts mixed-method user studies on three datasets, where an AI with accuracy comparable to humans helps participants solve a task (explaining itself in some conditions), and observes complementary improvements from AI augmentation that were not increased by explanations.

Beyond Accuracy: The Role of Mental Models in Human-AI Team Performance

This work highlights two key properties of an AI’s error boundary, parsimony and stochasticity, and a property of the task, dimensionality, and shows experimentally how these properties affect humans’ mental models of AI capabilities and the resulting team performance.

Deciding Fast and Slow: The Role of Cognitive Biases in AI-assisted Decision-making

This work designs a time allocation strategy for a resource-constrained setting that achieves optimal human-AI collaboration under some assumptions, and that can effectively de-anchor the human and improve collaborative performance when the AI model has low confidence and is incorrect.

I can do better than your AI: expertise and explanations

An analysis of cognitive metrics led to three findings for research in intelligent assistants: higher reported familiarity with the task simultaneously predicted more reported trust but less adherence, and showing explanations to people who reported more task familiarity led to automation bias.

Human Reliance on Machine Learning Models When Performance Feedback is Limited: Heuristics and Risks

It is found that the level of agreement between people and a model on decision-making tasks that people have high confidence in significantly affects reliance on the model if people receive no information about the model’s performance, but this impact will change after aggregate-level model performance information becomes available.

Are Explanations Helpful? A Comparative Study of the Effects of Explanations in AI-Assisted Decision-Making

This paper compares the effects of a set of established XAI methods in AI-assisted decision making, and highlights three desirable properties that ideal AI explanations should satisfy: improve people’s understanding of the AI model, help people recognize the model's uncertainty, and support people’s calibrated trust in the model.

Proxy tasks and subjective measures can be misleading in evaluating explainable AI systems

This work conducted two online experiments and one in-person think-aloud study to evaluate two currently common techniques for evaluating XAI systems: using proxy, artificial tasks such as how well humans predict the AI's decision from the given explanations, and using subjective measures of trust and preference as predictors of actual performance.

Will You Accept an Imperfect AI?: Exploring Designs for Adjusting End-user Expectations of AI Systems

This work uses a Scheduling Assistant - an AI system for automated meeting request detection in free-text email - to study the impact of several methods of expectation setting and designs expectation adjustment techniques that prepare users for AI imperfections and result in a significant increase in acceptance.