Self-explaining AI as an Alternative to Interpretable AI

@inproceedings{Elton2020SelfexplainingAA,
  title={Self-explaining AI as an Alternative to Interpretable AI},
  author={Daniel C. Elton},
  booktitle={AGI},
  year={2020}
}
  • D. Elton
  • Published in AGI 12 February 2020
  • Computer Science, Mathematics
The ability to explain decisions made by AI systems is highly sought after, especially in domains where human lives are at stake, such as medicine or autonomous vehicles. While it is often possible to approximate the input-output relations of deep neural networks with a few human-understandable rules, the discovery of the double descent phenomenon suggests that such approximations do not accurately capture the mechanism by which deep neural networks work. Double descent indicates that deep neural…
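The double descent phenomenon referenced in the abstract is usually probed empirically by sweeping model capacity and tracking train and test error, looking for the second drop in test error past the interpolation threshold. A minimal sketch of that protocol; the dataset, model family, and capacity grid below are illustrative assumptions, not from the paper:

```python
# Hedged sketch: sweep model capacity and record train/test error to look for
# the double descent curve (test error falling, rising near the interpolation
# threshold, then falling again). Dataset and model family are arbitrary choices.
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for width in [2, 8, 32, 128, 512]:  # capacity sweep (illustrative grid)
    model = MLPRegressor(hidden_layer_sizes=(width,), max_iter=5000, random_state=0)
    model.fit(X_tr, y_tr)
    print(f"width={width:4d}  "
          f"train MSE={mean_squared_error(y_tr, model.predict(X_tr)):8.1f}  "
          f"test MSE={mean_squared_error(y_te, model.predict(X_te)):8.1f}")
```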
Citations

Teaching the Machine to Explain Itself using Domain Knowledge
TLDR
JOEL is a neural network-based framework that jointly learns a decision-making task and associated explanations conveying domain knowledge that closely resembles the experts' own reasoning, tailored to human-in-the-loop domain experts who lack deep technical ML knowledge.
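The joint decision-and-explanation setup described in this entry can be pictured as a multi-task network with a shared encoder and two heads. The architecture, sizes, and loss weighting below are illustrative assumptions, not JOEL's actual design:

```python
# Hedged sketch of jointly learning a decision and an explanation label via a
# shared encoder with two heads. All sizes and the loss weighting are
# illustrative assumptions, not the actual JOEL architecture.
import torch
import torch.nn as nn

class JointDecisionExplainer(nn.Module):
    def __init__(self, n_features=32, n_classes=2, n_explanation_tags=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.decision_head = nn.Linear(64, n_classes)               # task prediction
        self.explanation_head = nn.Linear(64, n_explanation_tags)   # domain-knowledge tags

    def forward(self, x):
        h = self.encoder(x)
        return self.decision_head(h), self.explanation_head(h)

model = JointDecisionExplainer()
x = torch.randn(8, 32)
y_task = torch.randint(0, 2, (8,))
y_expl = torch.randint(0, 2, (8, 10)).float()   # multi-label explanation tags
logits_task, logits_expl = model(x)
loss = nn.CrossEntropyLoss()(logits_task, y_task) \
       + 0.5 * nn.BCEWithLogitsLoss()(logits_expl, y_expl)
loss.backward()
```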
Applying Deutsch’s concept of good explanations to artificial intelligence and neuroscience – An initial exploration
  • D. Elton
  • Computer Science
  • Cognitive Systems Research
  • 2021
TLDR
This work investigates Deutsch's hard-to-vary principle and how it relates to more formalized principles in deep learning, such as the bias-variance trade-off and Occam's razor, and makes contact with the framework of Popperian epistemology, which rejects induction and asserts that knowledge generation is an evolutionary process proceeding through conjecture and refutation.
The Achilles Heel Hypothesis: Pitfalls for AI Systems via Decision Theoretic Adversaries
TLDR
The Achilles Heel hypothesis is presented, which states that highly effective goal-oriented systems -- even ones that are potentially superintelligent -- may nonetheless have stable decision-theoretic delusions that cause them to make obviously irrational decisions in adversarial settings.
An analysis of gamma ray data collected at traffic intersections in Northern Virginia
TLDR
The analysis approach used here is described and the results in terms of radioisotope classes and frequency patterns over day-of-week and time-of-day spans are discussed.
Enhancing data pipelines for forecasting student performance: integrating feature selection with cross-validation
Educators seek to harness knowledge from educational corpora to improve student performance outcomes. Although prior studies have compared the efficacy of data mining methods (DMMs) in pipelines for…
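A common way to integrate feature selection with cross-validation, as this entry's title describes, is to place the selector inside the cross-validated pipeline so it never sees held-out data. A minimal sketch with assumed data, selector, and classifier choices:

```python
# Hedged sketch: feature selection performed inside each cross-validation fold
# (via a Pipeline) so the selector is fit on training folds only, avoiding
# leakage. The dataset, selector, and classifier are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=50, n_informative=5, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),        # fit on training folds only
    ("clf", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(pipe, X, y, cv=5)
print("CV accuracy per fold:", scores.round(3))
```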
Inducing Semantic Grouping of Latent Concepts for Explanations: An Ante-Hoc Approach
TLDR
This work builds on one such model, motivated by explicitly representing the classifier as a linear function, and shows that exploiting the probabilistic latent variables and properly modifying different parts of the model can yield better explanations as well as superior predictive performance.
Keep CALM and Improve Visual Feature Attribution
TLDR
This paper improves CAM by explicitly incorporating a latent variable encoding the location of the cue for recognition in the formulation, thereby subsuming the attribution map into the training computational graph.
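For reference, the class activation mapping (CAM) baseline this entry builds on is a class-specific weighted sum of the last convolutional feature maps, using the final linear layer's weights. The tiny untrained CNN below is an illustrative stand-in, not CALM itself:

```python
# Hedged sketch of plain CAM (the baseline that CALM improves on): weight the
# final conv feature maps by the classification layer's weights for one class.
# The tiny CNN here is an untrained stand-in, not the CALM model.
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
fc = nn.Linear(32, 10)                       # classifier over pooled features

x = torch.randn(1, 3, 64, 64)
feats = conv(x)                              # (1, 32, 64, 64) feature maps
logits = fc(feats.mean(dim=(2, 3)))          # global average pooling + linear
target_class = logits.argmax(dim=1).item()

# CAM: class-specific weights applied channel-wise to the feature maps
w = fc.weight[target_class]                  # (32,)
cam = torch.einsum("c,bchw->bhw", w, feats)  # (1, 64, 64) attribution map
cam = F.relu(cam)
cam = cam / (cam.max() + 1e-8)               # normalize to [0, 1] for viewing
print(cam.shape, float(cam.max()))
```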
Towards Self-Explainable Adaptive Systems (SEAS): A Requirements Driven Approach
TLDR
The planned approach investigates the relationship between Explainable Artificial Intelligence (XAI) and runtime requirements engineering (RE) to realize SEAS, which is capable of adapting and producing scenario-based positive and negative user explanations at runtime.
Towards an Equitable Digital Society: Artificial Intelligence (AI) and Corporate Digital Responsibility (CDR)
TLDR
The paper seeks to harmonise and align approaches, illustrating the opportunities and threats of AI, while raising awareness of Corporate Digital Responsibility (CDR) as a potential collaborative mechanism to demystify governance complexity and to establish an equitable digital society.

References

SHOWING 1-10 OF 83 REFERENCES
Interpretable Explanations of Black Boxes by Meaningful Perturbation
  • Ruth C. Fong, A. Vedaldi
  • Computer Science, Mathematics
  • 2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
TLDR
A general framework for learning different kinds of explanations for any black box algorithm is proposed, and the framework is specialised to find the part of an image most responsible for a classifier decision.
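The core idea in this reference, finding a minimal perturbation mask that deletes the evidence for a prediction, can be sketched as an optimization over a per-pixel mask. The classifier and image below are placeholders, and the regularization is simplified relative to the paper's full objective:

```python
# Hedged sketch of explanation-by-meaningful-perturbation: learn a mask that
# blends the image toward a blurred reference so as to minimize the target
# class score while deleting as little as possible. The classifier and image
# are placeholders; the regularizer is a simplification of the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
image = torch.randn(1, 3, 64, 64)
blurred = F.avg_pool2d(image, 9, stride=1, padding=4)   # crude "information removed" reference
target = classifier(image).argmax(dim=1).item()

mask_logits = torch.zeros(1, 1, 64, 64, requires_grad=True)
opt = torch.optim.Adam([mask_logits], lr=0.1)

for step in range(100):
    mask = torch.sigmoid(mask_logits)                   # 1 = keep pixel, 0 = delete it
    perturbed = mask * image + (1 - mask) * blurred
    score = classifier(perturbed).softmax(dim=1)[0, target]
    loss = score + 0.05 * (1 - mask).abs().mean()       # drop the class score, small deletion
    opt.zero_grad()
    loss.backward()
    opt.step()

saliency = 1 - torch.sigmoid(mask_logits).detach()      # deleted region ~ evidence for the class
print(saliency.shape)
```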
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
TLDR
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.
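A rough sketch of the local-surrogate idea behind LIME (the concept, not the lime package's actual implementation): perturb the instance, weight the samples by proximity, and fit a linear surrogate to the black box's outputs. Data, kernel width, and models are illustrative choices:

```python
# Hedged sketch of the LIME idea: sample perturbations around one instance,
# weight them by proximity, and fit a weighted linear surrogate to the black
# box's predictions. This illustrates the concept only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x0 = X[0]                                                  # instance to explain
rng = np.random.default_rng(0)
Z = x0 + rng.normal(scale=0.5, size=(1000, X.shape[1]))    # local perturbations
p = black_box.predict_proba(Z)[:, 1]                       # black-box outputs
d = np.linalg.norm(Z - x0, axis=1)
w = np.exp(-(d ** 2) / 0.5)                                # proximity kernel

surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)
for i in np.argsort(-np.abs(surrogate.coef_))[:3]:
    print(f"feature {i}: local weight {surrogate.coef_[i]:+.3f}")
```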
Towards Robust Interpretability with Self-Explaining Neural Networks
TLDR
This work designs self-explaining models in stages, progressively generalizing linear classifiers to complex yet architecturally explicit models, and proposes three desiderata for explanations in general – explicitness, faithfulness, and stability.
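The self-explaining form described above pairs a concept encoder h(x) with input-dependent relevances theta(x), combined linearly so the relevances serve as the explanation. A toy sketch of that structure (not the authors' code; sizes are illustrative):

```python
# Hedged sketch of a self-explaining model in the theta(x)^T h(x) form: a
# concept encoder h(x) and a relevance network theta(x), combined linearly so
# the per-concept relevances act as the explanation. Sizes are illustrative.
import torch
import torch.nn as nn

class SelfExplainingNet(nn.Module):
    def __init__(self, n_features=16, n_concepts=5):
        super().__init__()
        self.concepts = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                      nn.Linear(32, n_concepts))       # h(x)
        self.relevances = nn.Sequential(nn.Linear(n_features, 32), nn.Tanh(),
                                        nn.Linear(32, n_concepts))     # theta(x)

    def forward(self, x):
        h, theta = self.concepts(x), self.relevances(x)
        score = (theta * h).sum(dim=1)          # linear aggregation g = theta^T h
        return score, h, theta                  # theta is the per-concept explanation

model = SelfExplainingNet()
score, h, theta = model(torch.randn(4, 16))
print(score.shape, theta.shape)                 # (4,), (4, 5)
```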
Synthesizing the preferred inputs for neurons in neural networks via deep generator networks
TLDR
This work dramatically improves the qualitative state of the art of activation maximization by harnessing a powerful, learned prior: a deep generator network (DGN), which generates qualitatively state-of-the-art synthetic images that look almost real.
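The recipe in this reference, activation maximization through a learned generator prior, amounts to optimizing a latent code so the generated image excites a chosen unit. The generator and classifier below are untrained placeholders standing in for the pretrained networks the paper uses:

```python
# Hedged sketch of activation maximization with a generator prior: optimize a
# latent code z so that G(z) maximally activates one unit of a classifier.
# Both networks here are untrained placeholders for the paper's pretrained
# DGN and target network.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(64, 3 * 32 * 32), nn.Tanh(),
                          nn.Unflatten(1, (3, 32, 32)))                # stand-in G(z)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100))  # stand-in target net
unit = 7                                                               # neuron to maximize

z = torch.randn(1, 64, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)
for step in range(200):
    activation = classifier(generator(z))[0, unit]
    loss = -activation + 0.01 * z.norm() ** 2   # maximize activation, keep z near the prior
    opt.zero_grad()
    loss.backward()
    opt.step()

preferred_input = generator(z).detach()         # synthesized "preferred" image
print(preferred_input.shape, float(activation))
```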
Distilling a Neural Network Into a Soft Decision Tree
TLDR
A way of using a trained neural net to create a type of soft decision tree that generalizes better than one learned directly from the training data is described.
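A much-simplified sketch of the distillation idea in this reference: train a neural-network teacher, then fit a tree to the teacher's predictions rather than the raw labels. Note the student here is an ordinary hard decision tree standing in for the paper's soft (probabilistic) tree:

```python
# Hedged sketch of distillation into a tree: the student is fit to the
# teacher's predictions instead of the ground-truth labels. Simplification:
# an ordinary hard decision tree stands in for the paper's soft decision tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

teacher = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                        random_state=0).fit(X_tr, y_tr)
student = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, teacher.predict(X_tr))

print("teacher test acc:", teacher.score(X_te, y_te))
print("distilled tree test acc:", student.score(X_te, y_te))
print("tree trained on raw labels:",
      DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr).score(X_te, y_te))
```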
Definitions, methods, and applications in interpretable machine learning
TLDR
This work defines interpretability in the context of machine learning and introduces the predictive, descriptive, relevant (PDR) framework for discussing interpretations, and introduces 3 overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy.
Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks
TLDR
This work presents a new family of direct-fit models that use local computations to interpolate over task-relevant manifolds in a high-dimensional parameter space, providing a versatile, robust solution for learning a diverse set of functions.
A Unified Approach to Interpreting Model Predictions
TLDR
A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
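For intuition about the Shapley values that SHAP unifies, here is a brute-force computation of exact Shapley values for one prediction of a small model, with absent features imputed from a background sample. This illustrates the quantity being approximated, not the SHAP library's own algorithms; data and model are arbitrary:

```python
# Hedged sketch: exact Shapley values for one prediction by brute force over
# feature coalitions, with absent features imputed from a background sample.
# Only feasible for a small number of features; SHAP approximates this.
from itertools import combinations
from math import factorial
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)
background, x0 = X[:100], X[0]
n = X.shape[1]

def value(subset):
    """Expected model output when only `subset` features are fixed to x0."""
    Z = background.copy()
    Z[:, list(subset)] = x0[list(subset)]
    return model.predict_proba(Z)[:, 1].mean()

phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for size in range(n):
        for S in combinations(others, size):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi[i] += weight * (value(S + (i,)) - value(S))

print("Shapley values:", phi.round(3))
print("sum + base value:", phi.sum() + value(()),
      " model output:", model.predict_proba(x0.reshape(1, -1))[0, 1])
```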
Explaining nonlinear classification decisions with deep Taylor decomposition
TLDR
A novel methodology for interpreting generic multilayer neural networks by decomposing the network classification decision into contributions of its input elements by backpropagating the explanations from the output to the input layer is introduced.
On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation
TLDR
This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers by introducing a methodology that allows to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks.
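To make the relevance propagation in the last two references concrete, here is a tiny NumPy implementation of the epsilon-variant LRP rule on a two-layer ReLU network with random weights. It is illustrative only; in practice the rule is applied layer by layer to trained deep networks:

```python
# Hedged sketch of layer-wise relevance propagation (epsilon rule) on a tiny
# random two-layer ReLU network: relevance is redistributed from the output
# back to the inputs in proportion to each unit's contribution a_j * w_jk.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 6)), rng.normal(size=6)   # input(4) -> hidden(6)
W2, b2 = rng.normal(size=(6, 3)), rng.normal(size=3)   # hidden(6) -> output(3)

x = rng.normal(size=4)
a1 = np.maximum(0.0, x @ W1 + b1)                      # hidden activations
out = a1 @ W2 + b2                                     # output scores
target = int(out.argmax())

def lrp_epsilon(a, W, b, R_out, eps=1e-6):
    """Redistribute the outputs' relevance R_out onto the layer's inputs."""
    z = a @ W + b                                      # pre-activations
    z = z + eps * np.sign(z)                           # stabilizer
    s = R_out / z                                      # per-output scaling
    return a * (W @ s)                                 # R_in[j] = a_j * sum_k W_jk * s_k

R2 = np.zeros(3)
R2[target] = out[target]                               # start from the explained score
R1 = lrp_epsilon(a1, W2, b2, R2)                       # relevance of hidden units
R0 = lrp_epsilon(x, W1, b1, R1)                        # relevance of input features

print("input relevances:", R0.round(3), " sum:", R0.sum().round(3))
```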