Towards a More Reliable Interpretation of Machine Learning Outputs for Safety-Critical Systems using Feature Importance Fusion

Authors: Divish Rengasamy, Benjamin Rothwell, Grazziela Patrocinio Figueredo
When machine learning supports decision-making in safety-critical systems, it is important to verify and understand the reasons why a particular output is produced. Although feature importance calculation approaches assist in interpretation, there is a lack of consensus regarding how features’ importance is quantified, which makes the explanations offered for the outcomes mostly unreliable. A possible solution to address the lack of agreement is to combine the results from multiple feature… 
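The abstract above is cut off before it describes how results are combined, so the following is only a minimal sketch of one simple way to fuse feature importance scores: rescale each method's scores so they sum to one, then average them per feature. The mean-based fusion, the function names `normalise` and `fuse_importances`, and the example scores are all assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean


def normalise(scores):
    """Scale the absolute values of importance scores so they sum to 1."""
    magnitudes = [abs(s) for s in scores]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("all importance scores are zero")
    return [m / total for m in magnitudes]


def fuse_importances(per_method):
    """Average normalised importance vectors from several methods.

    per_method maps a method name to a list of scores, one per feature,
    with features in the same order for every method.
    """
    vectors = [normalise(v) for v in per_method.values()]
    n_features = len(vectors[0])
    if any(len(v) != n_features for v in vectors):
        raise ValueError("every method must score the same features")
    # zip(*vectors) groups the scores for each feature across methods.
    return [mean(column) for column in zip(*vectors)]


# Hypothetical scores for three features from three importance methods.
example_scores = {
    "permutation": [0.30, 0.10, 0.00],
    "shap": [1.2, 0.6, 0.2],
    "tree_gain": [50.0, 30.0, 20.0],
}
fused = fuse_importances(example_scores)
print(fused)
```

Normalising first keeps a method with large raw scores (here the hypothetical `tree_gain`) from dominating the average; other fusion rules, such as the median or a rank-based vote, could be swapped in at the final step.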

Mechanistic Interpretation of Machine Learning Inference: A Fuzzy Feature Importance Fusion Approach

Here it is shown how the use of fuzzy data fusion methods can overcome some of the important limitations of crisp fusion methods.

EFI: A Toolbox for Feature Importance Fusion and Interpretation in Python

This paper presents an open-source Python toolbox called Ensemble Feature Importance (EFI) to provide machine learning (ML) researchers, domain experts, and decision makers with robust and

Reconnoitering the class distinguishing abilities of the features, to know them better

This work estimates the class-distinguishing capabilities (scores) of the variables for pair-wise class combinations and validates the explainability given by the scheme empirically on several real-world, multi-class datasets.

Estimating the Contamination Factor's Distribution in Unsupervised Anomaly Detection

This work introduces a method for estimating the posterior distribution of the contamination factor of a given unlabeled dataset using a specific mixture formulation and shows that the estimated distribution is well-calibrated and that setting the threshold using the posterior mean improves the anomaly detectors’ performance over several alternative methods.

Machine learning to determine the main factors affecting creep rates in laser powder bed fusion

The applicability and potential of using ML to determine and predict the mechanical properties of materials fabricated via different manufacturing processes, and to find process–structure–property relationships in AM are shown.

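Predicción de factores clave en el aumento de la demografía en Colombia a través del ensamble de modelos de Machine Learning (Prediction of key factors in population growth in Colombia using an ensemble of machine learning models)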

Population aging is considered one of the most significant social phenomena transforming economies and societies around the world. According to the World Organization…



Deep Learning Approaches to Aircraft Maintenance, Repair and Overhaul: A Review

A survey of deep learning architectures and their application in aircraft MRO identifies four main architectures employed in MRO, namely Deep Autoencoders, Long Short-Term Memory, Convolutional Neural Networks and Deep Belief Networks.

Permutation importance: a corrected feature importance measure

A heuristic for normalizing feature importance measures that can correct the feature importance bias is introduced. PIMP is used to correct RF-based importance measures for two real-world case studies and improves model interpretability.

Deep Learning with Dynamically Weighted Loss Function for Sensor-Based Prognostics and Health Management

Experimental results show that dynamically weighted loss functions help deep learning models focus on those instances where larger learning errors occur in order to improve their performance, and achieve significant improvement in remaining useful life prediction and fault detection rate over non-weighted loss function predictions.

A Unified Approach to Interpreting Model Predictions

A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.

Interpretability of deep learning models: A survey of results

  • Supriyo Chakraborty, Richard J. Tomsett, Prudhvi K. Gurram
  • Computer Science
    2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI)
  • 2017
Some of the dimensions that are useful for model interpretability are outlined, and prior work along those dimensions is categorized, in the process of performing a gap analysis of what needs to be done to improve model interpretability.

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.

Impact of noise on credit risk prediction: Does data quality really matter?

  • B. Twala
  • Computer Science
    Intell. Data Anal.
  • 2013
It is shown that when noise is added to four real-world credit risk domains, a significant and disproportionate number of total errors are contributed by class noise compared to attribute noise; thus, in the presence of noise, it is noise on the class variable that is responsible for the poor predictive accuracy of the learning concept.

Random generalized linear model: a highly accurate and interpretable ensemble predictor

RGLM is a state-of-the-art predictor that shares the advantages of a random forest (excellent predictive accuracy, feature importance measures, out-of-bag estimates of accuracy) with those of a forward selected generalized linear model (interpretability).

From local explanations to global understanding with explainable AI for trees

An explanation method for trees is presented that enables the computation of optimal local explanations for individual predictions, and the authors demonstrate their method on three medical datasets.