The Matthews Correlation Coefficient (MCC) is More Informative Than Cohen’s Kappa and Brier Score in Binary Classification Assessment

@article{chicco_mcc_kappa_brier,
  title={The Matthews Correlation Coefficient (MCC) is More Informative Than Cohen's Kappa and Brier Score in Binary Classification Assessment},
  author={Davide Chicco and Matthijs J. Warrens and Giuseppe Jurman},
  journal={IEEE Access}
}
Even if measuring the outcome of binary classifications is a pivotal task in machine learning and statistics, no consensus has been reached yet about which statistical rate to employ to this end. In the last century, the computer science and statistics communities have introduced several scores summing up the correctness of the predictions with respect to the ground truth values. Among these scores, the Matthews correlation coefficient (MCC) was shown to have several advantages over confusion… 
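As a toy illustration of the kind of comparison the abstract describes (not taken from the paper itself; the confusion-matrix counts below are hypothetical), all three scores can be computed directly from TP, FP, FN, and TN. On a heavily imbalanced example, accuracy and the Brier score look strong while MCC and kappa reveal a near-chance classifier:

```python
from math import sqrt

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0  # convention: 0 when undefined

def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_obs = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (p_obs - p_chance) / (1 - p_chance)

def brier_hard(tp, fp, fn, tn):
    """Brier score for hard 0/1 predictions: reduces to the error rate."""
    n = tp + fp + fn + tn
    return (fp + fn) / n

# Hypothetical imbalanced dataset: 94 positives, 6 negatives.
tp, fp, fn, tn = 90, 5, 4, 1
print(f"accuracy = {(tp + tn) / 100:.2f}")             # looks strong
print(f"Brier    = {brier_hard(tp, fp, fn, tn):.2f}")  # looks strong (low)
print(f"MCC      = {mcc(tp, fp, fn, tn):.3f}")         # near zero: weak predictor
print(f"kappa    = {cohens_kappa(tp, fp, fn, tn):.3f}")
```

Accuracy (0.91) and the Brier score (0.09) suggest a good classifier, but MCC and kappa both sit near zero because the single true negative barely improves on always predicting the majority class.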


The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation
The results demonstrate that the coefficient of determination (R-squared) is more informative and truthful than SMAPE, and does not have the interpretability limitations of MSE, RMSE, MAE and MAPE.
Formal definition of the MARS method for quantifying the unique target class discoveries of selected machine classifiers
This paper presents the methodology for MARS (Method for Assessing Relative Sensitivity/Specificity) ShineThrough and MARS Occlusion scores, two novel binary classification performance metrics, designed to quantify the distinctiveness of a classifier's predictive successes and failures, relative to alternative classifiers.
An Invitation to Greater Use of Matthews Correlation Coefficient in Robotics and Artificial Intelligence
Scientists have defined statistical rates that summarize TP, FP, FN, and TN in a single value; one such rate, the F1 score, is the harmonic mean of positive predictive value and true positive rate.
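The rate described above is the F1 score; a minimal sketch of that definition, with hypothetical counts:

```python
def f1(tp, fp, fn):
    """F1 score: harmonic mean of precision (PPV) and recall (TPR)."""
    precision = tp / (tp + fp)  # positive predictive value
    recall = tp / (tp + fn)     # true positive rate
    return 2 * precision * recall / (precision + recall)

# Equivalent closed form: 2*TP / (2*TP + FP + FN)
print(f"F1 = {f1(tp=80, fp=20, fn=10):.3f}")
```

Note that the true-negative count does not appear anywhere in the formula, which is one of the criticisms of F1 raised in this line of work.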
Cost-sensitive learning strategies for high-dimensional and imbalanced data: a comparative study
This work presents a comparative study of learning strategies that leverage both feature selection, to cope with high dimensionality, and cost-sensitive learning methods, to cope with class imbalance, on datasets that are both high-dimensional and class-imbalanced.
From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI
This poster presents a meta-modelling system that automates the labor-intensive, time-consuming, and expensive process of cataloguing and annotating text.
Dissecting the genre of Nigerian music with machine learning models
Cross validation for model selection: a primer with examples from ecology
It is concluded that CV-based model selection should be widely applied to ecological analyses, because of its robust estimation properties and the broad range of situations for which it is applicable.
Statistical classification for Raman spectra of tumoral genomic DNA
We exploit Surface-Enhanced Raman Scattering (SERS) to investigate aqueous droplets of genomic DNA deposited onto silver-coated silicon nanowires and we show that it is possible to efficiently…
In silico prediction of mosquito repellents for clothing application
ABSTRACT Use of protective clothing is a simple and efficient way to reduce the contacts with mosquitoes and consequently the probability of transmission of diseases spread by them. This mechanical…


The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation
This article shows how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining the mathematical properties, and then the asset of MCC in six synthetic use cases and in a real genomics scenario.
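A minimal sketch of the phenomenon this article analyzes (a hypothetical example, not one of the paper's use cases): a trivial classifier that always predicts the majority class on an imbalanced dataset earns high accuracy and F1, while MCC exposes it as uninformative.

```python
from math import sqrt

def scores(tp, fp, fn, tn):
    """Accuracy, F1, and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / den if den else 0.0  # 0 when undefined
    return acc, f1, mcc

# Hypothetical: 95 positives, 5 negatives; classifier always says "positive".
acc, f1, mcc = scores(tp=95, fp=5, fn=0, tn=0)
print(f"accuracy={acc:.2f}  F1={f1:.3f}  MCC={mcc:.2f}")
```

Accuracy (0.95) and F1 (about 0.97) reward the trivial strategy; MCC, which requires doing well in all four cells of the confusion matrix, stays at zero.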
The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation
This manuscript reaffirms that MCC is a robust metric that summarizes the classifier performance in a single value, and compares it to other metrics which value positive and negative cases equally: balanced accuracy, bookmaker informedness, and markedness.
The Benefits of the Matthews Correlation Coefficient (MCC) Over the Diagnostic Odds Ratio (DOR) in Binary Classification Assessment
This study compares the MCC with the diagnostic odds ratio (DOR), a statistical rate employed sometimes in biomedical sciences, and describes the relationships between them, taking advantage of an innovative geometrical plot called confusion tetrahedron, presented here for the first time.
Why Cohen’s Kappa should be avoided as performance measure in classification
It is found that as the entropy of the off-diagonal elements of the confusion matrix associated with a classifier decreases to zero, the discrepancy between Kappa and MCC rises, pointing to anomalous behavior of the former.
Cohen's kappa coefficient as a performance measure for feature selection
Using the kappa measure as an evaluation measure in a feature selection wrapper approach leads to more accurate classifiers, and therefore it leads to feature subset solutions with more relevant features.
About the relationship between ROC curves and Cohen's kappa
The MCC-F1 curve: a performance evaluation technique for binary classification
The MCC-F1 curve is proposed, which combines two informative single-threshold metrics, MCC and the F1 score, and provides a single value that integrates many aspects of classifier performance across the whole range of classification thresholds.
On the performance of Matthews correlation coefficient (MCC) for imbalanced dataset
A unified view of performance metrics: translating threshold choice into expected classification loss
This analysis provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation which can be summarised as follows: given a model, apply the threshold choice methods that correspond with the available information about the operating condition, and compare their expected losses.
Interobserver agreement: Cohen's kappa coefficient does not necessarily reflect the percentage of patients with congruent classifications.
It is demonstrated that Cohen's kappa coefficient of agreement between 2 raters or 2 diagnostic methods based on binary (yes/no) responses does not parallel the percentage of patients with congruent classifications, and it may be of limited value in the assessment of increases in the interrater reliability due to an improved diagnostic method.
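The divergence this study demonstrates can be sketched with hypothetical rater counts: when one category dominates, raw percent agreement stays high while kappa, which discounts chance agreement, drops sharply.

```python
def kappa_and_agreement(a, b, c, d):
    """Cohen's kappa and raw agreement for two raters with binary ratings.

    a: both raters say 'yes'; b: rater 1 yes, rater 2 no;
    c: rater 1 no, rater 2 yes; d: both say 'no'.
    """
    n = a + b + c + d
    p_obs = (a + d) / n  # percent agreement
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_obs - p_chance) / (1 - p_chance), p_obs

# Hypothetical counts: frequent 'yes' ratings inflate chance agreement.
kappa, agreement = kappa_and_agreement(a=90, b=4, c=4, d=2)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Here the raters agree on 92% of cases, yet kappa is only about 0.29, because most of that agreement is expected by chance when both raters say "yes" almost all the time.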