# Metrics for Multi-Class Classification: an Overview

@article{Grandini2020MetricsFM, title={Metrics for Multi-Class Classification: an Overview}, author={Margherita Grandini and Enrico Bagli and Giorgio Visani}, journal={ArXiv}, year={2020}, volume={abs/2008.05756} }

Classification tasks in machine learning involving more than two classes are known as "multi-class classification". Performance indicators are very useful when the aim is to evaluate and compare different classification models or machine learning techniques. Many metrics come in handy to test the ability of a multi-class classifier. Those metrics turn out to be useful at different stages of the development process, e.g. comparing the performance of two different models or analysing…

## 173 Citations

### Preference-Driven Classification Measure

- Computer Science
- Entropy
- 2022

This paper proposes a new measure for assessing classifier quality, called the preference-driven measure (abbreviated p-d), which applies regardless of the number of classes and allows the relative importance of each class to be set.

### Analysis of multi-class classification performance metrics for remote sensing imagery imbalanced datasets

- Computer Science, Environmental Science
- Journal of Quantitative and Statistical Analysis
- 2021

This work presented a study of a set of performance evaluation metrics for an imbalanced dataset and concluded that the Matthews correlation coefficient (MCC) shows the lowest bias in imbalanced cases and is regarded as a robust metric.

### Online Social Network Post Classification: A Multiclass approach

- Computer Science
- Lecture Notes in Networks and Systems
- 2021

This work’s theoretical and practical significance is determined by the fact that the resulting model will partially automate the process of assessing the severity of users’ psychological characteristics on their text posts in social networks, as well as create the potential to refine the estimates of the protection of users from social engineering attacks, and the development of recommendation systems offering measures to improve users' protection.

### FOLD-RM: A Scalable and Efficient Inductive Learning Algorithm for Multi-Category Classification of Mixed Data

- Computer Science
- ArXiv
- 2022

The FOLD-RM algorithm is competitive in performance with the widely used XGBoost algorithm; however, unlike XGBoost, the FOLD-RM algorithm produces an explainable model and provides human-friendly explanations for predictions.

### Techniques to Deal with Off-Diagonal Elements in Confusion Matrices

- Mathematics
- Mathematics
- 2021

Confusion matrices are numerical structures that deal with the distribution of errors between different classes or categories in a classification process. From a quality perspective, it is of…

### A Randomized Bag-of-Birds Approach to Study Robustness of Automated Audio Based Bird Species Classification

- Computer Science, Environmental Science
- Applied Sciences
- 2021

This contribution uses an artificial neural network fed with pre-computed sound features to study the robustness of bird sound classification and investigates in detail if and how classification results are dependent on the number of species and the selection of species in the subsets presented to the classifier.

### FOLD-RM: A Scalable, Efficient, and Explainable Inductive Learning Algorithm for Multi-Category Classification of Mixed Data

- Computer Science
- Theory and Practice of Logic Programming
- 2022

The FOLD-RM algorithm is competitive in performance with widely used, state-of-the-art algorithms such as XGBoost and multi-layer perceptrons; however, unlike these algorithms, the FOLD-RM algorithm produces an explainable model.

### Imbalanced classification with TPG genetic programming: impact of problem imbalance and selection mechanisms

- Computer Science
- GECCO Companion
- 2022

This paper explores the effect of imbalanced data on the performance of a TPG classifier, and proposes mitigation methods for imbalance-caused classifier performance degradation using adapted GP selection phases.

### Classification of Cotton Leaf Diseases Using AlexNet and Machine Learning Models

- Computer Science
- Current Journal of Applied Science and Technology
- 2021

An AlexNet model was implemented to identify and classify cotton leaf diseases, and the three fully connected layers of AlexNet provided the best-performing model, with a 94.92% F1 score at a training time of about 51 min.

### T5W: A Paraphrasing Approach to Oversampling for Imbalanced Text Classification

- Computer Science
- 2022 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)
- 2022

This paper proposes an oversampling technique that uses a combination of the T5 Transformer and the WordNet corpus to balance the dataset by paraphrasing text in minority classes.

## References

### Data Mining Methods Applied to a Digital Forensics Task for Supervised Machine Learning

- Computer Science
- Computational Intelligence in Digital Forensics
- 2014

This chapter performs an experimental study on a forensics data task for multi-class classification, covering several types of methods such as decision trees, Bayes classifiers, rule-based methods, artificial neural networks and nearest-neighbor methods.

### The Balanced Accuracy and Its Posterior Distribution

- Mathematics
- 2010 20th International Conference on Pattern Recognition
- 2010

It is shown that both problems can be overcome by replacing the conventional point estimate of accuracy by an estimate of the posterior distribution of the balanced accuracy.

### Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric

- Computer Science
- PloS one
- 2017

The proposed MCC-classifier performs close to SVM-imba while being simpler and more efficient; an optimal Bayes classifier for the MCC metric is derived using an approach based on the Fréchet derivative.

### Macro F1 and Macro F1

- Computer Science
- ArXiv
- 2019

It is shown that only under rare circumstances, the two computations can be considered equivalent, and that one formula well 'rewards' classifiers which produce a skewed error type distribution.

### Comparing two K-category assignments by a K-category correlation coefficient

- Mathematics
- Comput. Biol. Chem.
- 2004

### Classifiers and their Metrics Quantified

- Biology
- Molecular informatics
- 2018

This work systematically considers metric value surface generation as a consequence of data balance, and proposes the computation of an inverse cumulative distribution function taken over a metric surface.

### Common pitfalls in statistical analysis: Measures of agreement

- Psychology
- Perspectives in clinical research
- 2017

This article looks at statistical measures of agreement for different types of data and discusses the differences between these and those for assessing correlation.

### Comparison of the predicted and observed secondary structure of T4 phage lysozyme.

- Chemistry
- Biochimica et biophysica acta
- 1975

### The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation

- Computer Science
- BMC Genomics
- 2020

This article shows how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining the mathematical properties, and then the asset of MCC in six synthetic use cases and in a real genomics scenario.