Text Classification Algorithms: A Survey

  • Kamran Kowsari, K. Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura E. Barnes, Donald E. Brown
In recent years, there has been exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to classify them accurately in many applications. […] This overview covers different text feature extraction methods, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application in real-world problems are discussed.
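Text feature extraction is one of the survey's topics. The code below sketches TF-IDF, a widely used feature extraction scheme, in plain Python. It is a minimal sketch, not the survey's own code, and it rests on a few assumptions. `tokenize` splits on whitespace only. `tfidf` uses raw term counts with an unsmoothed idf of log(N / df). The example documents are made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple assumption)."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = raw term count * log(N / df), where N is the number of documents
    and df is the number of documents containing the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


docs = ["the cat sat", "the dog sat", "the cat ran"]
for vec in tfidf(docs):
    print(vec)
```

A term that occurs in every document, such as "the" here, gets weight 0. This is the purpose of the idf factor: it down-weights terms that do not distinguish one document from another.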

Comparative Study of Long Document Classification

The study reiterates that long document classification is a comparatively simple task: even basic algorithms perform competitively with BERT-based approaches on most of the datasets.

A Comparative Study of Parametric Versus Non-Parametric Text Classification Algorithms

  • Mihaela Chistol
  • Computer Science
    2020 International Conference on Development and Application Systems (DAS)
  • 2020
The paper gives an overview of the text mining process. It then compares the performance and limitations of two predictive models for e-mail classification: one built with the parametric Naïve Bayes algorithm and one with a nonparametric deep learning neural network. Both models were implemented in the RapidMiner data science software platform.
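For readers unfamiliar with the parametric side of this comparison, the code below sketches multinomial Naïve Bayes with add-one (Laplace) smoothing for a spam/ham e-mail task. It is written in Python, not RapidMiner, and is not the paper's model. The function names `train_nb` and `predict_nb`, the whitespace tokenisation, and the four toy e-mails are all illustrative assumptions. Words unseen during training are simply ignored at prediction time.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha smoothing.

    Returns (log priors, log likelihoods) keyed by class label.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return priors, likelihoods


def predict_nb(model, text):
    """Return the class with the highest log-posterior score."""
    priors, likelihoods = model
    words = text.lower().split()
    scores = {
        # Words outside the training vocabulary contribute nothing.
        c: priors[c] + sum(likelihoods[c][w] for w in words if w in likelihoods[c])
        for c in priors
    }
    return max(scores, key=scores.get)


emails = ["win money now", "cheap money offer", "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(emails, labels)
print(predict_nb(model, "free money"))              # → spam
print(predict_nb(model, "notes from the meeting"))  # → ham
```

The model is "parametric" in the sense used above: everything it learns is a fixed set of class priors and per-word probabilities. The size of that set does not grow with the number of training e-mails.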

The Problems and Methods of Automatic Text Document Classification

  • V. Yatsko
  • Computer Science
    Automatic Documentation and Mathematical Linguistics
  • 2021
The author describes, in an accessible form, the procedures of text undersampling and logarithmic alignment, as well as algorithms for computing the cosine similarity measure and the Z-score.
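The two measures named in this entry have standard textbook forms, which are sketched in the code below. The entry does not specify the undersampling or logarithmic alignment procedures, so they are not reproduced. The use of the population standard deviation in `z_scores`, and the convention that a zero vector has similarity 0, are assumptions of this sketch.

```python
import math
from statistics import mean, pstdev


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def z_scores(values):
    """Standardise values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


print(cosine_similarity([1, 2, 0], [2, 4, 0]))  # parallel vectors: close to 1.0
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors: 0.0
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))       # mean 5, standard deviation 2
```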


Different techniques for textual data analysis, such as preprocessing, natural language processing, sentiment analysis, and classification, are discussed to help select the appropriate technique for a given decision-making task.
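Preprocessing is the first of the techniques listed here. The code below sketches a typical minimal version: lowercasing, extracting alphanumeric tokens, and removing stopwords. The tiny `STOPWORDS` set and the regular expression are illustrative assumptions. Real pipelines use much longer stopword lists and often add stemming or lemmatisation.

```python
import re

# A tiny illustrative stopword list (assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in"}


def preprocess(text):
    """Lowercase, keep alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


print(preprocess("The movie is great, and the acting is superb!"))
# → ['movie', 'great', 'acting', 'superb']
```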

An exploration on text classification using machine learning techniques

Traditional machine learning models, such as Support Vector Machines, Naïve Bayes, and Random Forests, are scrutinized for their performance on text classification of real-world corpora.

A Comparison of Supervised Text Classification and Resampling Techniques for User Feedback in Bahasa Indonesia

This paper applies several numerical text representations together with resampling techniques for handling imbalanced data. It then evaluates popular supervised machine learning classification algorithms: Logistic Regression, Random Forest, Support Vector Machine, Naive Bayes, and Decision Tree.
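The entry does not name its resampling techniques. The code below therefore sketches random oversampling, one common option swapped in for illustration. Randomly chosen minority-class examples are duplicated until every class matches the largest one. The function name `random_oversample`, the fixed seed, and the English toy feedback strings are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)  # seeded for reproducibility
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


texts = ["great app", "works well", "love it", "keeps crashing"]
labels = ["pos", "pos", "pos", "neg"]
X, y = random_oversample(texts, labels)
print(Counter(y))
```

Oversampling should be applied only to the training split. If it is applied before splitting, copies of the same example can land in both the training and test sets.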

Independent Component Analysis Based on Natural Gradient Algorithm for Text Mining

  • Hafedh Shabat, N. Abbas
  • Computer Science
    2020 1st. Information Technology To Enhance e-learning and Other Application (IT-ELA)
  • 2020
The paper explains natural gradient independent component analysis (NGICA) for text classification (TC), an application of text mining (TM) that involves preprocessing text data followed by classification, and shows the potential of the proposed model.
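As background for the NGICA entry, the standard natural-gradient ICA learning rule is shown below. In it, x is a (centred) document feature vector, W is the unmixing matrix being learned, and y holds the estimated independent components. The learning rate is η, I is the identity matrix, and φ is a componentwise nonlinearity such as tanh. The entry does not say whether the paper uses exactly this form or which nonlinearity it chooses, so both are assumptions here.

```latex
y = W x, \qquad
W \leftarrow W + \eta \left( I - \varphi(y)\, y^{\top} \right) W
```

This form comes from right-multiplying the ordinary log-likelihood gradient, W^{-T} - φ(y) x^T, by W^T W. The result no longer contains the inverse of W, so no matrix inversion is needed at each step.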

Classification of Multi-Labeled Text Articles with Reuters Dataset using SVM

Evaluating the SVM classification algorithm with both linear and polynomial kernels on the benchmark UCI news datasets (Reuters) shows that the linear-kernel SVM performs better. It achieves 94.10% accuracy, compared with 93.28% for the polynomial-kernel SVM.
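The two kernels being compared can be written down directly, as in the code below. The parameterisation (gamma · ⟨x, z⟩ + coef0)^degree is a common convention. The default values are illustrative assumptions, not the settings used in the paper. With coef0 > 0, a degree-d polynomial kernel implicitly scores products of up to d features (here, terms). A linear kernel scores each term independently.

```python
def linear_kernel(x, z):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, gamma=1.0, coef0=1.0):
    """(gamma * <x, z> + coef0) ** degree; defaults are illustrative only."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


x, z = [1, 2], [3, 1]
print(linear_kernel(x, z))      # 5
print(polynomial_kernel(x, z))  # (5 + 1) ** 2
```

For high-dimensional, sparse text features, classes are often close to linearly separable already. That is one plausible reason the simpler kernel holds its own in results like the one reported above.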

A Survey of Word Embedding Algorithms for Textual Data Information Extraction

  • Eugen Vušak, V. Kuzina, A. Jović
  • Computer Science
    2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO)
  • 2021
The paper describes currently available word embedding algorithms, shows what kind of information each of them uses, and explains how combining different types of information can be advantageous in different research and application areas.
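A common way to feed word embeddings into a text classifier is to average the vectors of a document's words into one fixed-length document vector. The code below sketches that step. The three-dimensional `EMBEDDINGS` table is made up; real models such as word2vec or GloVe learn hundreds of dimensions. Averaging is only one of several possible combination strategies and is not taken from the survey.

```python
# Hypothetical 3-dimensional embeddings; real models learn far larger vectors.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.0],
    "bad": [-0.7, 0.1, 0.2],
}


def document_vector(tokens, embeddings, dim=3):
    """Average the embeddings of known tokens; unknown tokens are skipped."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(column) / len(vecs) for column in zip(*vecs)]


print(document_vector(["good", "great", "movie"], EMBEDDINGS))
```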

A New Method of Automatic Text Document Classification

  • V. Yatsko
  • Computer Science
    Automatic Documentation and Mathematical Linguistics
  • 2021
The author develops indicators of discriminative and similarative power that underlie a generalized efficiency score. The proposed method proved highly effective for authorship attribution of fiction texts and for clustering political texts.



Machine learning in automated text categorization

This survey covers the main approaches to text categorization that fall within the machine learning paradigm. It discusses in detail issues pertaining to three problems: document representation, classifier construction, and classifier evaluation.
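Classifier evaluation in text categorization commonly reports precision, recall, and F1 averaged in two ways. Macro-averaging averages the per-class scores. Micro-averaging pools counts over all classes. The code below sketches both for single-label data. The function names and the toy labels are illustrative, and the code is not taken from the survey.

```python
from collections import Counter


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(y_true, y_pred):
    """Per-class (precision, recall, F1) plus micro- and macro-averaged F1
    for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but p was wrong
            fn[t] += 1  # the true class t was missed
    per_class = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        per_class[c] = (prec, rec, f1(prec, rec))
    # Macro: every class counts equally, however rare it is.
    macro_f1 = sum(scores[2] for scores in per_class.values()) / len(labels)
    # Micro: pool the counts over all classes, then compute F1 once.
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = total_tp / (total_tp + total_fp) if total_tp + total_fp else 0.0
    micro_r = total_tp / (total_tp + total_fn) if total_tp + total_fn else 0.0
    return per_class, f1(micro_p, micro_r), macro_f1


per_class, micro, macro = evaluate(["a", "a", "a", "b"], ["a", "a", "b", "b"])
print(per_class)
print(micro, macro)
```

In single-label classification every error is one false positive and one false negative. Micro-averaged precision, recall, and F1 therefore all equal accuracy (0.75 in the example). Macro-averaging gives the rare class "b" as much weight as the common class "a".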


This paper introduces text classification and the text classification process, gives an overview of classifiers, and compares some existing classifiers on a few criteria such as time complexity, principle, and performance.

HDLTex: Hierarchical Deep Learning for Text Classification

Hierarchical Deep Learning for Text Classification (HDLTex) employs stacks of deep learning architectures to provide specialized understanding at each level of the document hierarchy.
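The control flow behind per-level classification can be sketched as a two-level router. A parent model picks the top-level category, and that category's own child model picks the subcategory. In HDLTex these models are deep networks. In the code below they are replaced by hypothetical keyword rules so that only the routing is shown. All names and categories are illustrative, and this is not HDLTex's code.

```python
from typing import Callable, Dict, Tuple

# A classifier is anything that maps a text to a label.
Classifier = Callable[[str], str]


def hierarchical_predict(text: str, parent: Classifier,
                         children: Dict[str, Classifier]) -> Tuple[str, str]:
    """Predict the top-level label, then route the text to that label's child model."""
    top = parent(text)
    sub = children[top](text)
    return top, sub


# Hypothetical keyword rules standing in for trained per-level models.
def parent(text: str) -> str:
    return "computer science" if "algorithm" in text else "medicine"


children: Dict[str, Classifier] = {
    "computer science": lambda t: "machine learning" if "neural" in t else "theory",
    "medicine": lambda t: "cardiology" if "heart" in t else "general",
}

print(hierarchical_predict("a neural algorithm for parsing", parent, children))
print(hierarchical_predict("heart rate variability study", parent, children))
```

One consequence of this top-down design is that a mistake at the parent level cannot be corrected by the child model, because the text is routed to the wrong specialist.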

Machine Learning for Text

This textbook covers machine learning topics for text in detail and targets graduate students in computer science, as well as researchers, professors, and industrial practitioners working in these related fields.

An improved K-nearest-neighbor algorithm for text categorization

Large scale multi-label text classification of a hierarchical dataset using Rocchio algorithm

  • B. J. Sowmya, Chetan, K. Srinivasa
  • Computer Science
    2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS)
  • 2016
This work implements and compares two text categorization algorithms, the Rocchio algorithm and kNN, to better understand which approach to take when classifying hierarchical data.
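The two algorithms compared here can be sketched side by side on bag-of-words vectors with cosine similarity. The Rocchio version below is simplified. It builds each class centroid from that class's own length-normalised documents only, omitting the classic term that subtracts negative examples. It also treats the labels as flat rather than hierarchical. The tokenisation, function names, and toy documents are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def bow(text):
    """Bag-of-words term counts (whitespace tokenisation is an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rocchio_fit(docs, labels):
    """One centroid per class: the mean of its length-normalised document vectors."""
    sums = defaultdict(Counter)
    counts = Counter(labels)
    for text, label in zip(docs, labels):
        vec = bow(text)
        norm = math.sqrt(sum(v * v for v in vec.values()))
        for term, v in vec.items():
            sums[label][term] += v / norm
    return {c: {t: v / counts[c] for t, v in s.items()} for c, s in sums.items()}


def rocchio_predict(centroids, text):
    """Assign the class whose centroid is most cosine-similar to the text."""
    vec = bow(text)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


def knn_predict(docs, labels, text, k=3):
    """Majority vote among the k training documents most similar to the text."""
    vec = bow(text)
    ranked = sorted(zip(docs, labels), key=lambda pair: cosine(vec, bow(pair[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


docs = ["stock market falls", "market shares rise", "team wins match", "match ends in draw"]
labels = ["finance", "finance", "sports", "sports"]
centroids = rocchio_fit(docs, labels)
print(rocchio_predict(centroids, "market rally"))
print(knn_predict(docs, labels, "team match", k=3))
```

The trade-off is visible in the code. Rocchio compresses each class into a single centroid, so prediction costs one comparison per class. kNN keeps every training document and compares against all of them, but it can represent classes that do not cluster around a single centre.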

Fast and accurate text classification via multiple linear discriminant projections

The paper presents SIMPL, a nearly linear-time classification algorithm that mimics the strengths of SVMs while avoiding their training bottleneck. SIMPL not only approaches and sometimes exceeds SVM accuracy but also beats the running time of a popular SVM implementation by orders of magnitude.

Imbalanced text classification: A term weighting approach

Some Effective Techniques for Naive Bayes Text Classification

This paper proposes two empirical heuristics, per-document text normalization and a feature weighting method. Together they perform very well on standard benchmark collections and compete with state-of-the-art text classifiers based on highly complex learning methods such as SVM.

Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification

The paper presents Weight Adjusted k-Nearest Neighbor (WAKNN) classification, which learns feature weights using a greedy hill-climbing technique. It also introduces two performance optimizations that improve WAKNN's computational performance by a few orders of magnitude without compromising classification quality.
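The core idea of learning per-feature weights for kNN by greedy hill climbing can be illustrated with the code below. It rests on several assumptions that are illustrative only. Similarity is cosine on element-wise weight-scaled vectors. The objective is leave-one-out accuracy. Each step tries a fixed set of multiplicative factors on one feature weight and keeps any change that strictly improves the objective. This is not the paper's algorithm and omits its two performance optimizations; the data are a made-up toy set.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse {feature: value} vectors."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def scale(vec, weights):
    """Multiply each feature value by its weight (missing weights default to 1)."""
    return {t: v * weights.get(t, 1.0) for t, v in vec.items()}


def knn_label(train, query, weights, k):
    """k-NN vote using cosine similarity on weight-scaled vectors."""
    q = scale(query, weights)
    ranked = sorted(train, key=lambda ex: cosine(q, scale(ex[0], weights)), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def loo_accuracy(data, weights, k):
    """Leave-one-out accuracy of k-NN on the labelled data under the given weights."""
    hits = 0
    for i, (vec, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_label(rest, vec, weights, k) == label
    return hits / len(data)


def hill_climb_weights(data, k=1, factors=(0.0, 0.5, 2.0), max_rounds=5):
    """Greedy hill climbing: rescale one feature weight at a time and keep any
    change that strictly improves leave-one-out accuracy."""
    features = sorted({t for vec, _ in data for t in vec})
    weights = {t: 1.0 for t in features}
    best = loo_accuracy(data, weights, k)
    for _ in range(max_rounds):
        improved = False
        for t in features:
            for factor in factors:
                trial = dict(weights)
                trial[t] = weights[t] * factor
                acc = loo_accuracy(data, trial, k)
                if acc > best:
                    best, weights, improved = acc, trial, True
        if not improved:
            break  # local optimum: no single-weight change helps
    return weights, best


# Toy data: "alpha"/"beta" identify the class; "x"/"y" are heavy noise terms.
data = [
    ({"alpha": 1, "x": 3}, "A"),
    ({"alpha": 1, "y": 3}, "A"),
    ({"beta": 1, "x": 3}, "B"),
    ({"beta": 1, "y": 3}, "B"),
]
weights, acc = hill_climb_weights(data)
print(weights, acc)
```

With uniform weights the noise terms dominate the similarity, and leave-one-out accuracy is 0. The search learns to zero out "x" and "y", which makes every held-out document find a neighbour of its own class.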