Multi-Class Document Classification: Effective and Systematized Method to Categorize Documents

  • Kaushika Pal, Biraj V. Patel
  • Published 14 February 2020
  • Computer Science
  • International Journal of Scientific Research in Science, Engineering and Technology
A large section of the World Wide Web is made up of documents and content: data, big data, formatted and unformatted data, and unstructured and unorganized data. We need an information infrastructure that is useful and easily accessible as and when required. This research work combines Natural Language Processing and Machine Learning for content-based classification of documents. Natural Language Processing is used to divide the problem of understanding an entire document at once into…
3 Citations


Automatic Multiclass Document Classification of Hindi Poems using Machine Learning Techniques
  • Kaushika Pal, B. Patel
  • Computer Science
    2020 International Conference for Emerging Technology (INCET)
  • 2020
Experiments show that Naïve Bayes (64% accuracy) and Random Forest (56% accuracy) perform better than the other algorithms tested for Hindi poem classification.
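Naïve Bayes is the best performer in the result above. As a minimal sketch, assuming a multinomial Naïve Bayes over whitespace-tokenized bag-of-words counts with add-one (Laplace) smoothing (the paper's exact features and settings are not given here), training and prediction can look like this. The class name `MultinomialNaiveBayes` and the toy English documents and labels are hypothetical placeholders, not the Hindi poem data.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Hypothetical bag-of-words multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> token counts
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_posterior(self, tokens, c):
        # log P(c) + sum of log P(token | c) over the document's tokens
        score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, doc):
        tokens = doc.lower().split()
        return max(self.classes, key=lambda c: self._log_posterior(tokens, c))


# Hypothetical toy data standing in for labelled poems.
train_docs = [
    "love moon beauty spring",
    "beauty love flowers",
    "war courage battle hero",
    "hero battle sword",
]
train_labels = ["romance", "romance", "valour", "valour"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict("moon and flowers of love"))  # romance
print(model.predict("the hero rode to battle"))   # valour
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.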
Web Page Multiclass Classification
  • SMU Data Science Review
A variation of the text preprocessing pipeline is proposed in which noun phrase extraction is performed first, followed by lemmatization, contraction expansion, removal of special characters, removal of extra white space, lowercasing, and removal of stop words.
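The cleanup stages that follow lemmatization in this pipeline can be sketched with the standard library alone. This sketch assumes noun phrase extraction and lemmatization have already run (they need a part-of-speech tagger and a lemmatizer, for example from spaCy or NLTK, so they are omitted), and the tiny `CONTRACTIONS` map and `STOP_WORDS` set are hypothetical stand-ins for full resources.

```python
import re

# Hypothetical, deliberately tiny resources; real pipelines use fuller lists.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "don't": "do not"}
STOP_WORDS = {"a", "an", "the", "is", "it", "of", "and", "to", "in"}

_CONTRACTION_RE = re.compile("|".join(re.escape(c) for c in CONTRACTIONS), re.IGNORECASE)


def expand_contractions(text):
    # Match case-insensitively, then look up the lowercase form in the map.
    return _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


def clean(text):
    """Apply the post-lemmatization steps in the order listed above."""
    text = expand_contractions(text)                  # 1. contraction expansion
    text = re.sub(r"[^\w\s]", " ", text)              # 2. remove special characters
    text = re.sub(r"\s+", " ", text).strip()          # 3. remove extra white space
    text = text.lower()                               # 4. lowercasing
    return [w for w in text.split() if w not in STOP_WORDS]  # 5. stop-word removal


print(clean("It's   the BEST web-page; don't miss it!"))
# ['best', 'web', 'page', 'do', 'not', 'miss']
```

Contractions are expanded before punctuation is stripped because removing the apostrophe first would split "don't" into "don" and "t".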
A Survey: Accretion in Linguistic Classification of Indian Languages
  • Dip Patel
  • Linguistics
    Data Engineering for Smart Systems
  • 2021


An Efficient Hindi Text Classification Model Using SVM
A Hindi text classification model is proposed that accepts a set of known Hindi documents, preprocesses them at the document, sentence and word levels, extracts features, and trains an SVM classifier, which then classifies a set of unknown Hindi documents.
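The model above ends with training an SVM and classifying unseen documents. As a hedged sketch, assuming bag-of-words count features and a one-vs-rest linear SVM (no bias term) trained with Pegasos-style subgradient steps on the hinge loss, which is a stand-in since the summary does not name the authors' solver or kernel, the training and prediction steps can be written in plain Python. The helpers `features`, `train_binary_svm`, `train_one_vs_rest` and `predict` are hypothetical, the Hindi-specific preprocessing is not shown, and the toy English documents stand in for Hindi text.

```python
from collections import Counter


def features(doc):
    """Bag-of-words term counts with simple whitespace tokenization."""
    return Counter(doc.lower().split())


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_binary_svm(xs, ys, lam=0.01, epochs=20):
    """Linear SVM trained by Pegasos-style subgradient descent on the hinge loss."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)  # evaluated at the current weights
            # L2-regularization step: shrink every weight by (1 - 1/t)
            w = {k: (1 - eta * lam) * v for k, v in w.items()}
            if margin < 1:  # hinge loss active: step towards the example
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def train_one_vs_rest(docs, labels, **kw):
    xs = [features(d) for d in docs]
    return {c: train_binary_svm(xs, [1 if l == c else -1 for l in labels], **kw)
            for c in sorted(set(labels))}


def predict(models, doc):
    x = features(doc)
    return max(models, key=lambda c: dot(models[c], x))


# Hypothetical toy corpus standing in for preprocessed Hindi documents.
train_docs = [
    "cricket wicket bowler", "election vote minister", "atom electron physics",
    "football goal striker", "parliament vote bill", "cell biology gene",
]
train_labels = ["sports", "politics", "science", "sports", "politics", "science"]

models = train_one_vs_rest(train_docs, train_labels)
print(predict(models, "the bowler took a wicket"))  # sports
```

One-vs-rest trains one binary SVM per category and assigns a document to the category whose SVM gives the highest score.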
An efficient technique for hybrid classification and feature extraction using normalization
In the proposed approach, PCA is improved by applying normalization based on the size of the features, which removes redundant features to a large extent and improves accuracy compared with existing approaches.
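The summary does not say exactly how the features are normalized. As a sketch under that uncertainty, assuming the common choice of z-score standardization before PCA (not necessarily the authors' scheme), the effect can be shown in plain Python: each column is standardized, the covariance matrix is formed, and the leading principal component is found by power iteration. The function names and the two-column toy data are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each column so that features on larger scales do not dominate PCA."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows):
    # Assumes the rows are already centred (true after standardize).
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def top_component(cov, iters=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(rows, v):
    """Reduce each row to its coordinate along the component v."""
    return [sum(a * b for a, b in zip(r, v)) for r in rows]


# Hypothetical features: the second column is on a ~100x larger scale.
rows = [[1, 100], [2, 210], [3, 290], [4, 410]]
z = standardize(rows)
v = top_component(covariance(z))
scores = project(z, v)
print([round(x, 4) for x in v])  # [0.7071, 0.7071]
```

Without standardization, the leading component of this data would point almost entirely along the second column purely because of its larger scale; after standardization both correlated features contribute equally and collapse into one component.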
Survey Paper on Document Classification and Classifiers
With the increasing availability of digital documents from diverse sources, text classification is steadily gaining popularity; it combines Natural Language Processing (NLP), data mining and machine learning techniques.
Hindi Text Document Classification System Using SVM and Fuzzy: A Survey
A new classification system for printed and handwritten Hindi documents, based on a support vector machine and fuzzy logic, first preprocesses imaged text documents and then classifies them into predefined categories.
Machine Learning Algorithms for Opinion Mining and Sentiment Classification
Opinion Mining or Sentiment Analysis is a Natural Language Processing and Information Extraction task that identifies the views or opinions of users, expressed as positive, negative or neutral comments and quotes, that underlie the text.
Punjabi Poetry Classification: The Test of 10 Machine Learning Algorithms
Results for Punjabi poetry classification revealed that four machine learning algorithms, namely Hyperpipes (HP), K-nearest neighbour (KNN), Naive Bayes (NB) and Support Vector Machine (SVM), with accuracies of 50.63%, 52.75% and 58.79% respectively, outperformed all other machine learning algorithms under test.
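K-nearest neighbour is one of the four algorithms named above. A minimal sketch, assuming cosine similarity over raw term counts and k = 3 (the study's distance measure and value of k are not stated here), classifies a document by majority vote among its most similar training documents. The function names and the toy English training set are hypothetical placeholders for the Punjabi poems.

```python
import math
from collections import Counter


def vectorize(doc):
    return Counter(doc.lower().split())


def cosine(a, b):
    num = sum(a[k] * b[k] for k in a if k in b)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def knn_predict(train, doc, k=3):
    """Majority label among the k training documents most cosine-similar to doc."""
    x = vectorize(doc)
    neighbours = sorted(train, key=lambda item: cosine(vectorize(item[0]), x), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical (text, label) pairs standing in for labelled poems.
train = [
    ("moon night love longing", "romance"),
    ("love beloved separation", "romance"),
    ("beloved moon", "romance"),
    ("sword battle hero", "valour"),
    ("battle courage", "valour"),
    ("hero courage victory", "valour"),
]

print(knn_predict(train, "the moon and my beloved"))  # romance
print(knn_predict(train, "a hero of great courage"))  # valour
```

KNN has no training phase; all the work happens at prediction time, which is why it is often used as a simple baseline.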
Research paper classification systems based on TF-IDF and LDA schemes
A research paper classification system is proposed that clusters research papers into meaningful classes in which papers are very likely to share similar subjects.
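TF-IDF weights a term by how often it occurs in a document and how rare it is across the collection. One common variant, assumed here, is tf(t, d) × log(N / df(t)), with tf normalized by document length; the paper's exact weighting and its LDA step are not reproduced. The function `tf_idf` and the three toy abstracts are hypothetical. The resulting vectors could then be fed to a clustering algorithm such as k-means.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using (count / length) * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


# Hypothetical toy abstracts.
docs = [
    "neural network training network",
    "neural network pruning",
    "protein folding simulation",
]
vecs = tf_idf(docs)
print([max(v, key=v.get) for v in vecs[:2]])  # ['training', 'pruning']
```

Terms shared by many papers ("neural", "network") receive low weights, so each paper's vector is dominated by the terms that distinguish it from the rest of the collection.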
Children story classification based on structure of the story
  • D. M. Harikrishna, K. S. Rao
  • Computer Science
    2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
  • 2015
The main part of a story yields higher classification accuracy than its introduction and climax parts, and a framework for story classification using keyword and part-of-speech (POS) based features is proposed.
A Study of Current State of Work done for Classification in Indian Languages
This paper studies current work in various Indian languages and analyzes the present situation and future scope for research on classification and related work in Indian languages.
Model for Classification of Poems in Hindi Language Based on Ras
The developed model classifies poems into the Shringar, Hasya, Adbhuta, Shanta, Raudra, Veera, Karuna, Bhayanaka and Vibhasta rasas, using a mix of part-of-speech-based features and emotional