• Corpus ID: 212471491

A Study of Current State of Work done for Classification in Indian Languages

  • Kaushika Pal, Biraj V. Patel
  • Published 31 October 2017
  • Linguistics, Computer Science
  • International Journal of Scientific Research in Science and Technology
Classification has become an important area of study for storing, organizing, and retrieving relevant documents. Much work has already been done for the English language. Researchers have now started focusing on Indian-language document classification, as a lot of content is available on the web in Indian languages. The purpose of this paper is to study the current work done in various Indian languages and to analyze the current situation and future scope for research in classification and related work on Indian… 
Multi - Class Document Classification: Effective and Systematized Method to Categorize Documents
This research work combines Natural Language Processing and Machine Learning approaches for content-based classification of documents, classifying documents with more than 70% accuracy for major Indian languages and more than 80% accuracy for English.
Automatic Multiclass Document Classification of Hindi Poems using Machine Learning Techniques
  • Kaushika Pal, B. Patel
  • Computer Science
    2020 International Conference for Emerging Technology (INCET)
  • 2020
Experiments show that Naïve Bayes (64% accuracy) and Random Forest (56% accuracy) perform better than the other algorithms tested for Hindi poem classification.
Emotion Classification with Reduced Feature Set SGDClassifier, Random Forest and Performance Tuning
This research work classifies emotions expressed in Hindi poems into 4 categories, namely Karuna, Shanta, Shringar and Veera. The model is built with Random Forest and SGDClassifier, and was trained with 134 poems and tested with 46 poems for both types of features.
Computing Science, Communication and Security: First International Conference, COMS2 2020, Gujarat, India, March 26–27, 2020, Revised Selected Papers
This paper covers the approaches which have shown valuable results for contrast objects captured from the plane, like cars, ships, and many others, instead of the polar bears that look blurry on the ice, to build a tool that increases the semi-automatic bear detection rate a dozen times.


A Study of Text Classification Natural Language Processing Algorithms for Indian Languages
This study shows that supervised learning algorithms (Naive Bayes (NB), Support Vector Machine (SVM), Artificial Neural Network (ANN), and N-gram) performed better for the text classification task.
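As an illustration of the kind of supervised text classifier named in the summary above, here is a minimal sketch of multinomial Naive Bayes with add-one (Laplace) smoothing in plain Python. It is not taken from any of the listed papers: the function names `train_nb` and `predict_nb`, the whitespace-tokenized toy documents, and the class labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Collect per-class document and word counts.

    docs: list of (tokens, label) pairs, where tokens is a list of strings.
    """
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs non-zero.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus; real experiments would use tokenized documents
# in the target language.
toy_docs = [
    (["cricket", "match", "team"], "sports"),
    (["team", "score", "match"], "sports"),
    (["election", "vote", "party"], "politics"),
    (["party", "minister", "vote"], "politics"),
]
model = train_nb(toy_docs)
```

Calling `predict_nb(model, ["match", "score"])` on this toy corpus selects the `sports` class. Language-specific preprocessing (tokenization, stop-word removal, stemming) would happen before the tokens reach these functions.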
A Survey on Text Categorization Techniques for Indian Regional Languages
A survey of text categorization techniques for Indian regional languages is presented. Keywords: text categorization, clustering, Naïve Bayes, K-Nearest Neighbor, Support Vector Machine, hybrid approach.
This paper attempts to show the need for text mining for Indian languages, using techniques from information retrieval, information extraction, and natural language processing (NLP), and connecting them with the algorithms and methods of KDD, data mining, machine learning, and statistics.
Domain Based Classification of Punjabi Text Documents using Ontology and Hybrid Based Approach
The experimental results indicate that Ontology Based Classification and the Hybrid Approach provide better results than the standard classification algorithms, Centroid Based Classification (71%) and Naive Bayes Classification (64%).
Survey Paper on Document Classification and Classifiers
With the increasing availability of digital documents from diverse sources, text classification is steadily gaining popularity, and it is carried out by combining NLP (Natural Language Processing), data mining, and machine learning techniques.
Performance analysis of flexible zone based features to classify Hindi numerals
The performance of fixed-boundary and flexible-boundary zone features is evaluated for recognition of the digits using SVM and MLP based classifiers, and SVM is reported to perform better.
Multiclass classification and class based sentiment analysis for Hindi language
A model for classification of Hindi speech documents into multiple classes with the help of ontology is proposed and sentiment analysis is carried out using HindiSentiWordNet (HSWN) to determine the polarity of individual class.
A comparison study of various sentiment classification techniques is given, and the two main categories of sentiment classification techniques, machine based and lexicon based, are discussed.
Algorithm for Punjabi Text Classification
Preprocessing techniques, feature selection methods for Punjabi, and a classification algorithm to classify Punjabi text documents are introduced.
Identification of relations from IndoWordNet for Indian languages using Support Vector Machine
A Support Vector Machine (SVM) based approach is presented for learning, classifying, and automatically predicting relationships between Hindi synsets, and the system performance has been validated using the performance measures Precision, Recall, and F-score.
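
The performance measures named above can be computed as in the following minimal sketch, which treats one class as positive and counts true positives, false positives, and false negatives. The function name `precision_recall_f1` and the relation labels in the example are hypothetical and are not drawn from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical gold and predicted relation labels for four synset pairs.
gold = ["hypernym", "hypernym", "meronym", "meronym"]
pred = ["hypernym", "meronym", "meronym", "meronym"]
p, r, f = precision_recall_f1(gold, pred, positive="hypernym")
```

Here one of the two gold `hypernym` pairs is recovered and no other pair is wrongly labelled `hypernym`, so precision is 1.0 and recall is 0.5. For a multiclass task the same function can be called once per class and the results averaged.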