Designing Punjabi poetry classifiers using machine learning and different textual features

Authors: Jasleen Kaur and Jatinderkumar R. Saini
Journal: Int. Arab J. Inf. Technol.
Analysis of poetic text is very challenging from a computational linguistics perspective. Computational analysis of literary arts, especially poetry, is a very difficult classification task. For a library recommendation system, poems can be classified on various metrics such as poet, time period, sentiment and subject matter. In this work, a content-based Punjabi poetry classifier was developed using the Weka toolset. Four different categories were manually populated with 2034 poems Nature and…
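As a rough illustration of what content-based poem classification involves, the sketch below trains a tiny multinomial Naive Bayes model on bag-of-words counts. It is an assumption-laden stand-in, not the authors' pipeline: the paper used the Weka toolset, whereas this is plain Python, and the tokens, the `nature` and `other` labels, and the function names `train_naive_bayes` and `classify` are hypothetical placeholders chosen only for the example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """Collect per-class document and word counts from (tokens, label) pairs."""
    class_counts = Counter()            # number of poems per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()                       # all words seen in training
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(model, tokens):
    """Return the class with the highest log-posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: share of training poems that belong to this class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothed word likelihood.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy corpus; English placeholder tokens stand in for poem terms.
TRAIN = [
    (["river", "tree", "rain"], "nature"),
    (["mountain", "river", "wind"], "nature"),
    (["city", "market", "road"], "other"),
    (["road", "train", "city"], "other"),
]
MODEL = train_naive_bayes(TRAIN)

if __name__ == "__main__":
    print(classify(MODEL, ["rain", "river"]))
    print(classify(MODEL, ["city", "train"]))
```

A real system would add the preprocessing the related work describes, such as stop word removal and stemming, before counting terms.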
Hindi Poetry Classification using Eager Supervised Machine Learning Algorithms
Two eager machine learning algorithms are applied to a corpus of 450 Hindi poems, and each poem is classified based on the terms present in it, with misclassification error used as the measure.
Stanza Type Identification using Systematization of Versification System of Hindi Poetry
The paper covers various challenges and the best possible solutions for those challenges, describing the methodology to generate automatic metadata for “Chhand” based on the poems’ stanzas, and provides some advanced information and techniques for metadata generation for “Muktak Chhands”.
Towards Natural Language Processing with Figures of Speech in Hindi Poetry
This work is the first of its kind in Hindi Natural Language Processing (NLP) to address Hindi figures of speech; it creates a systematic hierarchical structure of Hindi “Alankaar” types and sub-types and attempts to identify a few of them.
On Exhaustive Evaluation of Eager Machine Learning Algorithms for Classification of Hindi Verses
Text classification algorithms, along with Natural Language Processing (NLP), facilitate a fast, cost-effective, and scalable solution for classification and prediction of verses in a Hindi corpus.
Sensed-Lexicon based Approach for Identification of Similarity among Punjabi Documents
Results revealed that, on the basis of majority voting, the combination of stop word removal with stemming and ‘noun’-based synonym replacement performs best with bi-gram tokens.
Analysing the Poetic Structure of Jana-Gaṇa-Mana in Entirety: A Statistical Approach
Statistical analysis of literary text, with the aim of gaining insight into its stylistic features, has been a shared area of interest among enthusiasts of literature and statistics.
Measuring the Similarity between the Sanskrit Documents using the Context of the Corpus
The proposed approach processes Sanskrit, one of the oldest, largely untouched, and morphologically critical languages, and builds a Document Term Matrix for Sanskrit (DTMS) and a Document Synset Matrix for Sanskrit (DSMS) to solve the problem of polysemy.
Marathi Document: Similarity Measurement using Semantics-based Dimension Reduction Technique
The proposed approach builds a Document Term Matrix for a Marathi corpus (DTMM), converting unstructured data into a tabular format, then forms synsets and reduces dimensions to produce a Document Synset Matrix for the Marathi corpus.
Hindi Verse Class Predictor using Concept Learning Algorithms
In this paper, 565 Hindi poems are classified into four topics using lazy machine learning algorithms, namely K-nearest neighbours and regression; K-nearest neighbours performs better than linear regression.


Punjabi Poetry Classification: The Test of 10 Machine Learning Algorithms
Results for Punjabi poetry classification revealed that 4 machine learning algorithms, namely Hyperpipes (HP), K-nearest neighbour (KNN), Naive Bayes (NB) and Support Vector Machine (SVM), with an accuracy of 50.63%, 52.75% and 58.79% respectively, outperformed all other machine learning algorithms under test.
Automatic Punjabi poetry classification using machine learning algorithms with reduced feature set
Classification of poems is very challenging from a computational linguistics point of view; Naive Bayes outperformed all other classifiers using the top-ranked 60% of features, and Hyperpipes was the least efficient classifier.
Automated Analysis of Bangla Poetry for Classification and Poet Identification
This work makes use of semantic (word) features to perform subject-based classification of Bangla poems, and various stylistic as well as semantic features for poet identification, and uses a Multiclass SVM classifier to classify Tagore’s collection of poetry into four categories.
Poetry Classification Using Support Vector Machines
The results show the potential of the SVM technique for classifying poems into various categories, whereas previous approaches focused only on classifying prose.
Automatic Categorization of Ottoman Literary Texts by Poet and Time Period
Millions of manuscripts and printed texts are available in the Ottoman language. The automatic categorization of Ottoman texts would make these documents much more accessible in various applications.
A Natural Language Processing Approach for Identification of Stop Words in Punjabi Language
This paper concentrates on identification of stop words from poetry and other news articles and discusses the importance of each sub-phase in Punjabi poetry.
Emotion Classification in Arabic Poetry using Machine Learning
This work attempts to remedy the situation by considering the problem of classifying documents by their overall sentiment into four affect categories that are present in Arabic poetry: Retha, Ghazal, Fakhr and Heja.
Punjabi Stop Words: A Gurmukhi, Shahmukhi and Roman Scripted Chronicle
For the first time in the scientific community dealing with computational linguistics and literature processing using NLP techniques, a list of 184 stop words of the Punjabi language is released for public usage and further NLP applications.
Automatic Classification of Literature Pieces by Emotion Detection: A Study on Quevedo's Poetry
An experiment on the categorization of poems based on their emotional content, which is automatically measured, to verify whether the information about emotional content can be used to build classifiers reproducing that categorization.
Automatic meter classification in Persian poetries using support vector machines
The proposed meter classification system for Persian poems, based on features extracted from the uttered poem, shows 91% accuracy within the top three meter style choices and is robust against syllable insertion, deletion or classification.