An Application of Zipf's Law for Prose and Verse Corpora Neutrality for Hindi and Marathi Languages
TLDR
Common tokens from verse and prose corpora of Marathi and Hindi are identified to show that both behave the same as far as NLP activities are concerned, and the improvement of BaSa over Zipf’s law is demonstrated.
Stanza Type Identification using Systematization of Versification System of Hindi Poetry
TLDR
The paper covers various challenges and the best possible solutions for them, describes a methodology for automatically generating metadata for “Chhand” from the poems’ stanzas, and provides some advanced information and techniques for metadata generation for “Muktak Chhands”.
Effect of Header-based Features on Accuracy of Classifiers for Spam Email Classification
TLDR
This research seeks the minimum number of features required to classify spam and ham emails and shows that satisfactory filtering requires a minimum of 5 and a maximum of 14 features.
On Exhaustive Evaluation of Eager Machine Learning Algorithms for Classification of Hindi Verses
TLDR
Text classification algorithms, together with Natural Language Processing (NLP), facilitate a fast, cost-effective, and scalable solution for the classification and prediction of verses in a Hindi corpus.
Towards Natural Language Processing with Figures of Speech in Hindi Poetry
TLDR
This work, the first of its kind in Hindi Natural Language Processing (NLP), addresses Hindi figures of speech: it creates a systematic hierarchical structure of Hindi “Alankaar” types and sub-types and extends the work by attempting to identify a few of them.
On State-of-the-art of POS Tagger, ‘Sandhi’ Splitter, ‘Alankaar’ Finder and ‘Samaas’ Finder for Indo-Aryan and Dravidian Languages
TLDR
Analysis shows that the Rule Based Approach (RBA) and the Hidden Markov Model (HMM) are frequently used for POS tagging, RBA is most frequently used for the “Sandhi” Splitter, general Human Intelligence (HI) is used for the “Alankaar” Finder, and no “Samaas” Finder technique is available for any Indian language.
Marathi Document: Similarity Measurement using Semantics-based Dimension Reduction Technique
TLDR
The proposed approach designs the Document Term Matrix for Marathi (DTMM) corpus, converting unstructured data into a tabular format, then forms synsets and in turn reduces dimensions to formulate a Document Synset Matrix for the Marathi corpus.