An Application of Zipf's Law for Prose and Verse Corpora Neutrality for Hindi and Marathi Languages
@article{Bafna2020AnAO,
  title={An Application of Zipf's Law for Prose and Verse Corpora Neutrality for Hindi and Marathi Languages},
  author={Prafulla Bharat Bafna and Jatinderkumar R.},
  journal={International Journal of Advanced Computer Science and Applications},
  year={2020},
  volume={11}
}
Text is now available in many languages, as almost all websites offer a multilingual option. Hindi is considered an official language in one of the states of India. Hindi text analysis is dominated by corpora of stories and poems. Token extraction is an important step before any text analysis and supports many applications, such as text summarization and text categorization. Token extraction is a part of Natural Language Processing (NLP). NLP…
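As a rough illustration of the two ideas named in the title and abstract, token extraction and Zipf's law, the following is a minimal sketch and not the authors' method. The function names, the punctuation set, and the toy sentence are assumptions made for the example. Tokens are split on whitespace and a few punctuation marks (including the Devanagari danda) rather than with `\w`, because Python's `\w` does not match Devanagari vowel signs. Zipf's law predicts that frequency falls roughly as a power of rank, so the slope of log-frequency against log-rank is expected to be near -1 on a large corpus.

```python
import math
import re
from collections import Counter

# Assumed separator set: whitespace, Devanagari danda/double danda, common punctuation.
SEPARATORS = re.compile(r"[\s।॥,.;:!?()-]+")

def tokenize(text):
    """Split text into tokens on whitespace and punctuation (illustrative only)."""
    return [tok for tok in SEPARATORS.split(text) if tok]

def rank_frequency(tokens):
    """Return (rank, token, frequency) triples, most frequent token first."""
    counts = Counter(tokens)
    return [(rank, tok, freq)
            for rank, (tok, freq) in enumerate(counts.most_common(), start=1)]

def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    freqs must be sorted in descending order and contain at least two values.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two ranked frequencies")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

if __name__ == "__main__":
    # Toy sentence, far too small for a meaningful Zipf fit.
    sample = "राम घर गया। राम आया।"
    table = rank_frequency(tokenize(sample))
    for rank, tok, freq in table:
        print(rank, tok, freq)
    print("slope:", zipf_slope([freq for _, _, freq in table]))
```

On a real prose or verse corpus, the same slope could be computed per corpus and compared, with the caveat that the tokenization rules strongly affect the result.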
7 Citations
Towards Natural Language Processing with Figures of Speech in Hindi Poetry
- Linguistics
- 2021
This work is the first of its kind in Hindi Natural Language Processing (NLP) to address Hindi figures of speech. It creates a systematic hierarchical structure of Hindi “Alankaar” types and sub-types and extends the work to identify a few of them.
Measuring the Similarity between the Sanskrit Documents using the Context of the Corpus
- Computer Science
- 2020
The proposed approach processes Sanskrit, one of the oldest, largely untouched, and morphologically critical languages, and builds a Document Term Matrix for Sanskrit (DTMS) and a Document Synset Matrix for Sanskrit (DSMS) to solve the problem of polysemy.
Stanza Type Identification using Systematization of Versification System of Hindi Poetry
- Computer Science
- 2021
The paper covers various challenges and the best possible solutions for those challenges, describing the methodology to generate automatic metadata for “Chhand” based on the poems’ stanzas, and provides some advanced information and techniques for metadata generation for “Muktak Chhands”.
Marathi Document: Similarity Measurement using Semantics-based Dimension Reduction Technique
- Computer Science
- 2020
The proposed approach designs the Document Term Matrix for Marathi (DTMM) corpus, converting unstructured data into a tabular format, then forms synsets and in turn reduces dimensions to formulate a Document Synset Matrix for the Marathi corpus.
Hindi Poetry Classification using Eager Supervised Machine Learning Algorithms
- Computer Science
- 2020 International Conference on Emerging Smart Computing and Informatics (ESCI)
- 2020
Two eager machine learning algorithms are applied to a corpus of 450 Hindi poems, and each poem is classified based on the terms present in it, using misclassification error.
Gujarati Poetry Classification Based on Emotions Using Deep Learning
- Computer Science
- International Journal of Engineering Applied Sciences and Technology
- 2021
The study presents a novel perspective on sentiment capture from Gujarati poems, using a variety of characteristics present within the poems to reveal the emotions they express.
Toward a least-effort principle for evaluating prices of elements as indicators of sustainability
- Economics
- MRS Energy & Sustainability
- 2021
In this article, we use rank to understand the price of chemical elements. We observe that the role of the volume from global mining production dominates in materials economics. In this article, we…
References
Hindi Multi-document Word Cloud based Summarization through Unsupervised Learning
- Computer Science
- 2019 9th International Conference on Emerging Trends in Engineering and Technology - Signal and Information Processing (ICETET-SIP-19)
- 2019
The objective is to manage the documents and summarize a Hindi corpus through token extraction and document clustering, applying TF-IDF, cosine-based document similarity measures and cluster dendrograms, in addition to various other Natural Language Processing (NLP) activities.
Novel Language Resources for Hindi: An Aesthetics Text Corpus and a Comprehensive Stop Lemma List
- Linguistics
- International Journal of Advanced Computer Science and Applications
- 2020
This research emphasizes the use of stop lemmas instead of stop words, because stop word lists contain various, but not all, morphological forms of a word, whereas a stop lemma list contains only the root form of the word, from which variations could be derived if required.
Marathi Text Analysis using Unsupervised Learning and Word Cloud
- Computer Science
- International Journal of Engineering and Advanced Technology
- 2020
Results demonstrate the robustness of the proposed approach for a Marathi corpus, which applies TF-IDF, cosine-based document similarity measures and cluster dendrograms, in addition to various other Natural Language Processing (NLP) activities.
On Exhaustive Evaluation of Eager Machine Learning Algorithms for Classification of Hindi Verses
- Computer Science
- 2020
Text classification algorithms, along with Natural Language Processing (NLP), facilitate a fast, cost-effective, and scalable solution for classification and prediction of verses in a Hindi corpus.
Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe
- Computer Science
- CoNLL Shared Task
- 2017
We present an update to UDPipe 1.0 (Straka et al., 2016), a trainable pipeline which performs sentence segmentation, tokenization, POS tagging, lemmatization and dependency parsing. We provide…
Kāvi: An Annotated Corpus of Punjabi Poetry with Emotion Detection Based on ‘Navrasa’
- Computer Science
- 2020
Predicting Sensitivity of Local News Articles from Odia Dailies
- Computer Science
- 2019
Local news is categorized as positive, negative or neutral, and the sensitivity of negative local news articles is predicted in order to set the priority of action to be taken by the local administration.
Document clustering: TF-IDF approach
- Computer Science
- 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT)
- 2016
The Term Frequency-Inverse Document Frequency algorithm is used along with fuzzy K-means and hierarchical algorithms; for different clusters of the related documents, the resulting silhouette coefficient, entropy and F-measure trends are presented to show algorithm behavior for each data set.
On Readability Metrics of Goal Statements of Universities and Brand-Promoting Lexicons for Industries
- Computer Science
- 2020
The correlation between the lexicons found and the revenues generated by the companies considered is advocated. Pearson's correlation coefficient and the Flesch Readability Index are used to calculate various metrics that form the basis of the conclusions.