Anuj@DPIL-FIRE2016: A Novel Paraphrase Detection Method in Hindi Language using Machine Learning

@inproceedings{Saini2016AnujDPILFIRE2016AN,
  title={Anuj@DPIL-FIRE2016: A Novel Paraphrase Detection Method in Hindi Language using Machine Learning},
  author={Anuj Saini},
  booktitle={FIRE},
  year={2016}
}
  • Anuj Saini
  • Published in FIRE 7 December 2016
  • Computer Science
Every language admits several plausible interpretations. With the evolution of the web, smart devices and social media, identifying these syntactic or semantic ambiguities has become a challenging task. In Natural Language Processing, two statements written using different words but having the same meaning are said to be paraphrases. At FIRE 2016, we worked on the problem of detecting paraphrases for the Shared Task DPIL (Detecting Paraphrases in Indian Languages) in the Hindi language… 

Sentence Paraphrase Detection Using Classification Models

TLDR
A supervised learning strategy for paraphrase detection is described, in which sentence pairs are classified to decide the paraphrase relationship using only n-gram lexical features as the classification features.
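To make the n-gram idea concrete, here is a minimal sketch of lexical overlap features for a sentence pair. It assumes whitespace tokenization and uses Jaccard overlap of word n-grams for n = 1 to 3; the function names and the choice of Jaccard overlap are illustrative assumptions, not the cited paper's exact feature set. The resulting vector would then be passed to a standard classifier.

```python
def ngrams(tokens, n):
    """Return the set of contiguous word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ngram_features(s1, s2, max_n=3):
    """Hypothetical feature vector: Jaccard overlap of word n-grams, n = 1..max_n.

    Whitespace tokenization is an assumption; a real Hindi pipeline may need
    Unicode normalization and a proper tokenizer.
    """
    t1, t2 = s1.split(), s2.split()
    return [jaccard(ngrams(t1, n), ngrams(t2, n)) for n in range(1, max_n + 1)]
```

For example, `ngram_features("the cat sat on the mat", "the cat sat on a mat")` yields roughly `[0.833, 0.429, 0.333]`: overlap drops as n grows, which is the kind of signal a classifier can use to separate close paraphrases from loosely related sentences.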

Detecting Paraphrases in Marathi Language

  • S. Srivastava, S. Govilkar
  • Linguistics, Computer Science
    BOHR International Journal of Smart Computing and Information Technology
  • 2020
TLDR
The total paraphrase score is calculated by combining statistical and semantic similarity scores, and it determines whether a pair of Marathi sentences is a paraphrase or a non-paraphrase.
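The combination step can be sketched as a weighted sum followed by a threshold. The linear weighting, the default weight of 0.5 and the 0.5 decision threshold are all hypothetical choices for illustration; the cited paper's actual combination rule and cut-off are not given here.

```python
def total_paraphrase_score(statistical, semantic, w_stat=0.5):
    """Hypothetical linear combination of two similarity scores in [0, 1]."""
    if not 0.0 <= w_stat <= 1.0:
        raise ValueError("w_stat must lie in [0, 1]")
    return w_stat * statistical + (1.0 - w_stat) * semantic


def judge(statistical, semantic, threshold=0.5, w_stat=0.5):
    """Label a sentence pair by thresholding the combined score (threshold is assumed)."""
    score = total_paraphrase_score(statistical, semantic, w_stat)
    return "paraphrase" if score >= threshold else "non-paraphrase"
```

In practice the weight and threshold would be tuned on labelled training pairs rather than fixed by hand.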

Creating Paraphrase Identification Corpus for Indian Languages

TLDR
This chapter explains the creation of a paraphrase corpus for Hindi, Tamil, Malayalam, and Punjabi, which is the first publicly available paraphrase corpus for any Indian language.

Paraphrase Identification of Marathi Sentences

  • S. Srivastava, S. Govilkar
  • Computer Science
    International Conference on Intelligent Data Communication Technologies and Internet of Things (ICICI) 2018
  • 2018
TLDR
Paraphrasing contributes to various NLP tasks such as plagiarism detection, text summarization, question answering, information retrieval, text simplification, and paraphrase detection on SMS.

Sentence similarity detection in Malayalam language using cosine similarity

  • P. Gokul, B. Akhil, Kumar K. M Shiva
  • Computer Science
    2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT)
  • 2017
TLDR
This paper used test data of 900 and 1,400 sentence pairs from the FIRE 2016 Malayalam corpus in two iterations and obtained accuracies of 0.8 and 0.59, respectively.
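Cosine similarity between two sentences can be sketched over bag-of-words term-frequency vectors. Whitespace tokenization and raw counts are assumptions here; the cited paper's preprocessing (stemming, stop-word removal, weighting) may differ, and a threshold on the score would be needed to turn it into a similar/not-similar decision.

```python
import math
from collections import Counter


def cosine_similarity(s1, s2):
    """Cosine of the angle between term-frequency vectors of two sentences."""
    v1, v2 = Counter(s1.split()), Counter(s2.split())
    # Only terms present in both sentences contribute to the dot product.
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty sentence is treated as dissimilar to everything
    return dot / (norm1 * norm2)
```

For example, `cosine_similarity("a b c", "a b d")` is 2/3: two shared terms over two vectors of length sqrt(3).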

Academic Plagiarism Detection: A Systematic Literature Review

TLDR
The integration of heterogeneous analysis methods for textual and non-textual content features using machine learning is seen as the most promising area for future research contributions to improve the detection of academic plagiarism further.

References

Shared Task on Detecting Paraphrases in Indian Languages (DPIL): An Overview

TLDR
This paper gives an overview of the shared task on “Detecting Paraphrases in Indian Languages” (DPIL) conducted at FIRE 2016, which released the first open-source paraphrase detection corpora for Indian languages.

Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

TLDR
This work introduces a paraphrase detection method based on unsupervised recursive autoencoders (RAEs) trained with a novel unfolding objective; the RAEs learn feature vectors for phrases in syntactic trees, which are used to measure word- and phrase-wise similarity between two sentences.
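As we understand the dynamic pooling step, the Euclidean distances between every node vector (words and phrases) of one sentence and every node vector of the other form a matrix whose size depends on sentence length; it is reduced to a fixed n_p x n_p grid by taking the minimum distance in each region, so a classifier always sees the same input size. The sketch below is a simplified variant: its window-splitting rule for uneven or short dimensions is our own assumption and may differ from the paper's.

```python
import math


def pairwise_distances(nodes1, nodes2):
    """Euclidean distance between every node vector of sentence 1 and of sentence 2."""
    return [[math.dist(a, b) for b in nodes2] for a in nodes1]


def chunk_bounds(length, n_p):
    """Split range(length) into n_p contiguous, non-empty windows.

    When length < n_p some windows repeat a row, a simplified stand-in
    for the duplication step used for short sentences.
    """
    bounds = []
    for i in range(n_p):
        start = (i * length) // n_p
        end = max(((i + 1) * length) // n_p, start + 1)
        bounds.append((start, end))
    return bounds


def dynamic_min_pool(matrix, n_p):
    """Pool a variable-size distance matrix to n_p x n_p by window minima."""
    rows = chunk_bounds(len(matrix), n_p)
    cols = chunk_bounds(len(matrix[0]), n_p)
    return [
        [min(matrix[r][c] for r in range(r0, r1) for c in range(c0, c1))
         for (c0, c1) in cols]
        for (r0, r1) in rows
    ]
```

Minimum pooling is used because small distances signal matching words or phrases; averaging would blur a single strong match into its region.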

A Novel Approach to Paraphrase Hindi Sentences using Natural Language Processing

TLDR
This application can help in designing robots that understand different forms of Hindi sentences, in a Hindi tutor that shows students the different forms a sentence can take, and in plagiarism tools that find higher-level plagiarized text to a certain extent.

SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)

TLDR
In this shared task, evaluations of systems for two related tasks on Twitter data, Paraphrase Identification and Semantic Textual Similarity (SS), are presented, and the importance of bringing these two research areas together is emphasized.

AMRITA_CEN@SemEval-2015: Paraphrase Detection for Twitter using Unsupervised Feature Learning with Recursive Autoencoders

TLDR
This paper explores using recursive autoencoders for SemEval 2015 Task 1: Paraphrase and Semantic Similarity in Twitter using phrase-structure parse tree embeddings that are then provided as input to a conventional supervised classification model.
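The core recursive-autoencoder step can be sketched as follows: two child vectors c1 and c2 are composed into a parent p = tanh(W_e [c1; c2] + b_e), and the parent is scored by how well W_d p + b_d reconstructs [c1; c2]. This shows the standard one-level reconstruction, not the unfolding variant; the 2-dimensional toy weights are hypothetical placeholders for parameters a real RAE would learn. Encoding a binary parse tree bottom-up yields 2n - 1 node vectors (n words plus n - 1 phrases) that can be compared across sentences.

```python
import math

# Hypothetical toy parameters for d = 2; a trained RAE would learn these.
W_E = [[0.5, -0.2, 0.1, 0.3],
       [0.0, 0.4, -0.3, 0.2]]                               # encoder, 2 x 4
B_E = [0.0, 0.1]
W_D = [[0.6, 0.1], [-0.2, 0.5], [0.3, -0.4], [0.1, 0.2]]    # decoder, 4 x 2
B_D = [0.0, 0.0, 0.0, 0.0]


def matvec(m, v, b):
    """Compute m @ v + b for plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(m, b)]


def encode(c1, c2):
    """Compose two children into a parent: p = tanh(W_e [c1; c2] + b_e)."""
    return [math.tanh(z) for z in matvec(W_E, c1 + c2, B_E)]


def reconstruction_error(c1, c2, p):
    """Squared error between [c1; c2] and its decoding W_d p + b_d."""
    rec = matvec(W_D, p, B_D)
    return sum((x - y) ** 2 for x, y in zip(c1 + c2, rec))


def encode_tree(tree, embeddings, nodes):
    """Encode a binary tree bottom-up, appending every node vector to `nodes`.

    Leaves are words looked up in `embeddings`; internal nodes are (left, right) tuples.
    """
    if isinstance(tree, str):
        vec = embeddings[tree]
    else:
        vec = encode(encode_tree(tree[0], embeddings, nodes),
                     encode_tree(tree[1], embeddings, nodes))
    nodes.append(vec)
    return vec
```

Training would adjust W_E, B_E, W_D and B_D to minimize the summed reconstruction error over all internal nodes of many parsed sentences.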

A Graph Based Automatic Plagiarism Detection Technique to Handle Artificial Word Reordering and Paraphrasing

TLDR
This work identifies the relation between all overlapping word pairs using controlled closeness centrality and semantic similarity, and uses the resulting plagiarized word patterns to identify plagiarized text in the target document.

A comparative study of ensemble learning methods for classification in bioinformatics

  • Aayushi Verma, S. Mehta
  • Computer Science
    2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence
  • 2017
TLDR
It is observed empirically that the proposed ensemble learning approach, the “BBS method”, gives better accuracy with a lower root mean square error.

Reflexive hybrid approach to provide precise answer of user desired frequently asked question

  • Aayushi Verma, Anuja Arora
  • Computer Science
    2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence
  • 2017
TLDR
The proposed approach extracts the lexical, structural and semantic behavior of the user query and combines term frequency-inverse document frequency (TF-IDF) with POS tagging and Word2Vec to retrieve the answer most similar to the entered query.
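Only the TF-IDF retrieval component lends itself to a short, self-contained sketch; the POS-tagging and Word2Vec parts of the hybrid approach are omitted. The sketch assumes lowercase whitespace tokenization, raw term frequency and idf = log(N / df), and the FAQ data in the usage note is invented for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return TF-IDF dicts for each document plus the idf table (idf = log(N / df))."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / c) for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def best_answer(query, faq_questions, faq_answers):
    """Return the answer whose stored question is most TF-IDF-similar to the query."""
    vecs, idf = tfidf_vectors(faq_questions)
    # Query terms unseen in the FAQ get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scores = [cosine(q, v) for v in vecs]
    return faq_answers[max(range(len(scores)), key=scores.__getitem__)]
```

With the hypothetical questions "how do i reset my password" and "how do i delete my account", the query "reset password please" matches the first: "reset" and "password" are rare (high idf), while "how", "do", "i" and "my" occur in both questions and weigh less.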

A Comparative Study of Ensemble Learning Approaches in the Classification of Breast Cancer Metastasis

TLDR
It is inferred that ensemble learning approaches with subnetwork markers might be more suitable for handling the classification problem of breast cancer metastasis, and the use of these approaches in similar classification problems is recommended.

A Comparison of Decision Tree Ensemble Creation Techniques

TLDR
An algorithm is introduced that decides when a sufficient number of classifiers has been created for an ensemble, and is shown to result in an accurate ensemble for those methods that incorporate bagging into the construction of the ensemble.
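For reference, plain bagging (one of the ensemble-construction techniques such comparisons cover) can be sketched in a few lines: train each base classifier on a bootstrap resample drawn with replacement, then predict by majority vote. This is a generic illustration with hypothetical 1-D decision stumps as base learners; it does not implement the paper's algorithm for deciding when enough classifiers have been created, so the ensemble size is simply a fixed parameter here.

```python
import random
from collections import Counter


def train_stump(xs, ys):
    """Fit a 1-D decision stump on 0/1 labels by exhaustive threshold search."""
    best_err, best = len(xs) + 1, None
    for t in sorted(set(xs)):
        for above in (0, 1):  # label predicted for x > t
            err = sum((above if x > t else 1 - above) != y for x, y in zip(xs, ys))
            if err < best_err:
                best_err, best = err, (t, above)
    t, above = best
    return lambda x: above if x > t else 1 - above


def bagging(xs, ys, n_estimators=25, seed=0):
    """Train stumps on bootstrap resamples and return a majority-vote predictor."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample indices with replacement
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict
```

A stopping criterion like the one the paper proposes would replace the fixed `n_estimators` with a rule that monitors ensemble accuracy as classifiers are added.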