Corpus ID: 2331765

Style Breach Detection with Neural Sentence Embeddings

Kamil Safin and Rita Kuznetsova. Conference and Labs of the Evaluation Forum.

This paper investigates a method for the style breach detection task. The method maps sentences into a high-dimensional vector space, where each sentence vector depends on the previous and next sentence vectors. The main architecture for this mapping is a pre-trained encoder-decoder model. The resulting vectors are then used to construct an author style function and to detect outliers. The method was tested on the PAN-2017 collection for the style breach detection task.
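
The last step of the abstract (a style function over sentence vectors, followed by outlier detection) can be illustrated with a minimal sketch. The abstract does not define the style function or the outlier rule, so everything below is an assumption for illustration: `style_function` scores each sentence by its Euclidean distance from the document centroid, `detect_outliers` flags scores above mean + k * standard deviation, and the toy 2-D vectors stand in for the output of the pre-trained encoder.

```python
import math
from statistics import mean, pstdev


def style_function(vectors):
    """Hypothetical style function: distance of each sentence vector
    from the centroid of all sentence vectors in the document."""
    dim = len(vectors[0])
    centroid = [mean(v[i] for v in vectors) for i in range(dim)]
    return [math.dist(v, centroid) for v in vectors]


def detect_outliers(scores, k=1.5):
    """Hypothetical outlier rule: indices whose score exceeds
    mean + k * (population standard deviation)."""
    mu, sigma = mean(scores), pstdev(scores)
    return [i for i, s in enumerate(scores) if s > mu + k * sigma]


# Toy stand-ins for sentence embeddings (not real encoder output).
vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
scores = style_function(vectors)
outliers = detect_outliers(scores)
print(outliers)  # -> [4]
```

In this toy input the fifth sentence lies far from the others and is the only one flagged; a style breach position could then be placed at the boundary of the flagged sentences.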

Three Style Similarity: sentence-embedding, auxiliary words, punctuation

This paper describes the strategy followed by the UO-UDC team at the Style Change Detection Shared Task at PAN22: an unsupervised approach to Task 1, and two similarity decisions for Task 2 and Task 3 based on semantic, punctuation-mark, and auxiliary-word similarity.

A Model for Style Breach Detection at a Glance: Notebook for PAN at CLEF 2018

This year’s PAN Author Identification sub-task for style change detection deals with a single question: whether or not a document has multiple authors. This document proposes a simple, straightforward, and fast approach.

Rich Style Embedding for Intrinsic Plagiarism Detection

A new style embedding is proposed that combines syntactic trees with the pre-trained Multi-Task Deep Neural Network (MT-DNN) and uses attention mechanisms to sum the embeddings. Both a Bidirectional Long Short-Term Memory (BiLSTM) network and a Convolutional Neural Network with max-pooling are tried for sentence encoding.

Recursive Style Breach Detection with Multifaceted Ensemble Learning

We present a supervised approach for style change detection, which aims at predicting whether there are changes in the style in a given text document, as well as at finding the exact positions where

Notebook for PAN at CLEF 2022

A within-document authorship clustering method based on ensemble learning is proposed to tackle style change detection in multi-authored documents. It is an unsupervised method that does not require training or parameter tuning.

Style Change Detection with Feed-forward Neural Networks

A system consisting of two modules is presented: one distinguishes single-author documents from multi-author documents, and the other determines the exact number of authors in the multi-author documents.

Overview of the Author Identification Task at PAN-2017: Style Breach Detection and Author Clustering

This edition of PAN focuses on style breach detection and author clustering, two unsupervised authorship analysis tasks, and provides both benchmark data and an evaluation framework to compare different approaches.

Overview of the Style Change Detection Task at PAN 2019

The style change detection task, the underlying dataset, a survey of the participants’ approaches, as well as the results are presented in this paper.

Overview of the Author Identification Task at PAN-2018: Cross-domain Authorship Attribution and Style Change Detection

This edition of PAN studies two tasks: the novel task of cross-domain authorship attribution, where the texts of known and unknown authorship belong to different domains, and style change detection, where single-author and multi-author texts are to be distinguished.

Plagiarism Detection in Armenian Texts Using Intrinsic Stylometric Analysis

This paper uses two task setups—style change detection and style breach detection—from PAN’s series of conferences on text forensics and stylometry to evaluate the effectiveness of hierarchical clustering and other relevant models presented at PAN conferences.



Methods for Intrinsic Plagiarism Detection and Author Diarization

A plagiarism detection method is presented that constructs an author style function from features of text sentences and detects outliers. The method is also adapted to the diarization problem by segmenting author style statistics into text parts that correspond to different authors.

Skip-Thought Vectors

We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the

Intrinsic Plagiarism Detection Using Character n-gram Profiles

A new method is presented that attempts to quantify the style variation within a document using character n-gram profiles and a style change function based on an appropriate dissimilarity measure originally proposed for author identification.

Intrinsic Plagiarism Detection using N-gram Classes

A novel language-independent intrinsic plagiarism detection method is introduced, based on a new text representation called n-gram classes; it is comparable to the best state-of-the-art methods.

External and Intrinsic Plagiarism Detection Using Vector Space Models

This work presents a conceptually simple space partitioning approach to achieve search times sublinear in the number of reference documents, trading precision for speed.

An Evaluation Framework for Plagiarism Detection

Empirical evidence is given that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.

Overview of the 6th International Competition on Plagiarism Detection

This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.

A Critique and Improvement of an Evaluation Metric for Text Segmentation

A simple modification to the Pk metric, called WindowDiff, is proposed. It moves a fixed-size window across the text and penalizes the algorithm whenever the number of boundaries within the window does not match the true number of boundaries for that window of text.
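
The windowing idea in this summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: segmentations are given as equal-length lists of 0/1 boundary indicators, the window covers `k` consecutive indicators, and the error count is normalized by the number of window positions. The function name `window_diff` and the example inputs are hypothetical.

```python
def window_diff(reference, hypothesis, k):
    """Fraction of size-k windows in which the hypothesized number of
    boundaries differs from the reference number of boundaries.

    reference, hypothesis: equal-length lists of 0/1, where 1 marks a
    segment boundary at that position (an assumed encoding).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must have the same length")
    if not 0 < k <= len(reference):
        raise ValueError("window size k must be in 1..len(reference)")

    n_windows = len(reference) - k + 1
    errors = 0
    for i in range(n_windows):
        ref_count = sum(reference[i:i + k])
        hyp_count = sum(hypothesis[i:i + k])
        if ref_count != hyp_count:
            errors += 1  # one penalty per mismatching window
    return errors / n_windows


reference = [0, 0, 1, 0, 0, 0, 1, 0]
hypothesis = [0, 0, 0, 1, 0, 0, 1, 0]  # first boundary is off by one
print(window_diff(reference, hypothesis, k=3))  # 2 of 6 windows disagree
```

A near miss, such as the boundary shifted by one position above, is penalized only in the windows that contain one of the two boundary positions but not the other, so it costs less than a boundary that is missing entirely.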

Clustering by Authorship Within and Across Documents

An overview of the shared tasks on author clustering and author diarization at PAN 2016 is presented including evaluation datasets, measures, results, as well as a survey of a total of 10 submissions.

Overview of PAN'17 - Author Identification, Author Profiling, and Author Obfuscation

A high-level overview of each of the three shared tasks organized this year, namely author identification, author profiling, and author obfuscation, gives a brief summary of the evaluation data, performance measures, and results obtained.