Corpus ID: 10664722

Author Masking through Translation

@inproceedings{Keswani2016AuthorMT,
  title={Author Masking through Translation},
  author={Yashwant Keswani and H. Trivedi and Parth Mehta and Prasenjit Majumder},
  booktitle={CLEF},
  year={2016}
}
This notebook paper documents the approach adopted by our team for the Author Masking Task at PAN 2016. To mask the identity of the author, we use a simple translation-based approach: the text is translated from the source language (English) into an intermediate language and then translated back into English. In this process, depending on the translation model and the various penalties used during translation, a change in the structure of the language seeps in…
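The round-trip idea in the abstract can be illustrated with a toy sketch. It is not the authors' system: their translation models, intermediate language, and penalties are not reproduced here. The word-level tables `EN_TO_PIVOT` and `PIVOT_TO_EN` and the helper names are hypothetical stand-ins for real English-to-pivot and pivot-to-English translation models, chosen only to show how a forward pass followed by a backward pass can alter the wording of a sentence.

```python
from typing import Callable, Dict

# Hypothetical toy "translation models": word-level lookup tables standing in
# for real English->pivot and pivot->English systems (NOT the paper's models).
EN_TO_PIVOT: Dict[str, str] = {
    "the": "la", "big": "grande", "house": "maison",
    "is": "est", "very": "très", "old": "vieille",
}
PIVOT_TO_EN: Dict[str, str] = {
    "la": "the", "grande": "large", "maison": "home",
    "est": "is", "très": "extremely", "vieille": "ancient",
}

Translator = Callable[[str], str]


def make_word_translator(table: Dict[str, str]) -> Translator:
    """Build a naive word-by-word translator: each lowercased token is mapped
    through `table`; tokens missing from the table are passed through unchanged."""
    def translate(text: str) -> str:
        return " ".join(table.get(tok, tok) for tok in text.lower().split())
    return translate


def round_trip_mask(text: str, forward: Translator, backward: Translator) -> str:
    """Translate English text into the pivot language, then back into English."""
    pivot = forward(text)
    return backward(pivot)


if __name__ == "__main__":
    to_pivot = make_word_translator(EN_TO_PIVOT)
    to_english = make_word_translator(PIVOT_TO_EN)
    print(round_trip_mask("The big house is very old", to_pivot, to_english))
    # -> the large home is extremely ancient
```

A word-level lookup can only swap vocabulary; real translation models also reorder and rephrase, which is where the structural changes described in the abstract come from. Keeping the translators as plain callables means the toy tables could in principle be replaced by any real translation backend without changing `round_trip_mask`.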
Author Masking by Sentence Transformation
TLDR
This work proposes an unsupervised method that transforms sentences, i.e., one that needs no prior data about the author and no linguistic characteristics of a document collection.
SU@PAN'2016: Author Obfuscation
TLDR
This paper presents the approach for hiding an author’s identity by masking their style, which was developed for the Author Obfuscation task, part of the PAN-2016 competition.
Evaluating Safety, Soundness and Sensibleness of Obfuscation Systems
TLDR
This work describes the methodology to evaluate the submitted obfuscation systems based on their safety, soundness and sensibleness, and introduces automatic evaluation measures for the first two dimensions.
The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation - (Best of the Labs Track at CLEF-2017)
TLDR
An approach that changes the text so that it is pushed towards average values for some general stylometric characteristics, making these characteristics less discriminative; it yielded the best performance on the Author Obfuscation task at the PAN-2016 competition.
Author Obfuscation on Indonesian News Articles Using Genetic Algorithms
TLDR
A genetic algorithm-based author obfuscation model was created to modify Indonesian news articles so that they avoid identification by authorship attribution while keeping their semantics.
Overview of the Author Obfuscation Task at PAN 2017: Safety Evaluation Revisited
TLDR
There is still a way to go to “perfect” automatic obfuscation that (1) tricks verification approaches, (2) keeps the meaning of the original, and (3) does not look suspicious to a human reader.
A4NT: Author Attribute Anonymity by Adversarial Training of Neural Machine Translation
TLDR
This paper proposes an automatic method, called Adversarial Author Attribute Anonymity Neural Translation (A4NT), that is effective in fooling the author attribute classifiers and thus improves the anonymity of authors.
Attribute Anonymity by Adversarial Training of Neural Machine Translation
Text-based analysis methods enable an adversary to reveal privacy-relevant author attributes such as gender and age, and can identify the text’s author. Such methods can compromise the privacy of an…
$A^{4}NT$: Author Attribute Anonymity by Adversarial Training of Neural Machine Translation
TLDR
This paper combines sequence-to-sequence language models used in machine translation with generative adversarial networks to obfuscate author attributes, and proposes and evaluates techniques for imposing constraints on the method to preserve the semantics of the input text.
A Girl Has No Name: Automated Authorship Obfuscation using Mutant-X
TLDR
Mutant-X is a genetic algorithm-based random search framework that automatically obfuscates text to evade attribution while keeping the obfuscated text semantically much closer to the original than existing automated authorship obfuscation approaches.

References

SHOWING 1-10 OF 16 REFERENCES
Author Obfuscation using WordNet and Language Models: Notebook for PAN at CLEF 2016
As almost all the successful author identification approaches are based on word frequencies, the most obvious way to obfuscate a text is to distort those frequencies. In this paper we chose a…
Author Obfuscation using WordNet and Language Models
TLDR
This paper chooses a subset of an author's most frequent words and replaces each one with one of its synonyms, considering two measures: the similarity between the original word and the synonym, and the difference between the scores that a language model assigns to the original and distorted sentences.
SU@PAN'2016: Author Obfuscation
TLDR
This paper presents the approach for hiding an author’s identity by masking their style, which was developed for the Author Obfuscation task, part of the PAN-2016 competition.
Overview of the Author Identification Task at PAN 2013
TLDR
The author identification task at PAN-2014 focuses on author verification and adopts the c@1 measure, originally proposed for the question answering task, and continues the successful practice of the PAN labs to examine meta-models based on the combination of all submitted systems.
Overview of the PAN/CLEF 2015 Evaluation Lab
TLDR
An overview of the PAN/CLEF evaluation lab is presented. In addition to the usual author demographics, five personality traits are introduced (openness, conscientiousness, extraversion, agreeableness, and neuroticism), and a new corpus of Twitter messages covering four languages was developed.
Overview of the Author Identification Task at PAN-2017: Style Breach Detection and Author Clustering
TLDR
This edition of PAN focuses on style breach detection and author clustering, two unsupervised authorship analysis tasks, and provides both benchmark data and an evaluation framework to compare different approaches.
Obfuscating Document Stylometry to Preserve Author Anonymity
TLDR
This paper explores techniques for reducing the effectiveness of standard authorship attribution techniques so that an author A can preserve anonymity for a particular document D and introduces two levels of anonymization: shallow and deep.
Moses: Open Source Toolkit for Statistical Machine Translation
We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c)…
Empirical evaluation of authorship obfuscation using JGAAP
TLDR
This work uses a newly published corpus (the Brennan-Greenstadt Obfuscation corpus) and the JGAAP system to test different methods of authorship attribution against essays written in a deliberate attempt to mask style.
Europarl: A Parallel Corpus for Statistical Machine Translation
TLDR
A corpus of parallel text in 11 languages is collected from the proceedings of the European Parliament, with a focus on its acquisition and its application as training data for statistical machine translation (SMT).