A framework for authorship identification of online messages: Writing-style features and classification techniques
@article{Zheng2006AFF, title={A framework for authorship identification of online messages: Writing-style features and classification techniques}, author={Rong Zheng and Jiexun Li and Hsinchun Chen and Zan Huang}, journal={J. Assoc. Inf. Sci. Technol.}, year={2006}, volume={57}, pages={378-393} }
With the rapid proliferation of Internet technologies and applications, misuse of online messages for inappropriate or illegal purposes has become a major concern for society. The anonymous nature of online-message distribution makes identity tracing a critical problem. We developed a framework for authorship identification of online messages to address the identity-tracing problem. In this framework, four types of writing-style features (lexical, syntactic, structural, and content-specific…
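The four feature types named in the abstract can be made concrete with a small extractor. The sketch below is a minimal illustration that assumes plain-text messages. The function name `style_features`, the short `FUNCTION_WORDS` list, the greeting pattern, and the `keywords` parameter are all hypothetical and do not reproduce the paper's actual feature set. Each message becomes a dictionary of numeric features that a classifier could consume.

```python
import re
from collections import Counter

# Hypothetical, deliberately short function-word list; real systems use far more.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "is", "on", "at")


def style_features(message, keywords=()):
    """Illustrative writing-style features for one message (hypothetical set)."""
    words = re.findall(r"[A-Za-z']+", message)
    counts = Counter(w.lower() for w in words)
    n_chars = len(message) or 1  # avoid division by zero on empty input
    n_words = len(words) or 1
    paragraphs = [p for p in re.split(r"\n\s*\n", message) if p.strip()]

    features = {
        # Lexical features: character- and word-level statistics
        "n_chars": len(message),
        "upper_ratio": sum(c.isupper() for c in message) / n_chars,
        "digit_ratio": sum(c.isdigit() for c in message) / n_chars,
        "n_words": len(words),
        "avg_word_len": sum(map(len, words)) / n_words,
        "type_token_ratio": len(counts) / n_words,
        # Syntactic features: punctuation counts
        "n_commas": message.count(","),
        "n_periods": message.count("."),
        "n_question_marks": message.count("?"),
    }
    # Syntactic features: function-word frequencies
    for fw in FUNCTION_WORDS:
        features[f"fw_{fw}"] = counts[fw]
    # Structural features: layout of the message
    features["n_lines"] = len(message.splitlines())
    features["n_paragraphs"] = len(paragraphs)
    features["has_greeting"] = int(
        bool(re.match(r"\s*(hi|hello|dear)\b", message, re.IGNORECASE))
    )
    # Content-specific features: caller-supplied keywords (hypothetical)
    for kw in keywords:
        features[f"kw_{kw}"] = counts[kw.lower()]
    return features
```

For example, `style_features("Hello Bob,\n\nThe deal is on.", keywords=("deal",))` yields counts such as two paragraphs and one occurrence of the keyword `deal`. The feature dictionaries from many messages could then be turned into rows of a matrix for any of the classification techniques the paper compares.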
370 Citations
An improved framework for authorship identification in online messages
- Computer Science
- Cluster Computing
- 2017
For this work, the C4.5, fuzzy, and AdaBoost classifiers are used for the task of authorship identification, and the effects of these classification techniques on online messages are evaluated.
Applying authorship analysis to extremist-group Web forum messages
- Computer Science
- IEEE Intelligent Systems
- 2005
A special multilingual model (a set of algorithms and related features) is developed to identify Arabic messages. The model is geared toward the language's unique characteristics and incorporates a complex message-extraction component, allowing a more comprehensive set of features tailored specifically to online messages.
Better Features Sets for Authorship Attribution of Short Messages
- Computer Science
- 2017
This research will study how to authenticate a user by the writing style in a short text posted on Twitter, and the effects of different feature sets and sample sizes are evaluated in the research.
Towards an Information Theoretic Model for Online Message Authorship Identification
- Computer Science
- 2014
The results show that the proposed model can be used effectively to monitor and identify the authorship of documents such as emails, chat conversations, web logs, and forum posts, particularly for closed sets of users such as a research facility, an enterprise, or an organization.
Design and Implementation of a Machine Learning-Based Authorship Identification Model
- Computer Science
- Sci. Program.
- 2019
The proposed LDA-based approach emphasizes instance-based and profile-based classifications of an author’s text that can handle the heterogeneity of the dataset, diversity in writing, and the inherent ambiguity of the Urdu language.
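Profile-based classification, named in the summary above, merges all known texts of an author into a single profile and assigns an unknown text to the closest profile, whereas instance-based classification keeps each training text as a separate sample. The following is a minimal profile-based sketch using word counts and cosine similarity. It is not the LDA-based model of the cited paper, and the helper names (`tokens`, `build_profiles`, `attribute`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens (simplistic tokenizer, an assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_profiles(training):
    """Profile-based step: merge every training text of an author into one Counter."""
    profiles = {}
    for author, texts in training.items():
        profile = Counter()
        for text in texts:
            profile.update(tokens(text))
        profiles[author] = profile
    return profiles


def attribute(unknown, profiles):
    """Return the author whose profile is most similar to the unknown text."""
    query = Counter(tokens(unknown))
    return max(profiles, key=lambda author: cosine(query, profiles[author]))
```

An instance-based variant would instead compare the unknown text with each training text individually (for example, taking the author of the nearest text), which keeps per-text variation that a merged profile averages away.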
Authorship classification: a syntactic tree mining approach
- Computer Science
- UP '10
- 2010
A novel approach is proposed for mining discriminative k-embedded-edge subtree patterns from a given set of syntactic trees; it reduces the computational burden of using complex syntactic structures as a feature set and is shown to increase classification accuracy.
A novel approach of mining write-prints for authorship attribution in e-mail forensics
- Computer Science
- Digit. Investig.
- 2008
A Machine Learning Framework for Authorship Identification From Texts
- Computer Science
- 2019
An approach and a model are presented that learn the differences in writing style between 50 different authors and predict the author of a new text with high accuracy; the accuracy increases significantly after certain linguistic stylometric features are introduced alongside text features.
A Machine Learning Framework for Authorship Identification From Texts
- Computer Science
- ArXiv
- 2019
This work presents an approach and a model that learn the differences in writing style between 50 different authors and predict the author of a new text with high accuracy; the accuracy is seen to increase significantly after certain linguistic stylometric features are introduced along with text features.
A unified data mining solution for authorship analysis in anonymous textual communications
- Computer Science
- Inf. Sci.
- 2013
References
Showing 1-10 of 63 references
Style mining of electronic messages for multiple authorship discrimination: first results
- Computer Science
- KDD '03
- 2003
The results show that stylistic models can be accurately learned to determine an author's identity, based only on the message text.
Authorship Analysis in Cybercrime Investigation
- Computer Science
- ISI
- 2003
The results indicate that the proposed approach, which adopts the authorship-analysis framework, can discover the real identities of authors of both English and Chinese Internet messages with relatively high accuracy.
Authorship Attribution with Support Vector Machines
- Computer Science
- Applied Intelligence
- 2004
The support vector machine (SVM) is applied to text-mining-based identification of the author of a text. Because it can cope with half a million inputs, it requires no feature selection and can process the frequency vector of all words of a text.
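The input described here, a frequency vector over every word in the corpus with no feature selection, can be sketched as follows. This is an illustration that assumes simple lowercase word tokenization; `frequency_vectors` is a hypothetical helper, and the SVM training step itself is not shown.

```python
import re
from collections import Counter


def frequency_vectors(texts):
    """Map each text to a relative word-frequency vector over the full vocabulary.

    No feature selection: every distinct word in the corpus becomes a dimension.
    """
    tokenized = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    # Sorted vocabulary gives every vector the same, deterministic column order.
    vocabulary = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # empty texts yield an all-zero vector
        vectors.append([counts[w] / total for w in vocabulary])
    return vocabulary, vectors
```

For instance, `frequency_vectors(["a b a", "b c"])` returns the vocabulary `['a', 'b', 'c']` and the second vector `[0.0, 0.5, 0.5]`. Each vector has one dimension per vocabulary word, so its length grows with the corpus; rows like these could be passed to a linear SVM implementation such as scikit-learn's `LinearSVC`.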
Computer-Based Authorship Attribution Without Lexical Measures
- Computer Science
- Comput. Humanit.
- 2001
This paper presents a fully automated approach to identifying the authorship of unrestricted text that excludes any lexical measure and instead adapts a set of style markers to the analysis of the text performed by an existing natural language processing tool, using three stylometric levels.
Mining e-mail content for author identification forensics
- Computer Science
- SGMD
- 2001
An investigation into e-mail content mining for author identification, or authorship attribution, for the purpose of forensic investigation found promising results for both aggregated and multi-topic author categorisation.
An experiment in authorship attribution
- Psychology
- 2002
The results of an experiment in authorship attribution are interpreted as supporting the hypothesis that authors have ‘textual fingerprints’, at least for texts produced by authors who are not consciously changing their style of writing across texts.
Gender-preferential text mining of e-mail discourse
- Computer Science
- 18th Annual Computer Security Applications Conference, 2002. Proceedings.
- 2002
An extended set of predominantly topic content-free e-mail document features such as style markers, structural characteristics and gender-preferential language features together with a support vector machine learning algorithm gave promising results for author gender categorisation.
Mining E-mail Authorship
- Computer Science
- 2000
An investigation into learning authorship identification, or categorisation, for e-mail documents using the Support Vector Machine as the learning method is reported.
Feature-Finding for Text Classification
- Linguistics
- 1996
Results of a benchmark test on ten representative text-classification problems suggest that the technique here designated Monte-Carlo Feature-Finding has certain advantages that deserve consideration by future workers in this area.
Automatically Categorizing Written Texts by Author Gender
- Computer Science
- Lit. Linguistic Comput.
- 2002
It is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with approximately 80 per cent accuracy.