N-Gram Feature Selection for Authorship Identification

@inproceedings{Houvardas2006NGramFS,
  title={N-Gram Feature Selection for Authorship Identification},
  author={John Houvardas and E. Stamatatos},
  booktitle={AIMSA},
  year={2006}
}
  • John Houvardas, E. Stamatatos
  • Published in AIMSA 2006
  • Computer Science
  • Automatic authorship identification offers a valuable tool for supporting crime investigation and security. It can be seen as a multi-class, single-label text categorization task. Character n-grams are a very successful approach to representing text for stylistic purposes, since they are able to capture nuances at the lexical, syntactic, and structural levels. So far, character n-grams of fixed length have been used for authorship identification. In this paper, we propose a variable-length n-gram…
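The abstract breaks off before describing the proposed variable-length n-gram selection method, so the sketch below does not reproduce it. It only illustrates the general setting the abstract describes, under a simple profile-based assumption: character n-grams of several fixed lengths (3 to 5, chosen arbitrarily here) are pooled, the k most frequent are kept as features (a plain frequency cutoff, not the paper's selection criterion), and each unknown text is assigned to exactly one author by cosine similarity, which makes it a multi-class, single-label decision. All function names, parameter values, and the toy data are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text: str, n_values: tuple[int, ...] = (3, 4, 5)) -> Counter:
    """Count overlapping character n-grams for every length in n_values."""
    counts: Counter = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def top_k(counts: Counter, k: int) -> Counter:
    """Hypothetical feature selection: keep only the k most frequent n-grams."""
    return Counter(dict(counts.most_common(k)))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram frequency vectors."""
    dot = sum(v * b[g] for g, v in a.items() if g in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(training: dict[str, list[str]], k: int = 1000) -> dict[str, Counter]:
    """Build one n-gram profile per author from all of that author's texts."""
    profiles = {}
    for author, texts in training.items():
        total: Counter = Counter()
        for text in texts:
            total.update(char_ngrams(text))
        profiles[author] = top_k(total, k)
    return profiles


def attribute(text: str, profiles: dict[str, Counter], k: int = 1000) -> str:
    """Single-label decision: return the one author whose profile is most similar."""
    doc = top_k(char_ngrams(text), k)
    return max(profiles, key=lambda author: cosine(doc, profiles[author]))


if __name__ == "__main__":
    # Toy data for illustration only.
    training = {
        "A": ["the cat sat on the mat", "the cat ate the rat"],
        "B": ["quantum flux capacitors hum loudly", "flux readings spike at noon"],
    }
    profiles = build_profiles(training, k=200)
    print(attribute("the cat sat on the rat", profiles))  # → A
```

Pooling several fixed lengths is only the simplest way to mix n-gram sizes; an n-gram feature-selection step of the kind the title refers to would presumably take the place of the `top_k` frequency cutoff.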
    186 Citations
    A Machine Learning Framework for Authorship Identification From Texts
    • 5
    A Machine Learning Framework for Authorship Identification From Texts
    • Aaron Pressman, A. Crosby, +44 authors Therese Poletti
    • 2019
    Authorship Identification of E-mail as a Multi-Class Task - Notebook for PAN at CLEF 2011
    • 4
    BLN-Gram-TF-ITF as a new Feature for Authorship Identification
    • 4
    Authorship identification from unstructured texts
    • 43
    • Highly Influenced