Empirical evaluations of language-based author identification techniques

  • Carole E. Chaski
  • Published 1 June 2001
  • Computer Science
  • International Journal of Speech Language and The Law
Recent Court decisions in the United States call for the empirical testing of language-based author identification techniques. This article shows the results of such testing. The tested hypotheses include: syntactic analysis, syntactically-classified punctuation, sentential complexity, vocabulary richness, readability, content analysis, spelling errors, punctuation errors, word form errors, and grammatical errors. These hypotheses are tested on a set of documents written by four women who are…
English Text Classification by Authorship and Date
A Markov chain of every United States Supreme Court opinion ever written was produced, and its ability to classify American judicial opinions by decade of authorship was evaluated. Two quasi-linguistic feature sets were also examined.
Testing the Reliability of an Authorship Identification Method
The aim is ultimately to gain acceptance of existing authorship identification methods in the courtroom, so that written language can also serve as evidence, a 'fingerprint' in court, satisfying the Daubert test for expert testimony.
Identification of a Writer's Native Language by Error Analysis
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. This dissertation does not…
The creation of Base Rate Knowledge of linguistic variables and the implementation of likelihood ratios to authorship attribution in forensic text comparison
This research implements advanced statistical techniques within the field of forensic text comparison that improve the reliability of linguistic evidence furnished in Court and offers probabilistic results to assist not only the judge and jury but also the linguistic expert in order to carry out more rigorous testing and extensive performance analysis of the data.
Strength of linguistic text evidence: A fused forensic text comparison system.
  • S. Ishihara
  • Computer Science, Medicine
  • Forensic science international
  • 2017
It is demonstrated in this study that out of the three procedures, the MVKD procedure with authorship attribution features performed best in terms of Cllr, and that the fused system outperformed all three of the single procedures.
Identifying idiolect in forensic authorship attribution: an n-gram textbite approach
It is argued that textbites, small textual segments that characterise an author's writing and provide DNA-like chunks of identifying material, are able to identify authors by reducing a mass of data to key segments that move us closer to the elusive concept of idiolect.
Linguistic identifiers of L1 Persian speakers writing in English: NLID for authorship analysis
This research focuses on Native Language Identification (NLID), and in particular, on the linguistic identifiers of L1 Persian speakers writing in English. This project comprises three sub-studies…
A scalable framework for cross-lingual authorship identification
A cross-lingual authorship identification solution that can accurately handle a large number of authors is proposed and outperforms the best existing solution that does not rely on external resources.
An Empirical Study on Forensic Analysis of Urdu Text Using LDA-Based Authorship Attribution
The presented approach is an unsupervised computational methodology that can handle the heterogeneity of the dataset, diversity in writing styles of authors, and the inherent ambiguity of Urdu language text.
Developing and Analyzing a Spanish Corpus for Forensic Purposes
Results show that it is possible to differentiate non-abusers from abusers with strong accuracy based on linguistic features, and the natural language processing tool ALIAS TATTLER is being developed for Spanish.


Statistics in Language Studies
This book demonstrates the contribution that statistics can and should make to linguistic studies. The range of work to which statistical analysis is applicable is vast: including, for example,…
“Who Was 'Shadow'?” the Computer Knows: Applying Grammar-Program Statistics in Content Analyses to Solve Mysteries About Authorship
This study's objective was to employ the statistics-documentation portion of a word-processing program's grammar-check feature as a final, definitive, and objective tool for content analyses - used…
Statistics as Language.
I told Professor Smith that his approach to statistics differed greatly from yours. He puts a formula on a board, tells you to get it into your notes, memorize it, and then expects you to know it for…
The Most Common Mistakes in English Usage
The Most Common Mistakes in English Usage, written by Thomas Elliott Berry, a professor of English at West Chester State College in Pennsylvania, USA, originally dates from the early 1960s…
Elegy by W.S.: A Study in Attribution
This study investigates the authorship of A Funerall Elegye, composed by an unidentified "W. S." in memory of William Peter, an Oxford scholar murdered on 25 January 1612. Is it a lost poem of…
Nonparametric Statistics for the Behavioral Sciences.
This is the revision of the classic text in the field, adding two new chapters and thoroughly updating all others. The original structure is retained, and the book continues to serve as a combined…
Linguistic methods of determining authorship
  • National Institute of Justice Research Seminar. 49th American Academy of Forensic Sciences Meeting
  • 1996
Forensic Stylistics, Amsterdam: Elsevier
  • 1993
Linguistic Authentication and Reliability
  • National Conference on Science and Law Proceedings
  • 2000